
My New Lab

As we have moved, I need to downsize my lab drastically. I will write a series of posts on how I build my new lab. Not only will I lower the number of servers, I will also use smaller machines: NUCs and SFF machines. As I need to do a lot of travelling, it will take a long time to get ready.

Photo by Ross Findon / Unsplash

💡

Actually, a lab will never be ready; that is in the nature of a lab.

As it's a lab, all machines will be taken down and rebuilt many times, but some of the systems will always remain to handle local production. The main production servers are in remote datacenters.

Basic building blocks

Virtualization: Proxmox Virtual Environment (PVE) and KVM on my PC
- I'm testing alternatives to Proxmox: TrueNAS and other Incus-based servers
Storage: ZFS with compression and deduplication
- XFS for some boot drives and VMs
Backups: Proxmox Backup Server (PBS) for backups of PVE and PCs
- Root disk usage 12 G, CPU 15%, except Friday evenings, when it spikes during large-scale housekeeping tasks (verification, prune)
- My current deduplication factor is huge: 85.79
Docker-CE + Dockge/Portainer-CE for most apps, Docker Desktop
Nginx and Traefik as my reverse proxies
pfSense or OPNsense as my router firewall
Authentication: Authentik and/or Authelia
Ubuntu 22.04 LTS and 24.04 LTS for VM and CT
Debian 12 for VM and CT
Samba for NAS, AD DC, and DC
Grafana for showing stats
RustDesk for Remote desktops

Proxmox or not

There are a lot of interesting new systems I'm testing, such as Incus (LXD/LXC). The new TrueNAS is a good example of one of those. If you do not need to experiment, I think this is the one to run as the home production server.

Incus provides a user experience similar to that of a public cloud. With it, you can easily mix and match both containers and virtual machines, sharing the same underlying storage and network.

Cluster or Not

This is a big question, and the release of the experimental Proxmox Datacenter Manager (PDM) makes it harder. The possibility of moving VMs from node to node is a major thing.

The technology

Proxmox clustering utilizes pmxcfs, a database-driven file system, for storing configuration files, replicated in real-time on all nodes using corosync.

All nodes must be able to connect to each other via UDP ports 5405-5412 for corosync to work.
Date and time must be synchronized. Use a local NTP server.
An SSH tunnel on TCP port 22 between nodes is required.
A dedicated NIC for the cluster traffic is recommended, especially if you use shared storage.
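
The requirements above can be turned into a quick pre-flight check before creating a cluster. This is a minimal sketch, assuming nc is installed; the function names are my own and not part of any Proxmox tool.

```shell
# Ports taken from the requirements listed above.
COROSYNC_PORTS="5405-5412"
SSH_PORT=22

# Print one "protocol ports purpose" line per cluster requirement.
cluster_ports() {
    printf 'udp %s corosync\n' "$COROSYNC_PORTS"
    printf 'tcp %s ssh\n' "$SSH_PORT"
}

# Try a TCP connection to the SSH port of every node given as an argument.
check_peers() {
    for node in "$@"; do
        if nc -z -w 3 "$node" "$SSH_PORT" >/dev/null 2>&1; then
            echo "$node: ssh reachable"
        else
            echo "$node: ssh NOT reachable"
        fi
    done
}

cluster_ports
```

Run check_peers with the names of your other nodes, for example check_peers pve2 pve3 (placeholder names). The corosync ports are UDP, so a simple connect test says little about them; the list is mainly a reminder for the firewall rules. For time sync, timedatectl status shows a "System clock synchronized" line.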

Take special care when setting up the nodes: storage names, NIC names, and CPU types need to match.

While it’s common to reference all node names and their IPs in /etc/hosts (or make their names resolvable through other means), this is not necessary for a cluster to work.

High-availability modes

Without entering the never-ending debate about what HA is, let's keep it short.

Factors to consider

Do you have shared storage? Shared storage needs 10 G networking, and I am trying to reduce power consumption, so no. Ceph works best with 5 nodes, and I do not plan for 5 nodes. Any type of HA also increases wear and tear on the disks, so it requires constant monitoring and a plan for replacing failing SSDs and NVMe drives.

As I have a few production VMs that use replication, I will use a minimal cluster for them. Their configurations do not change often. The data of the VMs changes constantly, but it does not need to be replicated. Therefore, they will replicate infrequently, such as weekly or monthly.
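
An infrequent replication schedule like this can be set per VM with pvesr. The sketch below only prints the command instead of running it, so it is safe to try anywhere. VM ID 100, job number 0, node name pve2, and the Sunday 03:00 schedule are placeholders, not my real setup.

```shell
# Build (but do not run) a pvesr command that creates a replication job.
# Arguments: VM ID, job number, target node, schedule.
replication_cmd() {
    vmid=$1
    job=$2
    target=$3
    schedule=$4
    printf 'pvesr create-local-job %s-%s %s --schedule "%s"\n' \
        "$vmid" "$job" "$target" "$schedule"
}

# Weekly replication of the placeholder VM 100 to the placeholder node pve2.
replication_cmd 100 0 pve2 "sun 03:00"
```

Paste the printed command on the source node once the values fit your cluster. Afterwards, pvesr status shows the state of the jobs.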

Firewalls and database servers, like most other infrastructure devices such as DNS, DHCP, and NTP servers and security appliances, have built-in hot standby that is better than the slow PVE HA or replication.

IOMMU

Yes, you will need IOMMU from time to time. See my old post for how to set it up on GRUB or EFI. Start a terminal and check if you are running GRUB: efibootmgr -v

Output: EFI variables are not supported on this system → you have GRUB, use the Update GRUB path of the guide.
Output: EFI data → you have EFI, use the Update EFI path of the guide.
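
The same check can be scripted without efibootmgr, since the kernel only creates /sys/firmware/efi when the system was booted via EFI. This is a minimal sketch; the function names are made up for this post, and the optional path argument exists only to make the function easy to test.

```shell
# Report the boot mode by checking for the EFI directory in sysfs.
boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo "EFI"
    else
        echo "BIOS"
    fi
}

# Show IOMMU-related kernel messages (usually needs root).
iommu_log() {
    dmesg 2>/dev/null | grep -i -e DMAR -e IOMMU
}

boot_mode
```

After enabling IOMMU and rebooting, iommu_log should print a few lines mentioning DMAR or IOMMU. No output usually means it is not active yet, or that you are not root.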

Networking

A separate cluster network is recommended, especially if you have shared storage.
It's not the speed that is important for this network but the latency.

Many of us use the lab to learn and experiment with networking. A lab is the perfect tool to test networking settings before implementing them in production.

Bond

All NICs will be bonded and set to use LACP (802.3ad) and QoS (port-based or 802.1p).
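
As a sketch of what an LACP bond looks like in /etc/network/interfaces on a PVE node, the helper below prints a stanza for review instead of writing the file. The names bond0, eno1, and eno2 are placeholders, and the miimon and hash policy values are common choices rather than my final settings. QoS is not covered here.

```shell
# Print an /etc/network/interfaces stanza for an LACP (802.3ad) bond.
# Usage: bond_stanza BOND_NAME NIC...
bond_stanza() {
    bond=$1
    shift
    cat <<EOF
auto $bond
iface $bond inet manual
    bond-slaves $*
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
EOF
}

bond_stanza bond0 eno1 eno2
```

The switch ports on the other end must be configured as an LACP group as well, otherwise the bond will not come up properly.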

VLAN

VLANs are used for the management, cluster, and storage networks, among many others.

Proxmox Firewall

Yes, it is to be active on all devices and control access to and from them.

Backup

For all backup needs of the Proxmox lab, I use a set of PBS Servers.

One of the main benefits is the deduplication feature.

Deduplication Factor 85.51

Fresh from my frontline PBS Server

Security

All servers have Fail2ban and use the internal firewall. User login uses SSH keys and access tokens.

Fail2ban setup

Fail2ban shall be installed on all nodes, on all servers running on them, and on all stand-alone servers to make brute-force attacks harder. This protects against both internal and external attacks.

Installation

apt update && apt install fail2ban -y

After installing, create the jail.local configuration file; you may want to edit it.
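
One common way to create jail.local is to copy the packaged jail.conf, so that package upgrades do not overwrite your changes. A minimal sketch, assuming the default Debian path /etc/fail2ban; the helper name is made up for this post, and it refuses to overwrite an existing file.

```shell
# Copy jail.conf to jail.local unless jail.local already exists.
# The directory argument defaults to /etc/fail2ban.
make_jail_local() {
    dir=${1:-/etc/fail2ban}
    if [ -e "$dir/jail.local" ]; then
        echo "jail.local already exists, leaving it alone"
    else
        cp "$dir/jail.conf" "$dir/jail.local" && echo "created $dir/jail.local"
    fi
}
```

Run make_jail_local as root on each server. The jail files under /etc/fail2ban/jail.d/ used below are read in addition to jail.local.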

Configuration files for Fail2ban

Generic PAM authentication errors

nano /etc/fail2ban/jail.d/pam-generic.conf

[pam-generic]
enabled = true
backend = systemd
banaction = iptables
findtime = 14d
bantime = 30d
maxretry = 1
ignoreip = 127.0.0.1/8 ::1

and the filter file
nano /etc/fail2ban/filter.d/pam-generic.conf

[INCLUDES]

before = common.conf

[Definition]

# if you want to catch only login errors from specific daemons, use something like
#_ttys_re=(?:ssh|pure-ftpd|ftp)
#
# Default: catch all failed logins
_ttys_re=\S*

__pam_re=\(?%(__pam_auth)s(?:\(\S+\))?\)?:?
_daemon = \S+

prefregex = ^%(__prefix_line)s%(__pam_re)s\s+authentication failure;(?:\s+(?:(?:logname|e?uid)=\S*)){0,3} tty=%(_ttys_re)s <F-CONTENT>.+</F-CONTENT>$

failregex = ^ruser=<F-ALT_USER>(?:\S*|.*?)</F-ALT_USER> rhost=<HOST>(?:\s+user=<F-USER>(?:\S*|.*?)</F-USER>)?\s*$

ignoreregex =

datepattern = {^LN-BEG}

Create the file /etc/fail2ban/filter.d/proxmox.conf with the following content:

[Definition]
failregex = pvedaemon\[.*authentication failure; rhost=<HOST> user=.* msg=.*
ignoreregex =
journalmatch = _SYSTEMD_UNIT=pvedaemon.service

Fail2ban filter file for OpenSSH

If you want to protect OpenSSH from being brute forced by password authentication, then get public key authentication working before disabling PasswordAuthentication in sshd_config.

nano /etc/fail2ban/filter.d/sshd.conf

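[INCLUDES]

# Read common prefixes such as __prefix_line, which the regexes below rely on
before = common.conf

[DEFAULT]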

_daemon = sshd

# optional prefix (logged from several ssh versions) like "error: ", "error: PAM: " or "fatal: "
__pref = (?:(?:error|fatal): (?:PAM: )?)?
# optional suffix (logged from several ssh versions) like " [preauth]"
#__suff = (?: port \d+)?(?: \[preauth\])?\s*
__suff = (?: (?:port \d+|on \S+|\[preauth\])){0,3}\s*
__on_port_opt = (?: (?:port \d+|on \S+)){0,2}
# close by authenticating user:
__authng_user = (?: (?:invalid|authenticating) user <F-USER>\S+|.*?</F-USER>)?

# for all possible (also future) forms of "no matching (cipher|mac|MAC|compression method|key exchange method|host key type) found",
# see ssherr.c for all possible SSH_ERR_..._ALG_MATCH errors.
__alg_match = (?:(?:\w+ (?!found\b)){0,2}\w+)

# PAM authentication mechanism, can be overridden, e. g. `filter = sshd[__pam_auth='pam_ldap']`:
__pam_auth = pam_[a-z]+

[Definition]

prefregex = ^<F-MLFID>%(__prefix_line)s</F-MLFID>%(__pref)s<F-CONTENT>.+</F-CONTENT>$

cmnfailre = ^[aA]uthentication (?:failure|error|failed) for <F-USER>.*</F-USER> from <HOST>( via \S+)?%(__suff)s$
            ^User not known to the underlying authentication module for <F-USER>.*</F-USER> from <HOST>%(__suff)s$
            <cmnfailre-failed-pub-<publickey>>
            ^Failed <cmnfailed> for (?P<cond_inv>invalid user )?<F-USER>(?P<cond_user>\S+)|(?(cond_inv)(?:(?! from ).)*?|[^:]+)</F-USER> from <HOST>%(__on_port_opt)s(?: ssh\d*)?(?(cond_user): |(?:(?:(?! from ).)*)$)
            ^<F-USER>ROOT</F-USER> LOGIN REFUSED FROM <HOST>
            ^[iI](?:llegal|nvalid) user <F-USER>.*?</F-USER> from <HOST>%(__suff)s$
            ^User <F-USER>\S+|.*?</F-USER> from <HOST> not allowed because not listed in AllowUsers%(__suff)s$
            ^User <F-USER>\S+|.*?</F-USER> from <HOST> not allowed because listed in DenyUsers%(__suff)s$
            ^User <F-USER>\S+|.*?</F-USER> from <HOST> not allowed because not in any group%(__suff)s$
            ^refused connect from \S+ \(<HOST>\)
            ^Received <F-MLFFORGET>disconnect</F-MLFFORGET> from <HOST>%(__on_port_opt)s:\s*3: .*: Auth fail%(__suff)s$
            ^User <F-USER>\S+|.*?</F-USER> from <HOST> not allowed because a group is listed in DenyGroups%(__suff)s$
            ^User <F-USER>\S+|.*?</F-USER> from <HOST> not allowed because none of user's groups are listed in AllowGroups%(__suff)s$
            ^<F-NOFAIL>%(__pam_auth)s\(sshd:auth\):\s+authentication failure;</F-NOFAIL>(?:\s+(?:(?:logname|e?uid|tty)=\S*)){0,4}\s+ruser=<F-ALT_USER>\S*</F-ALT_USER>\s+rhost=<HOST>(?:\s+user=<F-USER>\S*</F-USER>)?%(__suff)s$
            ^maximum authentication attempts exceeded for <F-USER>.*</F-USER> from <HOST>%(__on_port_opt)s(?: ssh\d*)?%(__suff)s$
            ^User <F-USER>\S+|.*?</F-USER> not allowed because account is locked%(__suff)s
            ^<F-MLFFORGET>Disconnecting</F-MLFFORGET>(?: from)?(?: (?:invalid|authenticating)) user <F-USER>\S+</F-USER> <HOST>%(__on_port_opt)s:\s*Change of username or service not allowed:\s*.*\[preauth\]\s*$
            ^Disconnecting: Too many authentication failures(?: for <F-USER>\S+|.*?</F-USER>)?%(__suff)s$
            ^<F-NOFAIL>Received <F-MLFFORGET>disconnect</F-MLFFORGET></F-NOFAIL> from <HOST>%(__on_port_opt)s:\s*11:
            <mdre-<mode>-other>
            ^<F-MLFFORGET><F-MLFGAINED>Accepted \w+</F-MLFGAINED></F-MLFFORGET> for <F-USER>\S+</F-USER> from <HOST>(?:\s|$)

cmnfailed-any = \S+
cmnfailed-ignore = \b(?!publickey)\S+
cmnfailed-invalid = <cmnfailed-ignore>
cmnfailed-nofail = (?:<F-NOFAIL>publickey</F-NOFAIL>|\S+)
cmnfailed = <cmnfailed-<publickey>>

mdre-normal =
# used to differentiate "connection closed" with and without `[preauth]` (fail/nofail cases in ddos mode)
mdre-normal-other = ^<F-NOFAIL><F-MLFFORGET>(Connection (?:closed|reset)|Disconnected)</F-MLFFORGET></F-NOFAIL> (?:by|from)%(__authng_user)s <HOST>(?:%(__suff)s|\s*)$

mdre-ddos = ^Did not receive identification string from <HOST>
            ^kex_exchange_identification: (?:read: )?(?:[Cc]lient sent invalid protocol identifier|[Cc]onnection (?:closed by remote host|reset by peer))
            ^Bad protocol version identification '.*' from <HOST>
            ^<F-NOFAIL>SSH: Server;Ltype:</F-NOFAIL> (?:Authname|Version|Kex);Remote: <HOST>-\d+;[A-Z]\w+:
            ^Read from socket failed: Connection <F-MLFFORGET>reset</F-MLFFORGET> by peer
            ^banner exchange: Connection from <HOST><__on_port_opt>: invalid format
# same as mdre-normal-other, but as failure (without <F-NOFAIL> with [preauth] and with <F-NOFAIL> on no preauth phase as helper to identify address):
mdre-ddos-other = ^<F-MLFFORGET>(Connection (?:closed|reset)|Disconnected)</F-MLFFORGET> (?:by|from)%(__authng_user)s <HOST>%(__on_port_opt)s\s+\[preauth\]\s*$
                  ^<F-NOFAIL><F-MLFFORGET>(Connection (?:closed|reset)|Disconnected)</F-MLFFORGET></F-NOFAIL> (?:by|from)%(__authng_user)s <HOST>(?:%(__on_port_opt)s|\s*)$

mdre-extra = ^Received <F-MLFFORGET>disconnect</F-MLFFORGET> from <HOST>%(__on_port_opt)s:\s*14: No(?: supported)? authentication methods available
            ^Unable to negotiate with <HOST>%(__on_port_opt)s: no matching <__alg_match> found.
            ^Unable to negotiate a <__alg_match>
            ^no matching <__alg_match> found:
# part of mdre-ddos-other, but user name is supplied (invalid/authenticating) on [preauth] phase only:
mdre-extra-other = ^<F-MLFFORGET>Disconnected</F-MLFFORGET>(?: from)?(?: (?:invalid|authenticating)) user <F-USER>\S+|.*?</F-USER> <HOST>%(__on_port_opt)s \[preauth\]\s*$

mdre-aggressive = %(mdre-ddos)s
                  %(mdre-extra)s
# mdre-extra-other is fully included within mdre-ddos-other:
mdre-aggressive-other = %(mdre-ddos-other)s

# Parameter "publickey": nofail (default), invalid, any, ignore
publickey = nofail
# consider failed publickey for invalid users only:
cmnfailre-failed-pub-invalid = ^Failed publickey for invalid user <F-USER>(?P<cond_user>\S+)|(?:(?! from ).)*?</F-USER> from <HOST>%(__on_port_opt)s(?: ssh\d*)?(?(cond_user): |(?:(?:(?! from ).)*)$)
# consider failed publickey for valid users too (don't need RE, see cmnfailed):
cmnfailre-failed-pub-any =
# same as invalid, but consider failed publickey for valid users too, just as no failure (helper to get IP and user-name only, see cmnfailed):
cmnfailre-failed-pub-nofail = <cmnfailre-failed-pub-invalid>
# don't consider failed publickey as failures (don't need RE, see cmnfailed):
cmnfailre-failed-pub-ignore =

cfooterre = ^<F-NOFAIL>Connection from</F-NOFAIL> <HOST>

failregex = %(cmnfailre)s
            <mdre-<mode>>
            %(cfooterre)s

# Parameter "mode": normal (default), ddos, extra or aggressive (combines all)
# Usage example (for jail.local):
#   [sshd]
#   mode = extra
#   # or another jail (rewrite filter parameters of jail):
#   [sshd-aggressive]
#   filter = sshd[mode=aggressive]
#
mode = normal

#filter = sshd[mode=aggressive]

ignoreregex =

maxlines = 1

journalmatch = _SYSTEMD_UNIT=sshd.service + _COMM=sshd

Proxmox jail

nano /etc/fail2ban/jail.d/proxmox.conf

[proxmox]
enabled = true
filter = proxmox
backend = systemd
banaction = iptables
maxretry = 1
findtime = 14d
bantime = 30d
ignoreip = 127.0.0.1/8 ::1

and the filter file, with the same content as created earlier

nano /etc/fail2ban/filter.d/proxmox.conf

[Definition]
failregex = pvedaemon\[.*authentication failure; rhost=<HOST> user=.* msg=.*
ignoreregex =
journalmatch = _SYSTEMD_UNIT=pvedaemon.service

Restart Service

To enable the new config, run systemctl restart fail2ban. This arms Fail2ban to protect the Proxmox VE API.

Test for success

Make a failed attempt to log in via PAM. Then issue the command

fail2ban-regex systemd-journal /etc/fail2ban/filter.d/proxmox.conf

You should now have at least a Failregex: 1 total at the top of the Results section and 1 matched at the bottom.

Check for jailed attempts

Proxmox: fail2ban-client status proxmox
SSH: fail2ban-client status sshd
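
With maxretry = 1 and a 30-day ban, it is easy to lock yourself out, so it helps to list the banned addresses and know how to release one. The sketch below assumes the status output contains a "Banned IP list:" line. The helper names are my own, and the addresses in the note are documentation examples.

```shell
# Extract the banned addresses from `fail2ban-client status <jail>` output on stdin.
banned_ips() {
    sed -n 's/.*Banned IP list:[[:space:]]*//p'
}

# Print the banned addresses for every jail given as an argument.
show_banned() {
    for jail in "$@"; do
        printf '%s: %s\n' "$jail" "$(fail2ban-client status "$jail" | banned_ips)"
    done
}
```

Run show_banned proxmox sshd as root to see both jails at once. To release an address, use fail2ban-client set proxmox unbanip 192.0.2.10 with the jail name and the IP in question.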

Router Firewall

pfSense

Use the amd64 image. The image is a bit old.

OPNsense

Use the amd64 image. It comes in these variants:

DVD: ISO installer image with live system capabilities running in VGA mode. On amd64, UEFI boot is supported as well.
VGA: USB installer image with live system capabilities running in VGA mode as GPT boot. On amd64, UEFI boot is supported as well.
Serial: USB installer image with live system capabilities running in serial console (115200) including UEFI support...
Nano: a preinstalled serial image for USB sticks, SD or CF cards as MBR boot. These images are 3G in size and automatically adapt to the installed media size after first boot.

VPS

Yes, I will run stuff in the cloud. This way, the Lab is more of a lab.

References

Proxmox Virtual Environment PVE is a complete, open-source server management platform for enterprise virtualization. It tightly integrates the KVM hypervisor and Linux Containers (LXC), software-defined storage and networking functionality, on a single platform. With the integrated web-based user interface you can manage VMs and containers, high availability for clusters, or the integrated disaster recovery tools with ease.

Proxmox Backup Server PBS is an enterprise backup solution, for backing up and restoring VMs, containers, and physical hosts. By supporting incremental, fully deduplicated backups, Proxmox Backup Server significantly reduces network load and saves valuable storage space. With strong encryption and methods of ensuring data integrity, you can feel safe when backing up data, even to targets which are not fully trusted.

Proxmox wiki pages

Proxmox documentation

Corosync homepage, Linux Kernel Archives pdf

Fail2ban homepage

pfSense homepage, Download

OPNsense (based on pfSense) homepage, Download

Incus: the new LXD implementation from Linux Containers. Incus is a next-generation system container, application container, and virtual machine manager. homepage