Start using Proxmox

Set up basic services on Proxmox and its VMs/CTs: personal add-ons, NTP (Chrony), e-mail alerts, firewall settings, networking, users and pools, templates, Docker, a backup strategy and DOCUMENTATION.


These are my routines for setting up a Proxmox node. I have a script for it, but this is what I do. Some steps are for looks and comfort, but some are essential for running a cluster and for security.

Setup Bash and Add-Ons

Out of the box your .bashrc is populated but everything is commented out; that's fine for now. We will add one line and two files to change this.

If the file .bash_aliases exists, start using it.

Add this line to .bashrc if it's missing (it is, out of the box).

echo "[[ -f ~/.bash_aliases ]] && . ~/.bash_aliases" >> .bashrc

Or download my preferences

wget https://raw.githubusercontent.com/nallej/MyJourney/main/BashAddon.sh 

chmod +x BashAddon.sh 

./BashAddon.sh

If you need IOMMU

See my post on IOMMU and Dark mode (link).

Your motherboard and its firmware, as well as your CPU, need to support it.

Get bat

Bat is installed by BashAddon.sh; otherwise just apt install bat and add an alias bat='batcat' to your bash config (Debian ships the command as batcat).
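
A quick sketch of the manual route, in case you skip BashAddon.sh:

# install bat; Debian names the binary batcat, so point the expected name at it
apt install bat -y
echo "alias bat='batcat'" >> ~/.bash_aliases
. ~/.bash_aliases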


Setup NTP

NTP is critical for databases, backups and cluster setups. Correct time is especially important in High Availability setups; you simply need the correct time across the whole cluster. The NTP daemon listens on UDP port 123. For security reasons, let only one node have NTP access to the internet.

Some networks do not allow the use of external NTP servers. In that case you have to set up your own. The time on these networks needs to be checked and set manually from time to time.

By default, every Proxmox node does its own thing as a stand-alone server and constantly goes out for an NTP sync. They use the pool 2.debian.pool.ntp.org by default, with the iburst option.

I prefer to have one node act as the local NTP server and the rest configured as clients. Proxmox uses Chrony as the NTP client; it can also be used as an NTP server.

Installing a separate NTP server(s)

# If you plan to help the NTP pool community by sharing it
sudo apt install ntp

# This is what Proxmox uses
sudo apt install chrony 

Check your NTP status with these commands

chronyc activity

chronyc tracking

chronyc sources

chronyc sources -v # Verbose

By checking performance you will find the set of servers that gives you the best accuracy and stability. My ISP's servers are the best for me, so I stick to them. The defaults are loaded from /etc/chrony/chrony.conf; please read it before adding anything.

Add your own settings and best NTP servers

nano /etc/chrony/conf.d/chrony.conf
# As example only 

server time.cloudflare.com
pool   pool.ntp.org iburst maxsources 3

Add iburst to your best server. Use Stratum 1 or 2 servers for best results.

Setting up a local NTP server

Use the node facing the internet as the local source of NTP.

  1. Open UDP port 123
  2. Deal with the DHCP option for NTP
  3. Configure Chrony
  4. Allow it to act as a local server
  5. Configure the other nodes to use it

Set up the NTP server on pve-1 at 192.168.1.123

# Minimum changes

nano /etc/chrony/chrony.conf 

# Allow NTP client access from local network.
allow 192.168.1.0/24

# Restart the service
systemctl restart chronyd

Setup the other Proxmox nodes as clients

# Minimum changes

nano /etc/chrony/chrony.conf 

# Comment out any server or pool lines
# Add this line
server 192.168.1.123 iburst

# Restart the service

systemctl restart chronyd
systemctl status chrony
chronyc sources

Setup Email Alerts

This is a strangely complicated process in Proxmox - why?
This procedure is following the Proxmox guides and articles on the Forum.

I will quickly go through the steps.

  • First set up a Gmail account for sending alerts and get an Application Password for it
    DO NOT USE the account password - it will not work that way, I did test
    You do not want the alerts in your daily mail, tested that too
  • Set up Postfix to send them
  • On Proxmox 8 I use Gotify

Add some utilities for postfix

apt update
apt install libsasl2-modules mailutils postfix-pcre -y
cp /etc/postfix/main.cf /etc/postfix/main.cf.bak

Create the password

nano /etc/postfix/sasl_passwd
# Add the password info
smtp.gmail.com [email protected]:YourApplicationPassword

postmap hash:/etc/postfix/sasl_passwd
# Was it successfully created?
cat /etc/postfix/sasl_passwd.db

Edit the postfix configuration

nano /etc/postfix/main.cf

# Add this for your mail
# Comment out the old relayhost =

# gmail configuration
relayhost = smtp.gmail.com:587
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_tls_security_level = encrypt
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_tls_CAfile = /etc/ssl/certs/Entrust_Root_Certification_Authority.pem
smtp_tls_session_cache_database = btree:/var/lib/postfix/smtp_tls_session_cache
smtp_tls_session_cache_timeout = 3600s
smtp_header_checks = pcre:/etc/postfix/smtp_header_checks

Add the smtp header

nano /etc/postfix/smtp_header_checks

/^From:.*/ REPLACE From: pve-1--Alert [email protected]

postmap hash:/etc/postfix/smtp_header_checks

postfix reload

Test message

echo "This is testing Alert-messages from pve-1" | mail -s "Test email from pve-1" [email protected]
💡
Email alerts are not a maybe - they are a MUST. 🫵

Configure your Firewall

Do NOT switch it on before editing the rules - you may be locked out! 👈

The firewall will not work at the VM level until you switch it on at the Datacenter level. You can set rules at Datacenter, Node and VM level. This means you need to plan in detail what to use and why.

Remember it's extremely dangerous out there 😈

  1. Proxmox needs certain ports for its own operation - open these at setup
  2. Other ports needed on VMs - only open if needed
  3. Plan which nodes a VM can run on - open or close the ports accordingly
  4. Use the Macros
  5. Remember to Enable a rule if you want it to work

Cluster Wide Setup

Cluster-wide firewall configuration is stored at: /etc/pve/firewall/cluster.fw

Host Specific Configuration

Host related configuration is read from: /etc/pve/nodes/<nodename>/host.fw

VM/Container Configuration

VM firewall configuration is read from: /etc/pve/firewall/<VMID>.fw

Firewall Rules

Firewall rules consist of a direction (IN or OUT) and an action (ACCEPT, DENY, REJECT). You can also specify a macro name. Macros contain predefined sets of rules and options. Rules can be disabled by prefixing them with |.
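
As a hedged illustration only (the port and source subnet are examples, not recommendations), a guest rule file could look like this:

# /etc/pve/firewall/<VMID>.fw
[OPTIONS]
enable: 1

[RULES]
# SSH macro, allowed only from the management subnet
IN SSH(ACCEPT) -source 192.168.1.0/24
# plain rule: allow TCP 8080 in
IN ACCEPT -p tcp -dport 8080
# disabled rule - note the leading |
|IN ACCEPT -p icmp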

Security Groups

A security group is a collection of rules, defined at cluster level, which can be used in all VMs' rules. For example you can define a group named “webserver” with rules to open the http and https ports.
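
A sketch of that webserver example, assuming the group is defined in cluster.fw and then referenced from a guest's rule file:

# /etc/pve/firewall/cluster.fw
[group webserver]
IN ACCEPT -p tcp -dport 80
IN ACCEPT -p tcp -dport 443

# /etc/pve/firewall/<VMID>.fw - use the group as a rule
[RULES]
GROUP webserver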

IP Aliases

IP Aliases allow you to associate IP addresses of networks with a name. You can then refer to those names:

  • inside IP set definitions
  • in source and dest properties of firewall rules
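
For example (the name and subnet are made up), an alias is defined in cluster.fw and can then be used as a source or destination in rules:

# /etc/pve/firewall/cluster.fw
[ALIASES]
mgmt 192.168.1.0/24

# later, in a [RULES] section
IN HTTP(ACCEPT) -source mgmt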

Default firewall rules

The following traffic is filtered by the default firewall configuration:

Datacenter incoming/outgoing DROP/REJECT

If the input or output policy for the firewall is set to DROP or REJECT, the following traffic is still allowed for all Proxmox VE hosts in the cluster:

  • traffic over the loopback interface
  • already established connections
  • traffic using the IGMP protocol
  • TCP traffic from management hosts to port 8006 in order to allow access to the web interface
  • TCP traffic from management hosts to the port range 5900 to 5999 allowing traffic for the VNC web console
  • TCP traffic from management hosts to port 3128 for connections to the SPICE proxy
  • TCP traffic from management hosts to port 22 to allow ssh access
  • UDP traffic in the cluster network to ports 5405-5412 for corosync
  • UDP multicast traffic in the cluster network
  • ICMP traffic type 3 (Destination Unreachable), 4 (congestion control) or 11 (Time Exceeded)

The following traffic is dropped, but not logged even with logging enabled:

  • TCP connections with invalid connection state
  • Broadcast, multicast and anycast traffic not related to corosync, i.e., not coming through ports 5405-5412
  • TCP traffic to port 43
  • UDP traffic to ports 135 and 445
  • UDP traffic to the port range 137 to 139
  • UDP traffic from source port 137 to port range 1024 to 65535
  • UDP traffic to port 1900
  • TCP traffic to port 135, 139 and 445
  • UDP traffic originating from source port 53

The rest of the traffic is dropped or rejected, respectively, and also logged. This may vary depending on the additional options enabled in the Firewall Options, such as NDP, SMURFS and TCP flag filtering.

Ports used by Proxmox VE

  • VNC Web console: 5900-5999 (TCP, WebSocket)
  • SPICE proxy: 3128 (TCP)
  • sshd (used for cluster actions): 22 (TCP)
  • rpcbind: 111 (UDP)
  • sendmail: 25 (TCP, outgoing)
  • corosync cluster traffic: 5405-5412 (UDP)
  • live migration (VM memory and local-disk data): 60000-60050 (TCP)

Proxmox Management Web Interface

  • Proxmox VE Server 8006 (TCP, HTTP/1.1 over TLS)
  • Proxmox Backup Server 8007 (TCP, HTTP/1.1 over TLS)

Other ports you might need open - use the Macros if possible

  • HTTP: 80 (TCP, UDP)
  • HTTPS: 443 (TCP, UDP)
  • FTP: transfer 20, control 21 (TCP, SCTP)
  • FTPS: data 989, control 990 (TCP, UDP)
  • Secure Shell: SSH 22 (TCP), also used by scp and sftp
  • DHCP: 67, 68 (UDP) and failover protocol 647, 847 (TCP)
  • DHCPv6: client 546, server 547 (TCP/UDP)
  • Node-RED: 1880 (TCP)
  • MQTT: 1883 (UDP, TCP)
  • RADIUS: radius 1812, radius-acct 1813, radsec 2083 (UDP, TCP),
    change of authorization 3799 (UDP)
  • CUPS: admin, IPP 631 (TCP/UDP)
💡
Only open the ports you REALLY need open 👈


Setup your Networks

An ideal cluster has a minimum of 3 (5 or more is better) identical servers.

In a homelab we tend to have a mishmash of hardware - no big problem.
In a cluster you need to have the same vmbr bridges on all nodes for the VMs to be able to migrate and run. If a vmbr does not exist on a node, the VM can't run there.
You group VMs by their hardware needs so they can migrate within that group; a minimal bridge definition is sketched below.
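
A minimal sketch of such a bridge in /etc/network/interfaces; the NIC name and addresses are assumptions, use your own, and apply changes with ifreload -a.

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.11/24
        gateway 192.168.1.1
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0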

Special Networks

Some networks need to be dedicated to a single purpose, so that subsystem gets access undisturbed by other apps or subsystems, and some networks depend on high speed.

In a homelab environment we have to compromise and save on NICs. We usually only have 2 or 4 NICs. It will work. Adding a 10 G NIC and tuning the network is better.

  1. Management: low speed access is ok, main GW for the node
  2. Cluster: high speed, better with a dedicated NIC
  3. Migration: high speed, better with a dedicated NIC
    With Replication activated, the need for speed is not that high
  4. Networked FS: requires high speed and a dedicated NIC
  5. Bonded NICs are good for VM bridges and VLANs (see the sketch below)
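
For point 5, a hedged sketch of an LACP bond feeding a VLAN-aware bridge; the interface names are assumptions and 802.3ad needs switch support.

auto bond0
iface bond0 inet manual
        bond-slaves enp3s0 enp4s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094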

Special File Systems for VM/CT disks

Remember to set up NTP on the servers used in these roles.
Clusters really need the high speed of 10 G networks.

  • Shared storage needs high speed and a dedicated network. 1 G can only support a few VMs before it gets totally saturated. Use iSCSI or NFS; SMB is not recommended.
  • Storage Replication - replicates VM/CT disks to other node(s) so that all data is available without using shared storage. Replication uses snapshots to minimize traffic sent over the network. Each replication's bandwidth can be limited, to avoid overloading a storage or server.
    Only changes since the last replication (deltas) need to be transferred if the guest is migrated to a node to which it is already replicated. This reduces the time needed significantly. The replication direction automatically switches if you migrate a guest to the replication target node. The storage type must be ZFS.
  • GlusterFS - Gluster can run atop LVM or ZFS to enable features such as snapshots. This design is highly redundant, and our virtual machines stay highly available. Gluster sits on top of an existing file system and is file-oriented, so we can use it with ZFS. Gluster can be installed on VMs or bare metal.
    When we have a hodge-podge setup in our labs, Gluster is often the better fit.
  • Ceph - Ceph is an object-oriented file system, and it also acts as your LVM or Logical Volume Manager. This makes it largely incompatible with ZFS.
    Ceph lacks performance on smaller clusters, in terms of throughput and IOPS, when compared to GlusterFS (see the PDF in the link below).
    Ceph is usually used at very large AI clusters and even for LHC data collection. When we have a hodge-podge setup in our labs, Ceph clusters will be limited by the slowest and smallest ZPOOLs.

See the paper by Giacinto Donvito, Giovanni Marzulli and Domenico Diacono (PDF link): Testing of several distributed file-systems (HDFS, Ceph and GlusterFS) for supporting the HEP experiments analysis.

Table 2. Test results using dd, in MB/s (Ceph vs Gluster vs Hadoop)

MB/s     Ceph CF    Gluster 3.3    HDFS
read     126.91     427.30         220.05
write    64.71      268.57         275.27

Not all disks are equal either.

High availability without 10 G Networks

If you don't have 10 G or faster networking, you need to consider how to work around it.

You can work around this by using Replication on 1 G networks; the changes to a VM's disk are usually not that big. Replication keeps the VM's disk updated on all nodes assigned to the group for quick redeployment of the VM on another node.
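
A replication job can be created in the GUI or with pvesr; a sketch with made-up values (VM 100, target node pve-2, every 15 minutes, limited to 10 MB/s):

pvesr create-local-job 100-0 pve-2 --schedule "*/15" --rate 10

# list jobs and check their state
pvesr list
pvesr status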

Disk speed is important. Use 6 or 12 Gb/s SAS drives rather than 3 Gb/s SATA SSDs or HDDs. SAS drives are also faster because they are full-duplex devices.

Planning and documentation are key to success.

Setup your Groups, Users and Pools

You can create many users and groups to minimize the threat vector and/or make things easier to operate. On corporate clusters there are layers upon layers.

Each user can be a member of several groups. Groups are the preferred way to organize access permissions. You should always grant permissions to groups instead of individual users. That way you will get a much more maintainable access control list.

A resource pool is a set of virtual machines, containers, and storage devices. It is useful for permission handling in cases where certain users should have controlled access to a specific set of resources, as it allows for a single permission to be applied to a set of elements, rather than having to manage this on a per-resource basis.

Resource pools are often used in tandem with groups, so that the members of a group have permissions on a set of machines and storage.

Start by creating your groups and then create the users.
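
A hedged sketch with pveum; the group, user, pool and role names are just examples.

pveum group add lab-admins --comment "Homelab admins"
pveum user add anna@pve --groups lab-admins --comment "Anna"
# the pool subcommands need a recent PVE release
pveum pool add lab-pool --comment "Test VMs and storage"
# give the group the PVEVMAdmin role on everything in the pool
pveum acl modify /pool/lab-pool --groups lab-admins --roles PVEVMAdmin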
💡
2FA (Two-Factor Authentication) is highly recommended

Create basic Templates

Create a set of Templates to spin up VM's fast. I have some scripts for it.

See my blog Proxmox Automation about it and also the Update - Proxmox Automation.

wget https://raw.githubusercontent.com/nallej/MyJourney/main/myTemplateBuilder.sh
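
Not the contents of that script, just a rough outline of the usual manual steps for a cloud-init template; the VM ID, storage name and image file are assumptions.

# download a cloud image first, e.g. debian-12-generic-amd64.qcow2
qm create 9000 --name debian12-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 9000 debian-12-generic-amd64.qcow2 local-lvm
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
qm set 9000 --ide2 local-lvm:cloudinit --boot order=scsi0 --serial0 socket --vga serial0
qm template 9000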

Create a basic Docker VM

Make a basic Docker VM where you can test things before you set them live.

See my blogs Docker and Dockge, Docker and Proxmox and the old The Docker stack - the journey starts

wget https://raw.githubusercontent.com/nallej/MyJourney/main/myVMsetup.sh


Create a Backup Strategy and Tactics

This is perhaps the most important thing to do. Take time and plan well. Remember the 3-2-1 rule.

  • Keep at least three (3) copies of data.
  • Store two (2) backup copies on different storage media.
  • Store one (1) backup copy offsite.

Remember to use logs and automatic email services.
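
Scheduled backup jobs are easiest to set up in the GUI, but a one-off vzdump from the CLI is a good sanity check; the storage name and mail address are placeholders.

# snapshot-mode backup of VM 101 to the storage named 'backup', mail only on failure
vzdump 101 --storage backup --mode snapshot --compress zstd --mailto admin@example.com --mailnotification failure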

Physical threats

In the corporate world we start with total destruction of the city or building and go all the way down to individual server components. Earthquakes and wildfires, terrorist attacks and accidents do happen.

Consumer-grade hardware is short-lived and has a low MTBF. Enterprise-grade hardware has a much higher MTBF, hot-standby units and redundant subsystems.

What will break in a server? Answer: everything.

  • Servers have dual power supplies, hot-standby memory and hot-standby disks - does that give us a hint? Also, all fans are easy to replace without any tools.
  • RAID controllers used to be really bad, and their batteries or super caps still are - which may lead to loss of data. And there is a large number of factory-refurbished cards on the market; another hint?
  • RAID systems are more or less a thing of the past. They have many problems, like 'bit rot' and trusting the disks' own reporting of an OK state. ZFS is way better at storing data.
  • SSDs have a limited lifespan and can stop working suddenly.
  • SATA disks are found in large quantities as factory refurbished - guess why.
  • SAS disks are usually the best choice. They are long-lived, but not forever.

People

Create a plan for missing human resources: total absence, short-handed situations and holidays. Prepare for the worst; anyone can end up in hospital for a long time.

This is true for us homelabbers too.

Resources

Also make a plan for loss of utilities: total loss, out for a day, or whatever is feasible.

Some years ago we had power outages weekly when it snowed, today maybe one time per year or less. The lesson here is that things change - so should our strategy.

I need to start my servers with delays otherwise the fuse will blow due to inrush current.

  • Does your AC cope during an extended heatwave?
  • Is the area warm in the winter?
  • Does your backup server have many storage pools?

Locations

It's common practice to have backups at multiple locations. Large corporations may require them to be in different countries or even on different continents.

We could have one set in the lab, one at a friend's house and one on cloud storage.

Disaster recovery

Make a rock-solid plan for recovery; it's better to have a plan to follow than to try to do it blindfolded. A plan ensures you do the right things in the right order.

A good way to prepare for recovery is to write dedicated scripts that are stored on media in the lab, with the files backed up on GitHub, GitLab and/or some cloud storage. The same applies to all installation media.

Your main tool is bash, and you need to run commands in the CLI.
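
A recovery script boils down to a few CLI commands like these; the archive names, IDs and storage are placeholders.

# restore a VM from a vzdump archive
qmrestore /mnt/backups/vzdump-qemu-101.vma.zst 101 --storage local-lvm
# restore a container
pct restore 105 /mnt/backups/vzdump-lxc-105.tar.zst --storage local-lvm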

-

Documentation, documentation, documentation...

All IT relies heavily on documentation. Without documentation you are lost.

See my post about NetBox Link 👈

Create good documentation and print it out:
one copy by the servers, one on your desk.
💡
Email alerts are not a maybe - they are a MUST. 🫵

References

Bat: A cat clone with wings or a cat on steroids [1]
What is and how does NTP work. How to install your own. [2]
NTP: more info and its history. [3]
Chrony is a versatile implementation of the Network Time Protocol (NTP). [4]
Gluster is a scalable network filesystem for large, distributed storage solutions for data- and bandwidth-intensive tasks, using common hardware. [5]
List of TCP and UDP port numbers. [6]
Netbox home page [7]
Proxmox documents:
Networking, virtual switches and networks. [8]
Firewall wiki. [9]
Admin Guide. [10]
Opt-in Linux 6.1 Kernel for Proxmox VE 7.x available - for some functions you may need the new kernel. [11]
The 3-2-1 strategy [12]


  1. Bat utility info is found at github ↩︎

  2. NTP info is found on NTP.org's home page ↩︎

  3. Network Time Protocol, read this wiki ↩︎

  4. Chrony Introduction. See the home page ↩︎

  5. The Gluster project home page and the documentation is found at this page ↩︎

  6. About port numbers, see this wiki ↩︎

  7. Netbox documentation is found on the home page ↩︎

  8. How to setup networking in Proxmox wiki ↩︎

  9. Proxmox firewall wiki ↩︎

  10. The Proxmox admin guide ↩︎

  11. Opt-in Linux 6.1 Kernel. Read the Forums ↩︎

  12. Read this page ↩︎