SSH Host Key Certificates - How to bypass SSH known_hosts bug(s)
Proxmox Issues Tutorial by tempacc346235
As of PVE 8.1, there's still a bug where running pvecm updatecerts deletes all but the oldest (instead of the newest) SSH keys from the shared cluster-wide known_hosts file. This causes issues that manifest as WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! and Offending RSA key in /etc/ssh/ssh_known_hosts:$lineno, along with the suggestion to remove with: ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "$alias". Following that suggestion then breaks the symlink into pmxcfs and makes one dig even deeper into the troubleshooting rabbit hole.
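To see whether the symlink mentioned above is still intact, a quick check along these lines may help. This is only a sketch: check_known_hosts_link is a hypothetical helper name, and the default path is simply the one quoted in the error message.

```shell
# Hypothetical helper: report whether the given path is still a symlink.
# Defaults to the path quoted in the SSH error message.
check_known_hosts_link() {
    path="${1:-/etc/ssh/ssh_known_hosts}"
    if [ -L "$path" ]; then
        # Still a symlink: show where it points.
        echo "symlink -> $(readlink "$path")"
    elif [ -e "$path" ]; then
        # Exists, but has been replaced by a plain file.
        echo "regular file (symlink broken)"
    else
        echo "missing"
    fi
}
```

Running check_known_hosts_link without arguments in a node's root shell inspects the default path. If it reports a regular file, the symlink has been replaced.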
This is a simple, streamlined process to either prevent this issue or get out of it without potentially risky Perl file patching or deleting keys one might otherwise have wished to retain. It bypasses the known_hosts corruption issue by using SSH certificates for remote host authentication. It does NOT change the behavior of user authorization (related to the authorized_keys file).
This assumes the cluster is otherwise healthy, has quorum and has no connectivity issues except for disrupted SSH connections, e.g. proxying console/shell, secure local-storage migration and replication, but also Q-Device setup. See also [1].
There's an existing Certification Authority (CA) used in PVE (see also [2]), currently only for SSL connections. As SSH certificates are nothing more than CA-signed SSH keys with associated IDs (principals), it is easiest to reuse said CA (see note (i)).
In any single node's root shell, perform the following once (the location is shared by all nodes in the cluster):
Code:
# openssl x509 -in /etc/pve/pve-root-ca.pem -inform pem -pubkey -noout | ssh-keygen -f /dev/stdin -i -m PKCS8 > /etc/pve/pve-root-ca.pub
# echo "@cert-authority * `cat /etc/pve/pve-root-ca.pub`" >> /etc/ssh/ssh_known_hosts
This converts the CA's public key into the format SSH needs and makes every node of the cluster recognize any current or future host key signed by the CA as valid, even when other conflicting entries are present.
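Since the second command appends unconditionally, running it twice leaves a duplicate line. A minimal sketch of an idempotent variant follows; add_cert_authority is a hypothetical helper name, not part of PVE, and it takes the two file paths as arguments.

```shell
# Hypothetical helper: append the @cert-authority line only if it is missing.
# $1 = CA public key in SSH format, $2 = known_hosts file to extend.
add_cert_authority() {
    ca_pub="$1"
    known_hosts="$2"
    line="@cert-authority * $(cat "$ca_pub")"
    # -x matches whole lines, -F treats the line as a fixed string.
    if grep -qxF -- "$line" "$known_hosts" 2>/dev/null; then
        echo "already present"
    else
        printf '%s\n' "$line" >> "$known_hosts"
        echo "added"
    fi
}
```

With the paths from the commands above, the call would be add_cert_authority /etc/pve/pve-root-ca.pub /etc/ssh/ssh_known_hosts.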
On each individual node (you may want to automate this for a large cluster), the respective host key then needs to be signed and set for the node:
Code:
# ssh-keygen -I `hostname` -s /etc/pve/priv/pve-root-ca.key -h -n `(hostname -s; hostname -f; hostname -I | xargs -n1) | paste -sd,` /etc/ssh/ssh_host_ed25519_key.pub
# echo "HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub" >> /etc/ssh/sshd_config.d/PVEHostCertificate.conf
This makes use of Ed25519 host keys, although they are signed with the CA's RSA (albeit 4096-bit) key. If you have a specific reason, you may of course opt for any other SSH key in /etc/ssh/ to be used here, not necessarily Ed25519. See also note (ii).
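The principal list passed to -n above is just the short hostname, the FQDN and every IP address joined with commas. The same idea can be sketched as a small function that also drops duplicates; build_principals is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: turn its arguments into a comma-separated principal list.
# Each argument may itself contain several whitespace-separated names,
# as the output of `hostname -I` does.
build_principals() {
    printf '%s\n' "$@" |
        xargs -n1 |            # one name per line
        awk '!seen[$0]++' |    # drop duplicates, keep first-seen order
        paste -sd, -           # join the lines with commas
}
```

It could then replace the inline pipeline, e.g. -n "$(build_principals "$(hostname -s)" "$(hostname -f)" "$(hostname -I)")".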
Note: The sshd service needs to be restarted for the changes to take effect.
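Because a broken sshd configuration can lock you out of a node, it may be worth validating it before the restart. The sketch below assumes the systemd unit is named ssh, as is usual on Debian-based systems; restart_sshd_checked is a hypothetical helper name.

```shell
# Hypothetical helper: restart sshd only if its configuration validates.
# Assumption: the systemd unit is called "ssh" (adjust if yours differs).
restart_sshd_checked() {
    # `sshd -t` only tests the configuration and host keys, it starts nothing.
    if sshd -t; then
        systemctl restart ssh && echo "restarted"
    else
        echo "config invalid, not restarting" >&2
        return 1
    fi
}
```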
And that's it! From now on, your nodes will always be able to connect to each other over SSH. The only annoyance is that every future node needs the two-liner executed once. Again, this is best automated, as it does not interfere with the rest of PVE's internals. There are no caveats to this: if you do not sign a future node's key and PVE manages to find the individual recognized key on record, it will still work. And if you encounter the bug in pvecm updatecerts, it will not disrupt connections to those nodes that have signed host keys, as the buggy tool safely ignores @cert-authority entries in the known_hosts file.
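When automating this for future nodes, one possible first step is to detect whether a node has already been set up. A rough sketch, where host_cert_configured is a hypothetical helper and both paths are passed in explicitly:

```shell
# Hypothetical helper: succeed only if the certificate file exists, is
# non-empty, and is referenced by a HostCertificate line in the given config.
# $1 = certificate path, $2 = sshd config snippet to inspect.
host_cert_configured() {
    cert="$1"
    conf="$2"
    [ -s "$cert" ] && grep -qxF -- "HostCertificate $cert" "$conf" 2>/dev/null
}
```

With the paths used in this tutorial, the check would be host_cert_configured /etc/ssh/ssh_host_ed25519_key-cert.pub /etc/ssh/sshd_config.d/PVEHostCertificate.conf, and the two-liner would only need to run when it fails.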
One final note on how PVE makes use of the HostKeyAlias option for SSH connections. This option is always used for e.g. migrations/replications and makes SSH look up the specific ID in the known_hosts file irrespective of the hostname or IP address of the node being connected to. If the IDs (principals) listed in the signed key (see note (ii)) include this alias, it will keep working as expected, i.e. it will even work if this is your nth time introducing a cluster node by the same name (as some dead nodes used to have), as long as its host key is signed. The leftover keys on record are then safely ignored, as they should have been to begin with.
If you end up with multiple records under the same name, and that name is also an ID listed in the key signed by the CA, the signed key will take precedence, as can be checked:
# ssh -vvv -o HostKeyAlias=$alias $ipaddress
...
debug1: Found CA key in /etc/ssh/ssh_known_hosts:$lineno
debug3: check_host_key: certificate host key in use; disabling UpdateHostkeys
If, however, you failed to list the ID under which your node is recognised by PVE, the connection will fail (but only in cases where it would have failed anyway due to the bug):
# ssh -vvv -o HostKeyAlias=$alias $ipaddress
...
debug1: Host '$alias' is known and matches the ED25519-CERT host certificate.
debug1: Found CA key in /etc/ssh/ssh_known_hosts:$lineno
Certificate invalid: name is not a listed principal
debug1: No matching CA found. Retry with plain key
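To catch a missing principal before it shows up as the failure above, the output of ssh-keygen -L can be scanned for the alias. The sketch below assumes the usual layout of that output, with the principals listed one per line between the "Principals:" and "Critical Options:" headings; principal_listed is a hypothetical helper name.

```shell
# Hypothetical helper: read `ssh-keygen -L` output on stdin and succeed
# if the given name appears in the Principals section.
principal_listed() {
    awk -v want="$1" '
        /^[ \t]*Critical Options:/ { in_list = 0 }   # end of the section
        in_list {
            gsub(/^[ \t]+|[ \t]+$/, "")              # trim indentation
            if ($0 == want) found = 1
        }
        /^[ \t]*Principals:/ { in_list = 1 }         # start of the section
        END { exit (found ? 0 : 1) }
    '
}
```

For example: ssh-keygen -L -f /etc/ssh/ssh_host_ed25519_key-cert.pub | principal_listed "$alias" && echo listed.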
TESTED ON: pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-6-pve)
Related bug reports: #4252, #4886
References
[1] Proxmox Documentation: The Admin Guide
[2] Proxmox Documentation: The Role of Certs
Notes
(i) If you want to know how much validity is left for the CA, check with openssl x509 -in /etc/pve/pve-root-ca.pem -text -noout. It is 10 years as nominally generated by PVE, and rotation is therefore not in scope of this tutorial.
(ii) If you wish to double-check that all the correct IDs (principals) were included in the signed key, you can do so with ssh-keygen -L -f /etc/ssh/ssh_host_ed25519_key-cert.pub. The hostname, the FQDN and all IP addresses should be listed. You can, of course, change this list by editing the value of the -n option of ssh-keygen. Please also note there's absolutely no expiry defined for these signed keys, which mimics the default behavior of PVE regarding SSH key handling.
Link
Proxmox Forums → Proxmox Virtual Environment → Proxmox VE: Installation and configuration, posted by tempacc346235, Dec 8 2023