Deploying the manager before the fleet: Proxmox Datacenter Manager, ahead of the hardware

June 10, 2026

Tags: proxmox, homelab, infrastructure, iac, pdm

The conventional advice on Proxmox Datacenter Manager (PDM) is: wait until you have enough clusters to justify a single-pane manager. I deployed it the other way around — before the hardware that justifies it — and the reasoning turned out to be the interesting part.

Why before, not after

Two things flipped the usual order:

  1. There are already real targets. This lab runs the production 2-node cluster plus two nested test clusters (a 3-node and a 5-node) for upgrade rehearsals. That’s ten nodes across three clusters (plus a backup server) to juggle in browser tabs today.
  2. PDM can drive the next host’s install. PDM 1.1 added automated installs: you build an answer file in PDM, it serves it over HTTPS to a booting PVE ISO, and the new node installs unattended. The next box (a 24-bay Ceph host) is coming — and I’d rather the thing that onboards it already exist and be tested than stand it up in a hurry on arrival day.

So PDM isn’t future-useful here. It’s useful now, and it pre-positions the tooling for the hardware that’s still in a box.

The build

Standard IaC: a Terraform module clones the existing Debian 13 “Trixie” cloud-init template into a small VM (2 vCPU / 4 GB / 40 GB), then Ansible installs PDM from the pdm-no-subscription apt repo. PDM 1.1 is a Trixie appliance under the hood, so the base template is the exact OS it wants — no special image needed. Web UI on :8443, internal only, no public exposure.

Clean in principle. The gotchas were where the time went.

Four gotchas worth knowing

1. grub-pc will fail its post-install on a cloud image. The PDM meta-package pulls in a Proxmox kernel, which pulls grub-pc, whose post-install script aborts because the cloud image never set a GRUB install device. The fix is to point GRUB at the real boot disk (grub-pc/install_devices) and re-run dpkg --configure -a, not the install_devices_empty=true shortcut you’ll see suggested, because here a new kernel genuinely needs GRUB managed so it boots.
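As a rough sketch of that fix, the snippet below builds the debconf selection for grub-pc/install_devices and then re-runs dpkg --configure -a. The device path /dev/sda and both helper names are assumptions for illustration; confirm the real boot disk first (for example with lsblk), and note that the second helper needs root.

```python
import subprocess


def grub_debconf_line(device: str) -> str:
    """Return a debconf selection that points grub-pc at `device`."""
    # debconf selection format: <owner> <question> <type> <value>
    return f"grub-pc grub-pc/install_devices multiselect {device}"


def fix_grub_postinst(device: str) -> None:
    """Preseed the GRUB install device, then finish the interrupted configure step."""
    # debconf-set-selections reads selections from stdin.
    subprocess.run(
        ["debconf-set-selections"],
        input=grub_debconf_line(device) + "\n",
        text=True,
        check=True,
    )
    # Re-run the post-install scripts that aborted earlier.
    subprocess.run(["dpkg", "--configure", "-a"], check=True)


if __name__ == "__main__":
    # Assumption: /dev/sda is the VM's boot disk; adjust to match the template.
    # Only the selection line is printed here; call fix_grub_postinst() to apply it.
    print(grub_debconf_line("/dev/sda"))
```

In the actual build this logic would live in the Ansible play; the Python form is only meant to make the two steps explicit.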

2. The enterprise repo comes back. Installing PDM re-creates a pdm-enterprise sources file that 401s without a subscription and breaks apt update. You have to remove it as the first step of every run, not just after install.
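A minimal sketch of the "remove it first, every run" step, written so it is safe to repeat. The directory /etc/apt/sources.list.d and the pdm-enterprise* filename pattern are assumptions based on the description above, and the function name is hypothetical; the real play is Ansible, so treat this only as an illustration of the idempotent logic.

```python
from pathlib import Path


def remove_enterprise_sources(sources_dir: str, prefix: str = "pdm-enterprise") -> list[str]:
    """Delete apt source files whose names start with `prefix`.

    Returns the names that were removed; an empty list means nothing to do,
    so calling this at the start of every run is harmless.
    """
    removed = []
    for path in sorted(Path(sources_dir).glob(f"{prefix}*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Assumption: apt source files live in the standard Debian location.
    print(remove_enterprise_sources("/etc/apt/sources.list.d"))
```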

3. The UI is WebAssembly now. PDM 1.1’s interface is compiled Yew/WASM, not the old ExtJS JavaScript. That’s mostly an upgrade — but it means the classic “patch the JS to remove the no-subscription nag” trick is dead. The string lives in a .wasm binary. There’s no clean way to hide that panel; it’s cosmetic, and that’s that.

4. The CLI and the API don’t authenticate the same way. The pdm-client CLI fought me on password auth, so I drove the REST API directly — and it has its own conventions. The login ticket is an httpOnly cookie (__Host-PDMAuthCookie), so a cookie jar is mandatory; the answer-file/remote endpoints want node lists as property-strings (hostname=IP,fingerprint=FP), not JSON objects; and API tokens use a colon before the secret (PDMAPIToken=user@realm!token:secret) — PBS-style, not PVE’s =. Small things, but each one is a 401 until you get it right.
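The three conventions can be sketched as small helpers. This is an illustration under stated assumptions, not PDM's documented client: the helper names are hypothetical, the example values are placeholders, and sending the token in an Authorization header is assumed by analogy with PVE and PBS. No endpoint paths are shown because the post does not give them.

```python
import urllib.request
from http.cookiejar import CookieJar


def node_property_string(address: str, fingerprint: str) -> str:
    """Encode one node as a property-string rather than a JSON object."""
    return f"hostname={address},fingerprint={fingerprint}"


def token_auth_header(user: str, realm: str, token_name: str, secret: str) -> dict[str, str]:
    """Build the token header value; note the colon before the secret."""
    # Assumption: the value travels in the Authorization header, as with PVE/PBS.
    return {"Authorization": f"PDMAPIToken={user}@{realm}!{token_name}:{secret}"}


def cookie_session() -> tuple[urllib.request.OpenerDirector, CookieJar]:
    """Return an opener that stores and replays cookies for ticket-based logins."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


if __name__ == "__main__":
    # Placeholder values from documentation-reserved ranges, not real lab data.
    print(node_property_string("192.0.2.10", "AA:BB:CC"))
    print(token_auth_header("agent", "pdm", "readonly", "example-secret"))
```

Token auth and ticket auth are alternatives: a script holding a token only needs the header, while a password login needs the cookie-aware opener so the httpOnly ticket cookie is sent back on later requests.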

Read-only is a feature, not a downgrade

Adding a remote to PDM means giving it credentials to that cluster. The instinct is to hand it root. I split it deliberately:

  • PDM itself gets a full Administrator token per cluster — the whole point of a manager is to start, stop, and migrate.
  • A read-only Auditor token went to something else entirely: the lab’s local AI agent.

That second one is the part I keep coming back to. The agent (a self-hosted model that answers questions about the lab) now holds a read-only PDM token as an environment variable. Ask it “what’s running on node 2 right now?” and it queries PDM live, across every cluster and through one API, and it cannot change anything because the token carries no write privileges. Write calls come back denied. It’s the cleanest possible “give the robot eyes, not hands” boundary, and PDM’s role model made it a two-token decision rather than a custom integration.

One wrinkle worth flagging: PDM tokens are privilege-separated, so the effective permission is the intersection of the user’s ACL and the token’s ACL. Grant the role to the user only, and the token sees nothing despite authenticating fine. Grant it to both.
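The intersection rule is easy to model with sets. The privilege names below are made up for illustration and are not PDM's actual privilege identifiers; the point is only that an empty set on either side yields an empty result.

```python
def effective_privileges(user_privs: set[str], token_privs: set[str]) -> set[str]:
    """A privilege-separated token can use only what BOTH ACLs grant."""
    return user_privs & token_privs


if __name__ == "__main__":
    # Hypothetical privilege names standing in for an Auditor-style role.
    auditor = {"resource.audit", "system.audit"}

    # Role granted to the user only: the token authenticates but sees nothing.
    print(effective_privileges(auditor, set()))

    # Role granted to both the user and the token: the token can read.
    print(sorted(effective_privileges(auditor, auditor)))
```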

The honest limitation

PDM’s top-line storage gauge is useless in this topology, and it’s worth saying why. A single large NAS is mounted across every cluster — as an NFS share on the test clusters, as a PBS datastore on production. PDM sums every storage in each cluster’s rollup with no de-duplication of shared backing and no filter by content type, so all three clusters report the same ~90 TB and the number means nothing.

It’s tempting to call that a bug. It isn’t — each cluster genuinely mounts that array, so each one’s own summary would show the same thing. PDM is faithfully reporting reality; the reality is just dominated by one shared backup target. The per-storage views are fine; the aggregate isn’t, and no setting changes that. A manager can only be as legible as the estate underneath it.
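To make the arithmetic concrete, here is a toy model of the rollup. Only the ~90 TB shared figure comes from the setup described above; the cluster names, the local-storage sizes, and the "backing" labels are invented for illustration, and the de-duplicated total is something you would compute yourself from hand-assigned labels, since the post notes no PDM setting provides it.

```python
# Each cluster maps to (backing label, size in TB) pairs.
# Assumption: labels are assigned by hand to mark storages sharing one array.
CLUSTERS = {
    "prod": [("nas", 90), ("local-prod", 2)],
    "test3": [("nas", 90), ("local-test3", 1)],
    "test5": [("nas", 90), ("local-test5", 1)],
}


def cluster_rollup_tb(storages: list[tuple[str, int]]) -> int:
    """Per-cluster total: every storage counted, shared or not."""
    return sum(size for _, size in storages)


def naive_total_tb(clusters: dict[str, list[tuple[str, int]]]) -> int:
    """Sum of the per-cluster rollups; the shared array is counted once per cluster."""
    return sum(cluster_rollup_tb(s) for s in clusters.values())


def deduplicated_total_tb(clusters: dict[str, list[tuple[str, int]]]) -> int:
    """Count each distinct backing label once across all clusters."""
    seen: dict[str, int] = {}
    for storages in clusters.values():
        for backing, size in storages:
            seen[backing] = size
    return sum(seen.values())


if __name__ == "__main__":
    for name, storages in CLUSTERS.items():
        print(name, cluster_rollup_tb(storages))
    print("naive:", naive_total_tb(CLUSTERS))
    print("deduplicated:", deduplicated_total_tb(CLUSTERS))
```

With these made-up numbers every cluster's rollup sits near 90 TB, the naive total is 274 TB, and the de-duplicated total is 94 TB, which is why the per-storage views are the ones worth reading.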

Where this leaves the next host

The manager is up, all four remotes (three clusters plus the backup server) are in it, and the automated-install path is the next thing to exercise — a throwaway nested node that boots a PVE ISO and lets PDM serve the answer file, so the real 24-bay host is a known quantity when it lands. Deploy the manager before the fleet, and onboarding the fleet stops being a surprise.
