docs: add Hermes Agent LXC implementation plan + spec amendments

Plan: 10 tasks that split the work into workstation Terraform (token-safe container skeleton), PVE-console host ops (features nesting/keyctl + bind mounts via pct set, which the API token cannot do), and in-container Docker/hermes bootstrap.

Spec amended for the discovered API-token limitation: bind mounts AND container features require root@pam/SSH, so both are applied via console pct set rather than Terraform; terraform import tracked as follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@@ -48,7 +48,7 @@ and generated files on the host's large disks via direct bind mounts.
 | Decision | Choice | Rationale |
 |---|---|---|
 | Deployment form | **Docker LXC (unprivileged)** | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
-| Provisioning | **Terraform** (this repo) | Infra-as-code; mirrors `pbs.tf` pattern. In-container install is a scripted console step. |
+| Provisioning | **Terraform (container only) + console for bind mounts** | TF mirrors `pbs.tf` for the container. **Bind mounts cannot be created via API token** (Proxmox restricts them to `root@pam`/SSH), so `mp0/mp1` are added via console `pct set`, the same method already used for jellyfin(115)/tos-api(700). `terraform import` of the mounts is a follow-up. |
 | Primary interface | **Messaging connectors** | Outbound-only → **zero inbound ports exposed.** |
 | Subagent sandbox | **local** | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation needed. |
 | Large workspace | **Direct host bind mount (both disks)** | Aligns with the user's **Plan A** (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory. |
@@ -79,7 +79,7 @@ and generated files on the host's large disks via direct bind mounts.
 | VMID | `118` (adjacent to litellm `117`, AI group) |
 | Node | `gihyeon` |
 | Type | unprivileged LXC, Debian 12 |
-| Features | `nesting = 1`, `keyctl = 1` (required for Docker) |
+| Features | `nesting = 1`, `keyctl = 1` (required for Docker); **set via console `pct set`**, not TF (API token can't set host-security features) |
 | CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
 | rootfs | 24 GB on `local-lvm` |
 | Network | `eth0` on bridge `intra01`, IPv4 DHCP |
@@ -92,10 +92,13 @@ and generated files on the host's large disks via direct bind mounts.
 | `mp0` | `/mnt/pve/hdd/hermes` | `/data` | 14TB bulk: code, artifacts, downloads |
 | `mp1` | `/media/2tb/hermes` | `/fast` | SSD: fast workspace / builds |
 
-bpg `mount_point` blocks use an absolute host path as `volume` to create a bind
-mount. Both container paths are passed into the Hermes Docker container as
-volumes so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
+**Bind mounts are NOT in Terraform.** The Proxmox API token cannot create bind
+mounts (root@pam/SSH only), so `mp0/mp1` are added in the console with
+`pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast`.
+Both container paths are then passed into the Hermes Docker container as volumes
+so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
 small/fast config + memory + sqlite) stays on rootfs (SSD), **not** on the bulk disk.
+A `terraform import` of these mount points is tracked as a follow-up (same as 115/700).
 
 ### Unprivileged UID mapping (critical)
 Unlike jellyfin(115)/tos-api(700), which are *privileged* (root→root, no perms
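One way to confirm that the console `pct set` step took effect is to check the container config for both mount points. The sketch below is illustrative only: `has_mount` is a hypothetical helper, not part of this repo, and it assumes `pct config 118` prints each mount as a line of the form `mp0: <host path>,mp=<container path>`, optionally followed by more comma-separated options. It also treats the paths as regular expressions, which is harmless for the paths used in this spec.

```sh
#!/bin/sh
# Hypothetical helper (not in this repo): reads `pct config <vmid>` output on
# stdin and succeeds only if mount point KEY binds HOST to TARGET.
# Assumed line format: "mp0: /host/path,mp=/container/path[,more options]".
has_mount() {
  key=$1
  host=$2
  target=$3
  grep -Eq "^${key}: ${host},mp=${target}(,|\$)"
}

# Demo on sample text; on the PVE host, pipe `pct config 118` in instead.
sample='mp0: /mnt/pve/hdd/hermes,mp=/data
mp1: /media/2tb/hermes,mp=/fast'

printf '%s\n' "$sample" | has_mount mp0 /mnt/pve/hdd/hermes /data && echo "mp0 ok"
printf '%s\n' "$sample" | has_mount mp1 /media/2tb/hermes /fast && echo "mp1 ok"
```

On the node console the same check would read `pct config 118 | has_mount mp0 /mnt/pve/hdd/hermes /data`.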
@@ -124,23 +127,32 @@ subtree is remapped** (isolation preserved), not the whole disk.
 - Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
 
 ## 8. Provisioning sequence (order matters)
-1. **Host prep** (node1 web console, once): bind-mount targets must exist before `apply`.
+1. **Host prep** (node1 web console, once): create + chown bind-mount targets.
 ```sh
 mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
 chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
 ```
-2. **Terraform apply** (from workstation): creates LXC #118 + bind mounts.
-3. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh`:
-install Docker + compose plugin → write `docker-compose.yml` + `config.yaml`
-pointing at litellm → fill `.env` (litellm key, bot tokens) → `hermes setup`
-→ `gateway run`.
+2. **Terraform apply** (from workstation): creates LXC #118 token-safe skeleton
+(rootfs, network, cpu/mem, unprivileged, onboot). **No features, no bind mounts**
+(API-token can't set host-security settings).
+3. **Apply features + bind mounts** (node1 console, once): use `pct set`:
+```sh
+pct set 118 -features nesting=1,keyctl=1 \
+-mp0 /mnt/pve/hdd/hermes,mp=/data \
+-mp1 /media/2tb/hermes,mp=/fast
+pct reboot 118
+```
+4. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh`:
+install Docker (rootful) + compose plugin → write `docker-compose.yml` +
+`config.yaml` pointing at litellm → fill `.env` (litellm key, bot tokens) →
+`hermes setup` → `gateway run`.
 
 > In-container / host shell work is performed by the user via the **PVE web
 > console** (per `proxmox-access` memory; host SSH intentionally unused).
 
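Because the features and bind mounts are typed by hand in the web console, it can help to generate the exact command line from one set of values and paste it. This is a minimal dry-run sketch: `build_pct_set_cmd` is a hypothetical helper that only prints the command and never calls `pct`. The VMID, feature flags, and paths are the ones given in this spec.

```sh
#!/bin/sh
# Hypothetical dry-run helper (not in this repo): prints the `pct set` line
# for a container instead of executing it.
# Args: VMID, host path for mp0, container path for mp0,
#       host path for mp1, container path for mp1.
build_pct_set_cmd() {
  printf 'pct set %s -features nesting=1,keyctl=1 -mp0 %s,mp=%s -mp1 %s,mp=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Values taken from the spec tables for LXC #118.
build_pct_set_cmd 118 /mnt/pve/hdd/hermes /data /media/2tb/hermes /fast
```

Printing rather than executing keeps the helper usable from the workstation, where `pct` is not available, and leaves the final run to the operator on the node console.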
 ## 9. Repo changes
-- **New:** `hermes.tf` (download template + container resource + bind mounts),
-`hermes-variables.tf`, `scripts/hermes-bootstrap.sh`.
+- **New:** `hermes.tf` (container resource; **no bind mounts**),
+`hermes-variables.tf`, `scripts/hermes-bootstrap.sh` (host prep + `pct set` mounts + Docker/hermes install).
 - **Modified:** `terraform.tfvars` + `terraform.tfvars.example` (hermes vars),
 `outputs.tf` (VMID / IP), `README.md` (install steps), `.gitignore` (ensure `.env` / secrets excluded).
 
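The ignore-file change listed above can be made idempotent so that re-running setup never duplicates entries. A minimal sketch, assuming the secrets live in a file named `.env` at the repo root; `ensure_ignored` is a hypothetical helper, not something this commit adds.

```sh
#!/bin/sh
# Hypothetical helper: append PATTERN to FILE only if that exact line is absent.
ensure_ignored() {
  file=$1
  pattern=$2
  touch "$file"   # make sure grep has a file to read on first run
  grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Run from the repo root.
ensure_ignored .gitignore '.env'
```

The `-x` and `-F` flags make `grep` match the whole line literally, so an existing entry such as `.env.example` does not count as a match for `.env`.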
@@ -152,7 +164,8 @@ subtree is remapped** (isolation preserved), not the whole disk.
 - Docker sandbox backend (DinD) for stronger subagent isolation: deferred; start `local`.
 - Static IP instead of DHCP: deferred (DHCP matches litellm).
 - Dashboard/gateway-API exposure with auth: only if a non-messaging use appears.
-- `terraform import` of existing 115/700 mount points: tracked separately in `nfs-lxc-sharing-redesign`.
+- `terraform import` of the hermes `mp0/mp1` bind mounts into TF state: follow-up (same pattern as 115/700 in `nfs-lxc-sharing-redesign`).
+- Use **rootful** Docker in the LXC (not rootless): Hermes' gateway↔dashboard talk over localhost in one container, so a single netns is required. The ZFS overlay2→vfs caveat from public writeups does not apply here (storage is LVM-thin/ext4/dir, not ZFS).
 
 ## 12. Rollback
 - `terraform destroy -target` the hermes container, or `pct destroy 118`.