# Hermes Agent LXC — Design Spec

- **Date:** 2026-06-18
- **Author:** gihyeon (with Claude Code)
- **Status:** Approved design → ready for implementation plan
- **Repo:** `proxmox-iac` (Terraform / bpg/proxmox provider)

## 1. Goal

Deploy [Hermes Agent](https://hermes-agent.nousresearch.com/) (Nous Research,
open-source MIT agent platform) as a new container on **node1 (`gihyeon`)**, using
the existing **litellm** LXC as its LLM gateway. Primary use is **messaging
connectors** (Telegram / Discord / Slack). The agent must be able to store code
and generated files on the host's large disks via direct bind mounts.

## 2. Context (verified 2026-06-18 via Proxmox API)

### litellm LXC (existing)
| Item | Value |
|---|---|
| VMID / host | `117` / `gihyeon` (node1) |
| Spec | 2 cores / 2GB RAM / 4GB disk (`hdd`) |
| Network | SDN vnet `intra01`, IP `10.1.10.22/24` (DHCP) |
| Endpoint | LiteLLM proxy, default port `4000` → `http://10.1.10.22:4000` |
| Type | unprivileged LXC, Debian, community-script install, `nesting=1` |

### node1 (`gihyeon`) headroom
- CPU 12 threads / RAM 64GB (~32GB free)
- Storage: `local-lvm` 93GB free (SSD/LVM-thin), `hdd` 10TB free, `media` 1.3TB free
- intra01 has internet egress (litellm was installed from the internet and shows outbound traffic)

### Storage host paths
| Proxmox storage | Host path | Disk | Free |
|---|---|---|---|
| `media` | `/media/2tb` | nvme (SSD) | 1.3TB |
| `hdd` | `/mnt/pve/hdd` | bulk | 10TB |

### Hermes Agent facts (from official docs)
- Two install paths: **Docker image** `nousresearch/hermes-agent` (compose provided) or native `install.sh` (uv/python3.11/node/ripgrep/ffmpeg).
- LLM connection: supports **OpenAI-compatible `base_url`** → `provider: custom`, `base_url: <litellm>`. Config in `~/.hermes/config.yaml`, secrets in `~/.hermes/.env`.
- Ports: `8642` (gateway API, OpenAI-compatible), `9119` (web dashboard). **Neither required for messaging-only use.**
- Resources: min 1C/1GB, **recommended 2C/2–4GB / 2GB+ disk**. Browser tools want `--shm-size=1g`.
- **Not privileged by default.** Subagent sandbox backends: local / Docker / SSH / Singularity / Modal. Docker sandbox needs `/var/run/docker.sock` (DinD), which is **not used here**; we start with `sandbox=local`.
- Single data mount inside the image: `/opt/data` (maps to host `~/.hermes`): config, sessions, memories, skills, logs, credentials.

## 3. Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Deployment form | **Docker LXC (unprivileged)** | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
| Provisioning | **Terraform (container only) + console for features and bind mounts** | TF mirrors `pbs.tf` for the container. **Bind mounts and container features cannot be set via API token** (Proxmox restricts them to `root@pam`/SSH), so `mp0/mp1` and `nesting`/`keyctl` are applied via console `pct set`, the same method already used for the bind mounts on jellyfin(115)/tos-api(700). `terraform import` of the mounts is a follow-up. |
| Primary interface | **Messaging connectors** | Outbound-only → **zero inbound ports exposed.** |
| Subagent sandbox | **local** | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation is needed. |
| Large workspace | **Direct host bind mount (both disks)** | Aligns with the user's **Plan A** (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory. |

## 4. Architecture

```
[Messaging platforms]           node1 (gihyeon) / intra01 (10.1.10.0/24)
 Telegram/Discord ──outbound──▶ ┌────────────────────────────────┐
 /Slack ...                     │ hermes LXC #118 (unpriv+Docker)│
                                │  └ nousresearch/hermes-agent   │
                                │     (compose, sandbox=local)   │
                                │  /data ◀─ bind /mnt/pve/hdd/hermes
                                │  /fast ◀─ bind /media/2tb/hermes
                                └──────────┬─────────────────────┘
                                           │ LLM (OpenAI-compatible)
                                           ▼
                                litellm LXC #117 (10.1.10.22:4000)
                                           │ routes to upstream providers
                                           ▼
                                Anthropic / OpenAI / local / ...
```

## 5. Container spec (Terraform, bpg provider)

| Field | Value |
|---|---|
| VMID | `118` (adjacent to litellm `117`, AI group) |
| Node | `gihyeon` |
| Type | unprivileged LXC, Debian 12 |
| Features | `nesting = 1`, `keyctl = 1` (required for Docker); **set via console `pct set`**, not TF (API token can't set host-security features) |
| CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
| rootfs | 24 GB on `local-lvm` |
| Network | `eth0` on bridge `intra01`, IPv4 DHCP |
| Options | `start_on_boot = true`, tags `ai;agent;terraform` |
| Hostname | `hermes` |

### Bind mounts (large workspace)
| mount | Host path | Container path | Purpose |
|---|---|---|---|
| `mp0` | `/mnt/pve/hdd/hermes` | `/data` | 14TB bulk: code, artifacts, downloads |
| `mp1` | `/media/2tb/hermes` | `/fast` | SSD: fast workspace / builds |

**Bind mounts are NOT in Terraform.** The Proxmox API token cannot create bind
mounts (root@pam/SSH only), so `mp0/mp1` are added in the console with
`pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast`.
Both container paths are then passed into the Hermes Docker container as volumes
so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
small/fast config + memory + sqlite) stays on rootfs (SSD), **not** on the bulk disk.
A `terraform import` of these mount points is tracked as a follow-up (same as 115/700).

### Unprivileged UID mapping (critical)
Unlike jellyfin(115)/tos-api(700), which are *privileged* (root→root, no
permission issues), hermes is **unprivileged**, so its root maps to host UID
`100000`. The bind-mount host directories must be owned by the mapped root. A
dedicated subdirectory per disk (`…/hermes`) is chowned to `100000:100000`, so
**only that subtree is remapped** (isolation preserved), not the whole disk.
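
The mapping arithmetic can be sketched in shell. This is only an illustration and assumes the default Proxmox unprivileged idmap (container IDs 0 to 65535 shifted by 100000); the function name `host_id` is hypothetical, and a custom `lxc.idmap` entry would change the offset.

```sh
#!/bin/sh
# Sketch: translate a container UID/GID to the host ID it appears as,
# assuming the default unprivileged idmap (offset 100000, range 65536).
IDMAP_OFFSET=100000
IDMAP_RANGE=65536

host_id() {
    cid=$1
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$IDMAP_RANGE" ]; then
        echo "container id $cid is outside the mapped range" >&2
        return 1
    fi
    echo $((IDMAP_OFFSET + cid))
}

host_id 0      # container root -> 100000, the owner the host directories need
host_id 1000   # a regular container user -> 101000
```

Files written by container root therefore show up on the host as owned by `100000`, which is why only the dedicated `hermes` subdirectories are chowned.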

## 6. Networking & security
- On `intra01` (same subnet as litellm) → reaches `10.1.10.22:4000` directly.
- Messaging connectors poll outbound → **no inbound port forwarding / no firewall opening.**
- Dashboard (`9119`) and gateway API (`8642`) **not exposed**. If first-time setup needs the dashboard, use it transiently via console / temporary port-forward, or `HERMES_DASHBOARD_INSECURE=1` on the trusted net.
- Secrets (litellm key, bot tokens) live only in the container's `~/.hermes/.env`; **never committed**.

## 7. Software stack & LLM connection
- Docker + docker-compose-plugin installed in the LXC.
- `nousresearch/hermes-agent` run via compose (`gateway run`), `restart: unless-stopped`.
- `~/.hermes/config.yaml`:
  ```yaml
  model:
    default: <model name exposed by litellm>
    provider: custom
    base_url: http://10.1.10.22:4000/v1
  ```
- `~/.hermes/.env`: litellm API key (`OPENAI_API_KEY`), messaging bot tokens.
- Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
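
As a rough sketch of how the bootstrap script might write the compose file, the helper below emits a minimal `docker-compose.yml` with the image, restart policy, and volumes described in this spec. The helper name `write_compose`, the service name `hermes`, the example target directory, and the assumption that the image accepts `gateway run` as its command are illustrative; the compose file shipped with Hermes should take precedence where it differs.

```sh
#!/bin/sh
# Sketch: write a minimal docker-compose.yml for the Hermes gateway.
# Assumptions (not confirmed against upstream): service name "hermes" and
# "gateway run" passed as the container command. shm_size is optional and
# only matters if browser tools are used.
write_compose() {
    target_dir=$1
    mkdir -p "$target_dir"
    cat > "$target_dir/docker-compose.yml" <<'EOF'
services:
  hermes:
    image: nousresearch/hermes-agent
    command: gateway run
    restart: unless-stopped
    shm_size: 1g
    volumes:
      - ~/.hermes:/opt/data
      - /data:/data
      - /fast:/fast
EOF
}

# Example (hypothetical directory): write_compose /opt/hermes
```

Mounting `/data` and `/fast` at the same paths inside the Docker container keeps agent-visible paths identical to the LXC paths, which makes host-side debugging simpler.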

## 8. Provisioning sequence (order matters)
1. **Host prep** (node1 web console, once): create + chown bind-mount targets.
   ```sh
   mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
   chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
   ```
2. **Terraform apply** (from workstation): creates the LXC #118 token-safe skeleton
   (rootfs, network, cpu/mem, unprivileged, onboot). **No features, no bind mounts**
   (the API token can't set host-security settings).
3. **Apply features + bind mounts** (node1 console, once): use `pct set`:
   ```sh
   pct set 118 -features nesting=1,keyctl=1 \
     -mp0 /mnt/pve/hdd/hermes,mp=/data \
     -mp1 /media/2tb/hermes,mp=/fast
   pct reboot 118
   ```
4. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh`:
   install Docker (rootful) + compose plugin → write `docker-compose.yml` +
   `config.yaml` pointing at litellm → fill `.env` (litellm key, bot tokens) →
   `hermes setup` → `gateway run`.

> In-container / host shell work is performed by the user via the **PVE web
> console** (per `proxmox-access` memory; host SSH is intentionally unused).
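
Because step 3 depends on step 1 having been done correctly, a small pre-flight check on the host can catch a missing or wrongly owned directory before `pct set` runs. The sketch below is one way to do that; the helper name `check_owner` is hypothetical, and it relies on GNU `stat -c`, which is what Debian-based hosts ship.

```sh
#!/bin/sh
# Sketch: confirm a bind-mount target exists and is owned by the expected
# UID:GID. Prints one status line and returns non-zero on any problem.
check_owner() {
    dir=$1
    want=$2   # e.g. 100000:100000
    if [ ! -d "$dir" ]; then
        echo "MISSING  $dir"
        return 1
    fi
    have=$(stat -c '%u:%g' "$dir")
    if [ "$have" != "$want" ]; then
        echo "WRONG    $dir is $have, expected $want"
        return 1
    fi
    echo "OK       $dir ($have)"
}

# On node1, before step 3:
#   check_owner /mnt/pve/hdd/hermes 100000:100000
#   check_owner /media/2tb/hermes   100000:100000
```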

## 9. Repo changes
- **New:** `hermes.tf` (container resource, **no bind mounts**),
  `hermes-variables.tf`, `scripts/hermes-bootstrap.sh` (host prep + `pct set` mounts + Docker/hermes install).
- **Modified:** `terraform.tfvars` + `terraform.tfvars.example` (hermes vars),
  `outputs.tf` (VMID / IP), `README.md` (install steps), `.gitignore` (ensure `.env` / secrets excluded).

## 10. Values to fill at setup time
- litellm master/virtual key and the exact **model name** litellm exposes.
- Messaging bot tokens (Telegram / Discord / Slack as chosen).
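
One way to find the exact model name is to ask litellm itself through its OpenAI-compatible model listing. The sketch below extracts the `id` fields from such a response using only `grep` and `sed`; the helper name `model_ids` and the `LITELLM_KEY` variable are hypothetical, and the parsing is deliberately simple (it assumes ids contain no escaped quotes).

```sh
#!/bin/sh
# Sketch: pull model ids out of an OpenAI-style /v1/models JSON response
# read from stdin. No jq required; suitable for simple ids only.
model_ids() {
    grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# From the hermes LXC (LITELLM_KEY is a hypothetical variable holding the key):
#   curl -s -H "Authorization: Bearer $LITELLM_KEY" \
#       http://10.1.10.22:4000/v1/models | model_ids
```

Whichever id is chosen goes into `model.default` in `~/.hermes/config.yaml`.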

## 11. Out of scope / future
- Docker sandbox backend (DinD) for stronger subagent isolation: deferred; start `local`.
- Static IP instead of DHCP: deferred (DHCP matches litellm).
- Dashboard/gateway-API exposure with auth: only if a non-messaging use appears.
- `terraform import` of the hermes `mp0/mp1` bind mounts into TF state: follow-up (same pattern as 115/700 in `nfs-lxc-sharing-redesign`).
- Use **rootful** Docker in the LXC (not rootless): Hermes' gateway and dashboard talk over localhost in one container, so a single netns is required. The ZFS overlay2→vfs caveat from public writeups does not apply here (storage is LVM-thin/ext4/dir, not ZFS).

## 12. Rollback
- `terraform destroy -target` the hermes container, or `pct destroy 118`.
- Bind-mount host dirs (`/mnt/pve/hdd/hermes`, `/media/2tb/hermes`) remain unless manually removed.

## 13. Verification (post-deploy)
- LXC 118 running; `pct config 118` shows `mp0`/`mp1` and `features` with both `nesting=1` and `keyctl=1`.
- Inside container: `/data` and `/fast` writable by container root; `docker ps` shows hermes healthy.
- Hermes can call litellm: a test prompt routes through `10.1.10.22:4000` and returns.
- A messaging connector responds end-to-end; agent-written file appears under `/mnt/pve/hdd/hermes` on the host.
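
The first check can be scripted. The sketch below reads `pct config 118` output on stdin and confirms the features and mount points from this spec are present; the helper name `check_pct_config` is hypothetical, and the patterns assume the usual `key: value` layout of `pct config` output.

```sh
#!/bin/sh
# Sketch: verify container config text (stdin) contains the expected
# features and bind mounts. Prints each failed pattern; returns non-zero
# if any check fails.
check_pct_config() {
    cfg=$(cat)
    rc=0
    for pattern in \
        '^features:.*nesting=1' \
        '^features:.*keyctl=1' \
        '^mp0: /mnt/pve/hdd/hermes,.*mp=/data' \
        '^mp1: /media/2tb/hermes,.*mp=/fast'
    do
        if ! printf '%s\n' "$cfg" | grep -q "$pattern"; then
            echo "FAIL: no line matching $pattern"
            rc=1
        fi
    done
    return $rc
}

# On node1:  pct config 118 | check_pct_config && echo "config OK"
```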