Files
proxmox-iac/docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md
21in7 8938c486dc docs: add Hermes Agent LXC design spec
Design for deploying Nous Research Hermes Agent as an unprivileged Docker
LXC (#118) on node1, using litellm (10.1.10.22:4000) as the OpenAI-compatible
LLM gateway. Messaging-connector use (outbound-only, no inbound ports).
Large workspace via direct host bind mounts (hdd /data + 2tb /fast),
aligned with the Plan A same-host bind-mount decision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 23:34:53 +09:00

166 lines
9.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Hermes Agent LXC — Design Spec
- **Date:** 2026-06-18
- **Author:** gihyeon (with Claude Code)
- **Status:** Approved design → ready for implementation plan
- **Repo:** `proxmox-iac` (Terraform / bpg/proxmox provider)
## 1. Goal
Deploy [Hermes Agent](https://hermes-agent.nousresearch.com/) (Nous Research,
open-source MIT agent platform) as a new container on **node1 (`gihyeon`)**, using
the existing **litellm** LXC as its LLM gateway. Primary use is **messaging
connectors** (Telegram / Discord / Slack). The agent must be able to store code
and generated files on the host's large disks via direct bind mounts.
## 2. Context (verified 2026-06-18 via Proxmox API)
### litellm LXC (existing)
| Item | Value |
|---|---|
| VMID / host | `117` / `gihyeon` (node1) |
| Spec | 2 core / 2GB RAM / 4GB disk (`hdd`) |
| Network | SDN vnet `intra01`, IP `10.1.10.22/24` (DHCP) |
| Endpoint | LiteLLM proxy, default port `4000``http://10.1.10.22:4000` |
| Type | unprivileged LXC, Debian, community-script install, `nesting=1` |
### node1 (`gihyeon`) headroom
- CPU 12 threads / RAM 64GB (~32GB free)
- Storage: `local-lvm` 93GB free (SSD/LVM-thin), `hdd` 10TB free, `media` 1.3TB free
- intra01 has internet egress (litellm was installed from the internet and shows outbound traffic)
### Storage host paths
| Proxmox storage | Host path | Disk | Free |
|---|---|---|---|
| `media` | `/media/2tb` | nvme (SSD) | 1.3TB |
| `hdd` | `/mnt/pve/hdd` | bulk | 10TB |
### Hermes Agent facts (from official docs)
- Two install paths: **Docker image** `nousresearch/hermes-agent` (compose provided) or native `install.sh` (uv/python3.11/node/ripgrep/ffmpeg).
- LLM connection: supports **OpenAI-compatible `base_url`**`provider: custom`, `base_url: <litellm>`. Config in `~/.hermes/config.yaml`, secrets in `~/.hermes/.env`.
- Ports: `8642` (gateway API, OpenAI-compatible), `9119` (web dashboard). **Neither required for messaging-only use.**
- Resources: min 1C/1GB, **recommended 2C/24GB / 2GB+ disk**. Browser tools want `--shm-size=1g`.
- **Not privileged by default.** Subagent sandbox backends: local / Docker / SSH / Singularity / Modal. Docker sandbox needs `/var/run/docker.sock` (DinD) — **not used here**; we start with `sandbox=local`.
- Single data mount inside the image: `/opt/data` (maps to host `~/.hermes`): config, sessions, memories, skills, logs, credentials.
## 3. Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Deployment form | **Docker LXC (unprivileged)** | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
| Provisioning | **Terraform** (this repo) | Infra-as-code; mirrors `pbs.tf` pattern. In-container install is a scripted console step. |
| Primary interface | **Messaging connectors** | Outbound-only → **zero inbound ports exposed.** |
| Subagent sandbox | **local** | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation needed. |
| Large workspace | **Direct host bind mount (both disks)** | Aligns with the user's **Plan A** (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory. |
## 4. Architecture
```
[Messaging platforms] node1 (gihyeon) / intra01 (10.1.10.0/24)
Telegram/Discord ──outbound──▶ ┌────────────────────────────────┐
/Slack ... │ hermes LXC #118 (unpriv+Docker)│
│ └ nousresearch/hermes-agent │
│ (compose, sandbox=local) │
│ /data ◀─ bind /mnt/pve/hdd/hermes
│ /fast ◀─ bind /media/2tb/hermes
└──────────┬─────────────────────┘
│ LLM (OpenAI-compatible)
litellm LXC #117 (10.1.10.22:4000)
│ routes to upstream providers
Anthropic / OpenAI / local / ...
```
## 5. Container spec (Terraform, bpg provider)
| Field | Value |
|---|---|
| VMID | `118` (adjacent to litellm `117`, AI group) |
| Node | `gihyeon` |
| Type | unprivileged LXC, Debian 12 |
| Features | `nesting = 1`, `keyctl = 1` (required for Docker) |
| CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
| rootfs | 24 GB on `local-lvm` |
| Network | `eth0` on bridge `intra01`, IPv4 DHCP |
| Options | `start_on_boot = true`, tags `ai;agent;terraform` |
| Hostname | `hermes` |
### Bind mounts (large workspace)
| mount | Host path | Container path | Purpose |
|---|---|---|---|
| `mp0` | `/mnt/pve/hdd/hermes` | `/data` | 14TB bulk: code, artifacts, downloads |
| `mp1` | `/media/2tb/hermes` | `/fast` | SSD: fast workspace / builds |
bpg `mount_point` blocks use an absolute host path as `volume` to create a bind
mount. Both container paths are passed into the Hermes Docker container as
volumes so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
small/fast config + memory + sqlite) stays on rootfs (SSD), **not** on the bulk disk.
### Unprivileged UID mapping (critical)
Unlike jellyfin(115)/tos-api(700) — which are *privileged* (root→root, no perms
issue) — hermes is **unprivileged**, so its root maps to host UID `100000`. The
bind-mount host directories must be owned by the mapped root. A dedicated
subdirectory per disk (`…/hermes`) is `chown 100000:100000`, so **only that
subtree is remapped** (isolation preserved), not the whole disk.
## 6. Networking & security
- On `intra01` (same subnet as litellm) → reaches `10.1.10.22:4000` directly.
- Messaging connectors poll outbound → **no inbound port forwarding / no firewall opening.**
- Dashboard (`9119`) and gateway API (`8642`) **not exposed**. If first-time setup needs the dashboard, use it transiently via console / temporary port-forward, or `HERMES_DASHBOARD_INSECURE=1` on the trusted net.
- Secrets (litellm key, bot tokens) live only in the container's `~/.hermes/.env`; **never committed**.
## 7. Software stack & LLM connection
- Docker + docker-compose-plugin installed in the LXC.
- `nousresearch/hermes-agent` run via compose (`gateway run`), `restart: unless-stopped`.
- `~/.hermes/config.yaml`:
```yaml
model:
default: <model name exposed by litellm>
provider: custom
base_url: http://10.1.10.22:4000/v1
```
- `~/.hermes/.env`: litellm API key (`OPENAI_API_KEY`), messaging bot tokens.
- Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
## 8. Provisioning sequence (order matters)
1. **Host prep** (node1 web console, once): bind-mount targets must exist before `apply`.
```sh
mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
```
2. **Terraform apply** (from workstation): creates LXC #118 + bind mounts.
3. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh` —
install Docker + compose plugin → write `docker-compose.yml` + `config.yaml`
pointing at litellm → fill `.env` (litellm key, bot tokens) → `hermes setup`
→ `gateway run`.
> In-container / host shell work is performed by the user via the **PVE web
> console** (per `proxmox-access` memory — host SSH intentionally unused).
## 9. Repo changes
- **New:** `hermes.tf` (download template + container resource + bind mounts),
`hermes-variables.tf`, `scripts/hermes-bootstrap.sh`.
- **Modified:** `terraform.tfvars` + `terraform.tfvars.example` (hermes vars),
`outputs.tf` (VMID / IP), `README.md` (install steps), `gitignore` (ensure `.env` / secrets excluded).
## 10. Values to fill at setup time
- litellm master/virtual key and the exact **model name** litellm exposes.
- Messaging bot tokens (Telegram / Discord / Slack as chosen).
## 11. Out of scope / future
- Docker sandbox backend (DinD) for stronger subagent isolation — deferred; start `local`.
- Static IP instead of DHCP — deferred (DHCP matches litellm).
- Dashboard/gateway-API exposure with auth — only if a non-messaging use appears.
- `terraform import` of existing 115/700 mount points — tracked separately in `nfs-lxc-sharing-redesign`.
## 12. Rollback
- `terraform destroy -target` the hermes container, or `pct destroy 118`.
- Bind-mount host dirs (`/mnt/pve/hdd/hermes`, `/media/2tb/hermes`) remain unless manually removed.
## 13. Verification (post-deploy)
- LXC 118 running; `pct config 118` shows mp0/mp1 + `nesting=1`.
- Inside container: `/data` and `/fast` writable by container root; `docker ps` shows hermes healthy.
- Hermes can call litellm: a test prompt routes through `10.1.10.22:4000` and returns.
- A messaging connector responds end-to-end; agent-written file appears under `/mnt/pve/hdd/hermes` on the host.