proxmox-iac/docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md
21in7 8938c486dc docs: add Hermes Agent LXC design spec
Design for deploying Nous Research Hermes Agent as an unprivileged Docker
LXC (#118) on node1, using litellm (10.1.10.22:4000) as the OpenAI-compatible
LLM gateway. Messaging-connector use (outbound-only, no inbound ports).
Large workspace via direct host bind mounts (hdd /data + 2tb /fast),
aligned with the Plan A same-host bind-mount decision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 23:34:53 +09:00


Hermes Agent LXC — Design Spec

  • Date: 2026-06-18
  • Author: gihyeon (with Claude Code)
  • Status: Approved design → ready for implementation plan
  • Repo: proxmox-iac (Terraform / bpg/proxmox provider)

1. Goal

Deploy Hermes Agent (Nous Research; open-source, MIT-licensed agent platform) as a new container on node1 (gihyeon), using the existing litellm LXC as its LLM gateway. The primary use is messaging connectors (Telegram / Discord / Slack). The agent must be able to store code and generated files on the host's large disks via direct bind mounts.

2. Context (verified 2026-06-18 via Proxmox API)

litellm LXC (existing)

| Item | Value |
|---|---|
| VMID / host | 117 / gihyeon (node1) |
| Spec | 2 core / 2GB RAM / 4GB disk (hdd) |
| Network | SDN vnet intra01, IP 10.1.10.22/24 (DHCP) |
| Endpoint | LiteLLM proxy, default port 4000 (http://10.1.10.22:4000) |
| Type | unprivileged LXC, Debian, community-script install, nesting=1 |

node1 (gihyeon) headroom

  • CPU 12 threads / RAM 64GB (~32GB free)
  • Storage: local-lvm 93GB free (SSD/LVM-thin), hdd 10TB free, media 1.3TB free
  • intra01 has internet egress (litellm was installed from the internet and shows outbound traffic)

Storage host paths

| Proxmox storage | Host path | Disk | Free |
|---|---|---|---|
| media | /media/2tb | nvme (SSD) | 1.3TB |
| hdd | /mnt/pve/hdd | bulk | 10TB |

Hermes Agent facts (from official docs)

  • Two install paths: Docker image nousresearch/hermes-agent (compose provided) or native install.sh (uv/python3.11/node/ripgrep/ffmpeg).
  • LLM connection: supports an OpenAI-compatible base_url (provider: custom, base_url: <litellm>). Config in ~/.hermes/config.yaml, secrets in ~/.hermes/.env.
  • Ports: 8642 (gateway API, OpenAI-compatible), 9119 (web dashboard). Neither required for messaging-only use.
  • Resources: min 1C/1GB, recommended 2C/2–4GB / 2GB+ disk. Browser tools want --shm-size=1g.
  • Not privileged by default. Subagent sandbox backends: local / Docker / SSH / Singularity / Modal. Docker sandbox needs /var/run/docker.sock (DinD) — not used here; we start with sandbox=local.
  • Single data mount inside the image: /opt/data (maps to host ~/.hermes): config, sessions, memories, skills, logs, credentials.

3. Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Deployment form | Docker LXC (unprivileged) | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
| Provisioning | Terraform (this repo) | Infra-as-code; mirrors pbs.tf pattern. In-container install is a scripted console step. |
| Primary interface | Messaging connectors | Outbound-only → zero inbound ports exposed. |
| Subagent sandbox | local | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation is needed. |
| Large workspace | Direct host bind mount (both disks) | Aligns with the user's Plan A (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See nfs-lxc-sharing-redesign memory. |

4. Architecture

[Messaging platforms]          node1 (gihyeon) / intra01 (10.1.10.0/24)
 Telegram/Discord  ──outbound──▶  ┌────────────────────────────────┐
 /Slack ...                       │  hermes LXC #118 (unpriv+Docker)│
                                  │   └ nousresearch/hermes-agent   │
                                  │      (compose, sandbox=local)   │
                                  │   /data  ◀─ bind /mnt/pve/hdd/hermes
                                  │   /fast  ◀─ bind /media/2tb/hermes
                                  └──────────┬─────────────────────┘
                                             │ LLM (OpenAI-compatible)
                                             ▼
                                  litellm LXC #117 (10.1.10.22:4000)
                                             │ routes to upstream providers
                                             ▼
                                   Anthropic / OpenAI / local / ...

5. Container spec (Terraform, bpg provider)

| Field | Value |
|---|---|
| VMID | 118 (adjacent to litellm 117, AI group) |
| Node | gihyeon |
| Type | unprivileged LXC, Debian 12 |
| Features | nesting = 1, keyctl = 1 (required for Docker) |
| CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
| rootfs | 24 GB on local-lvm |
| Network | eth0 on bridge intra01, IPv4 DHCP |
| Options | start_on_boot = true, tags ai;agent;terraform |
| Hostname | hermes |

Bind mounts (large workspace)

| Mount | Host path | Container path | Purpose |
|---|---|---|---|
| mp0 | /mnt/pve/hdd/hermes | /data | 14TB bulk: code, artifacts, downloads |
| mp1 | /media/2tb/hermes | /fast | SSD: fast workspace / builds |

bpg mount_point blocks use an absolute host path as volume to create a bind mount. Both container paths are passed into the Hermes Docker container as volumes so the agent's outputs land on the large disks. ~/.hermes (/opt/data, small/fast config + memory + sqlite) stays on rootfs (SSD), not on the bulk disk.

Unprivileged UID mapping (critical)

Unlike jellyfin(115)/tos-api(700), which are privileged (root→root, no perms issue), hermes is unprivileged, so its root maps to host UID 100000. The bind-mount host directories must therefore be owned by the mapped root. A dedicated subdirectory per disk (…/hermes) is chowned to 100000:100000, so only that subtree is remapped (isolation preserved), not the whole disk.
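
The mapping described above is a fixed offset. The following is a minimal sketch of that arithmetic, assuming the default Proxmox unprivileged idmap (offset 100000, 65536 IDs) and no custom lxc.idmap entries; `host_id` is an illustrative helper name, not a Proxmox command.

```shell
#!/bin/sh
# Sketch: translate a container UID/GID to the host-side ID under the
# default unprivileged LXC idmap (offset 100000, range 65536).
# Assumption: no custom lxc.idmap entries are configured for the container.
IDMAP_OFFSET=100000
IDMAP_RANGE=65536

# host_id is a hypothetical helper name used only for this illustration.
host_id() {
    cid=$1
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$IDMAP_RANGE" ]; then
        echo "container id $cid is outside the mapped range" >&2
        return 1
    fi
    echo $((IDMAP_OFFSET + cid))
}

host_id 0      # container root -> 100000, the owner used for the hermes subdirectories
host_id 1000   # a regular container user -> 101000
```

If the Hermes image turns out to run as a non-root user inside Docker, its files would appear on the host under the correspondingly offset UID, so the chown target may need adjusting.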

6. Networking & security

  • On intra01 (same subnet as litellm) → reaches 10.1.10.22:4000 directly.
  • Messaging connectors poll outbound → no inbound port forwarding / no firewall opening.
  • Dashboard (9119) and gateway API (8642) not exposed. If first-time setup needs the dashboard, use it transiently via console / temporary port-forward, or HERMES_DASHBOARD_INSECURE=1 on the trusted net.
  • Secrets (litellm key, bot tokens) live only in the container's ~/.hermes/.env; never committed.
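
One way to confirm that neither 8642 nor 9119 is published is to inspect the Ports column of `docker ps`. This is a rough sketch, assuming the compose service uses default bridge networking (with host networking the Ports column is empty and this check says nothing); `published_ports` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect host-published ports in `docker ps` output.
# A published port appears as "0.0.0.0:8642->8642/tcp"; a port that is only
# exposed by the image appears as "8642/tcp" with no "->".

# published_ports (hypothetical helper) reads one Ports field per line on
# stdin and prints only the lines that contain a host mapping.
published_ports() {
    grep -e '->' || true
}

# Intended use inside the LXC (not executed here):
#   docker ps --format '{{.Ports}}' | published_ports
# Empty output means no container port is mapped to the LXC's interfaces.
```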

7. Software stack & LLM connection

  • Docker + docker-compose-plugin installed in the LXC.
  • nousresearch/hermes-agent run via compose (gateway run), restart: unless-stopped.
  • ~/.hermes/config.yaml:
    model:
      default: <model name exposed by litellm>
      provider: custom
      base_url: http://10.1.10.22:4000/v1
    
  • ~/.hermes/.env: litellm API key (OPENAI_API_KEY), messaging bot tokens.
  • Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
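
A possible shape for the file-writing part of the bootstrap script is sketched below. The YAML keys, image name, mount targets and restart policy come from this spec; the function names, the service name `hermes`, and the exact `command:` form are assumptions to check against the official compose file.

```shell
#!/bin/sh
# Sketch: render config.yaml and a compose file for the Hermes container.
# Function names and arguments are illustrative, not part of Hermes.

# Writes DIR/config.yaml pointing Hermes at an OpenAI-compatible gateway.
write_hermes_config() {
    dir=$1; model=$2; base_url=$3
    mkdir -p "$dir"
    cat > "$dir/config.yaml" <<EOF
model:
  default: $model
  provider: custom
  base_url: $base_url
EOF
}

# Prints a compose file to stdout. Assumption: the image entrypoint accepts
# "gateway run" as its command; the official compose file is authoritative.
print_compose() {
    cat <<'EOF'
services:
  hermes:
    image: nousresearch/hermes-agent
    command: gateway run
    restart: unless-stopped
    shm_size: 1g
    volumes:
      - ~/.hermes:/opt/data
      - /data:/data
      - /fast:/fast
EOF
}

# Example (scratch directory, not the real ~/.hermes):
#   write_hermes_config /tmp/hermes-demo my-model http://10.1.10.22:4000/v1
#   print_compose > /tmp/hermes-demo/docker-compose.yml
```

Neither function publishes a port, which keeps the file consistent with the outbound-only decision in section 6.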

8. Provisioning sequence (order matters)

  1. Host prep (node1 web console, once): bind-mount targets must exist before apply.
    mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
    chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
    
  2. Terraform apply (from workstation): creates LXC #118 + bind mounts.
  3. Container bootstrap (LXC console, once): scripts/hermes-bootstrap.sh: install Docker + compose plugin → write docker-compose.yml + config.yaml pointing at litellm → fill .env (litellm key, bot tokens) → hermes setup → gateway run.

In-container / host shell work is performed by the user via the PVE web console (per proxmox-access memory — host SSH intentionally unused).
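
Because step 1 must be complete before `terraform apply`, a small pre-flight check on the host can catch a missing or wrongly owned directory. This is a sketch only; `check_bind_dir` is a hypothetical helper and it relies on GNU `stat`, as shipped with Debian-based Proxmox.

```shell
#!/bin/sh
# Sketch: pre-flight check for the bind-mount targets created in step 1.
# Usage: check_bind_dir DIR UID:GID
check_bind_dir() {
    dir=$1; want=$2
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    have=$(stat -c '%u:%g' "$dir")   # numeric owner and group (GNU stat)
    if [ "$have" != "$want" ]; then
        echo "owner of $dir is $have, expected $want" >&2
        return 1
    fi
    echo "ok: $dir ($have)"
}

# Intended use on node1 (not executed here):
#   check_bind_dir /mnt/pve/hdd/hermes 100000:100000
#   check_bind_dir /media/2tb/hermes   100000:100000
```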

9. Repo changes

  • New: hermes.tf (download template + container resource + bind mounts), hermes-variables.tf, scripts/hermes-bootstrap.sh.
  • Modified: terraform.tfvars + terraform.tfvars.example (hermes vars), outputs.tf (VMID / IP), README.md (install steps), .gitignore (ensure .env / secrets excluded).

10. Values to fill at setup time

  • litellm master/virtual key and the exact model name litellm exposes.
  • Messaging bot tokens (Telegram / Discord / Slack as chosen).

11. Out of scope / future

  • Docker sandbox backend (DinD) for stronger subagent isolation — deferred; start local.
  • Static IP instead of DHCP — deferred (DHCP matches litellm).
  • Dashboard/gateway-API exposure with auth — only if a non-messaging use appears.
  • terraform import of existing 115/700 mount points — tracked separately in nfs-lxc-sharing-redesign.

12. Rollback

  • terraform destroy -target=<hermes container resource address>, or pct destroy 118 on the host (the container must be stopped first unless --force is given).
  • Bind-mount host dirs (/mnt/pve/hdd/hermes, /media/2tb/hermes) remain unless manually removed.

13. Verification (post-deploy)

  • LXC 118 running; pct config 118 shows mp0/mp1 + nesting=1.
  • Inside container: /data and /fast writable by container root; docker ps shows hermes healthy.
  • Hermes can call litellm: a test prompt routes through 10.1.10.22:4000 and returns.
  • A messaging connector responds end-to-end; agent-written file appears under /mnt/pve/hdd/hermes on the host.
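
The first check can be partly automated by scanning `pct config 118` output. The sketch below assumes the usual `key: value` layout of that output and also checks keyctl=1 and the unprivileged flag from section 5; `verify_pct_config` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check `pct config 118` output for the settings this spec expects.
# verify_pct_config (hypothetical helper) reads the config text on stdin,
# reports each missing setting on stderr, and returns non-zero if any is absent.
verify_pct_config() {
    cfg=$(cat)
    rc=0
    for want in \
        '^mp0: /mnt/pve/hdd/hermes,mp=/data' \
        '^mp1: /media/2tb/hermes,mp=/fast' \
        '^features: .*nesting=1' \
        '^features: .*keyctl=1' \
        '^unprivileged: 1'
    do
        if ! printf '%s\n' "$cfg" | grep -Eq "$want"; then
            echo "missing: $want" >&2
            rc=1
        fi
    done
    return $rc
}

# Intended use on node1 (not executed here):
#   pct config 118 | verify_pct_config && echo "LXC 118 config ok"
# The litellm path can be probed from inside the container with, for example:
#   curl -fsS -H "Authorization: Bearer $OPENAI_API_KEY" http://10.1.10.22:4000/v1/models
```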