
21in7 8938c486dc docs: add Hermes Agent LXC design spec

Design for deploying Nous Research Hermes Agent as an unprivileged Docker
LXC (#118) on node1, using litellm (10.1.10.22:4000) as the OpenAI-compatible
LLM gateway. Messaging-connector use (outbound-only, no inbound ports).
Large workspace via direct host bind mounts (hdd /data + 2tb /fast),
aligned with the Plan A same-host bind-mount decision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-18 23:34:53 +09:00


Hermes Agent LXC — Design Spec

Date: 2026-06-18
Author: gihyeon (with Claude Code)
Status: Approved design → ready for implementation plan
Repo: proxmox-iac (Terraform / bpg/proxmox provider)

1. Goal

Deploy Hermes Agent (Nous Research; open-source, MIT-licensed agent platform) as a new container on node1 (gihyeon), using the existing litellm LXC as its LLM gateway. The primary use is messaging connectors (Telegram / Discord / Slack). The agent must be able to store code and generated files on the host's large disks via direct bind mounts.

2. Context (verified 2026-06-18 via Proxmox API)

litellm LXC (existing)

Item	Value
VMID / host	`117` / `gihyeon` (node1)
Spec	2 core / 2GB RAM / 4GB disk (`hdd`)
Network	SDN vnet `intra01`, IP `10.1.10.22/24` (DHCP)
Endpoint	LiteLLM proxy, default port `4000` → `http://10.1.10.22:4000`
Type	unprivileged LXC, Debian, community-script install, `nesting=1`

node1 (`gihyeon`) headroom

CPU 12 threads / RAM 64GB (~32GB free)
Storage: local-lvm 93GB free (SSD/LVM-thin), hdd 10TB free, media 1.3TB free
intra01 has internet egress (litellm was installed from the internet and shows outbound traffic)

Storage host paths

Proxmox storage	Host path	Disk	Free
`media`	`/media/2tb`	nvme (SSD)	1.3TB
`hdd`	`/mnt/pve/hdd`	bulk	10TB

Hermes Agent facts (from official docs)

Two install paths: Docker image nousresearch/hermes-agent (compose provided) or native install.sh (uv/python3.11/node/ripgrep/ffmpeg).
LLM connection: supports OpenAI-compatible base_url → provider: custom, base_url: <litellm>. Config in ~/.hermes/config.yaml, secrets in ~/.hermes/.env.
Ports: 8642 (gateway API, OpenAI-compatible), 9119 (web dashboard). Neither required for messaging-only use.
Resources: minimum 1 core / 1 GB RAM; recommended 2 cores / 2–4 GB RAM with 2 GB+ disk. Browser tools want --shm-size=1g.
Not privileged by default. Subagent sandbox backends: local / Docker / SSH / Singularity / Modal. Docker sandbox needs /var/run/docker.sock (DinD) — not used here; we start with sandbox=local.
Single data mount inside the image: /opt/data (maps to host ~/.hermes): config, sessions, memories, skills, logs, credentials.

3. Decisions

Decision	Choice	Rationale
Deployment form	Docker LXC (unprivileged)	Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode.
Provisioning	Terraform (this repo)	Infra-as-code; mirrors `pbs.tf` pattern. In-container install is a scripted console step.
Primary interface	Messaging connectors	Outbound-only → zero inbound ports exposed.
Subagent sandbox	local	Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation needed.
Large workspace	Direct host bind mount (both disks)	Aligns with the user's Plan A (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory.

4. Architecture

[Messaging platforms]          node1 (gihyeon) / intra01 (10.1.10.0/24)
 Telegram/Discord  ──outbound──▶  ┌────────────────────────────────┐
 /Slack ...                       │  hermes LXC #118 (unpriv+Docker)│
                                  │   └ nousresearch/hermes-agent   │
                                  │      (compose, sandbox=local)   │
                                  │   /data  ◀─ bind /mnt/pve/hdd/hermes
                                  │   /fast  ◀─ bind /media/2tb/hermes
                                  └──────────┬─────────────────────┘
                                             │ LLM (OpenAI-compatible)
                                             ▼
                                  litellm LXC #117 (10.1.10.22:4000)
                                             │ routes to upstream providers
                                             ▼
                                   Anthropic / OpenAI / local / ...

5. Container spec (Terraform, bpg provider)

Field	Value
VMID	`118` (adjacent to litellm `117`, AI group)
Node	`gihyeon`
Type	unprivileged LXC, Debian 12
Features	`nesting = 1`, `keyctl = 1` (required for Docker)
CPU / RAM	2 cores / 4096 MB dedicated (+512 MB swap)
rootfs	24 GB on `local-lvm`
Network	`eth0` on bridge `intra01`, IPv4 DHCP
Options	`start_on_boot = true`, tags `ai;agent;terraform`
Hostname	`hermes`

Bind mounts (large workspace)

Mount	Host path	Container path	Purpose
`mp0`	`/mnt/pve/hdd/hermes`	`/data`	14TB bulk: code, artifacts, downloads
`mp1`	`/media/2tb/hermes`	`/fast`	SSD: fast workspace / builds

In the bpg provider, a mount_point block whose volume is an absolute host path creates a bind mount. Both container paths are then passed into the Hermes Docker container as volumes so the agent's outputs land on the large disks. ~/.hermes (/opt/data: small, latency-sensitive config, memory, and SQLite data) stays on the rootfs (SSD), not on the bulk disk.

Unprivileged UID mapping (critical)

Unlike jellyfin (115) and tos-api (700), which are privileged (root maps to root, so no permission issue), hermes is unprivileged, so its root maps to host UID 100000. The bind-mount host directories must therefore be owned by the mapped root. Each disk gets a dedicated subdirectory (…/hermes) owned by 100000:100000, so only that subtree is remapped (isolation preserved), not the whole disk.
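
The mapping arithmetic can be sketched as below. This is only an illustration under the assumption that the container uses the Proxmox default unprivileged mapping (container IDs 0 to 65535 shifted by 100000); host_id is a hypothetical helper, not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch: translate a container-side UID/GID into the host-side ID seen on a bind mount.
# Assumption: default unprivileged mapping, container 0-65535 -> host 100000-165535.
IDMAP_OFFSET=100000   # host ID that container root (0) maps to, per the spec
IDMAP_RANGE=65536     # assumed size of the mapped range

# host_id CONTAINER_ID: print the matching host ID, or fail if the ID is unmapped.
host_id() {
    cid=$1
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$IDMAP_RANGE" ]; then
        echo "container id $cid is outside the mapped range" >&2
        return 1
    fi
    echo $((IDMAP_OFFSET + cid))
}
```

For example, host_id 0 prints 100000, which is the owner used for the …/hermes subdirectories. If the Hermes image turns out to run as a non-root user inside Docker (not stated in this spec), the same helper gives the host owner the files would appear under.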

6. Networking & security

On intra01 (same subnet as litellm) → reaches 10.1.10.22:4000 directly.
Messaging connectors poll outbound → no inbound port forwarding / no firewall opening.
Dashboard (9119) and gateway API (8642) not exposed. If first-time setup needs the dashboard, use it transiently via console / temporary port-forward, or HERMES_DASHBOARD_INSECURE=1 on the trusted net.
Secrets (litellm key, bot tokens) live only in the container's ~/.hermes/.env; never committed.
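
One way to keep the secrets file private from the moment it is created is sketched below. init_env_file is a hypothetical helper; OPENAI_API_KEY is the variable this spec names, while the bot-token variable names come from the Hermes docs and are deliberately not guessed here.

```shell
#!/bin/sh
# Sketch: create the secrets file with owner-only permissions before any secret is written.
# init_env_file PATH: create PATH (and its parent directory) if missing, then force mode 0600.
init_env_file() {
    env_file=$1
    (
        umask 077                                      # files created in this subshell get 0600
        mkdir -p "$(dirname "$env_file")"
        if [ ! -e "$env_file" ]; then
            printf 'OPENAI_API_KEY=\n' > "$env_file"   # value is filled in by hand at setup time
        fi
    )
    chmod 600 "$env_file"                              # also tighten a pre-existing file
}
```

Inside the container this would be called as init_env_file "$HOME/.hermes/.env". An existing file keeps its contents; only its mode is tightened.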

7. Software stack & LLM connection

Docker + docker-compose-plugin installed in the LXC.
nousresearch/hermes-agent run via compose (gateway run), restart: unless-stopped.

~/.hermes/config.yaml:

model:
  default: <model name exposed by litellm>
  provider: custom
  base_url: http://10.1.10.22:4000/v1

~/.hermes/.env: litellm API key (OPENAI_API_KEY), messaging bot tokens.
Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
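
A minimal sketch of how the bootstrap script might emit the compose file for this stack follows. The image name, the /opt/data mount, restart policy, and the 1g shared-memory size come from this spec; the service name hermes, the list form of the gateway run command, and reusing /data and /fast as the in-container paths are assumptions to be checked against the official compose file.

```shell
#!/bin/sh
# Sketch: write a docker-compose.yml for the Hermes gateway into OUT_DIR.
# write_compose is a hypothetical helper, not the official compose file.
write_compose() {
    out_dir=$1
    mkdir -p "$out_dir"
    cat > "$out_dir/docker-compose.yml" <<'EOF'
services:
  hermes:                              # assumed service name
    image: nousresearch/hermes-agent
    command: ["gateway", "run"]        # assumed form of the "gateway run" invocation
    restart: unless-stopped
    shm_size: "1g"                     # browser tools want 1g of shared memory
    volumes:
      - ~/.hermes:/opt/data            # config, sessions, memories (stays on rootfs)
      - /data:/data                    # bulk disk bind mount (mp0)
      - /fast:/fast                    # SSD bind mount (mp1)
EOF
}
```

The file intentionally has no published-ports section, which matches the outbound-only messaging design: neither 8642 nor 9119 is exposed.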

8. Provisioning sequence (order matters)

1. Host prep (node1 web console, once): bind-mount targets must exist before apply.

mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes

2. Terraform apply (from workstation): creates LXC #118 + bind mounts.
3. Container bootstrap (LXC console, once): scripts/hermes-bootstrap.sh runs these steps in order: install Docker + compose plugin → write docker-compose.yml + config.yaml pointing at litellm → fill .env (litellm key, bot tokens) → hermes setup → gateway run.

In-container / host shell work is performed by the user via the PVE web console (per proxmox-access memory — host SSH intentionally unused).
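
Because the apply step depends on the host directories already existing with the right owner, a small preflight check on node1 may help catch mistakes early. The sketch below assumes GNU stat (as on a Debian-based Proxmox host); check_bind_dir is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: confirm a bind-mount source directory exists and has the expected owner.
# check_bind_dir DIR [UID:GID]: the owner defaults to the mapped container root.
check_bind_dir() {
    dir=$1
    want=${2:-100000:100000}
    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi
    have=$(stat -c '%u:%g' "$dir")     # numeric owner and group of the directory
    if [ "$have" != "$want" ]; then
        echo "WRONG_OWNER $dir ($have, expected $want)"
        return 1
    fi
    echo "OK $dir"
}
```

On the host this would be run as check_bind_dir /mnt/pve/hdd/hermes and check_bind_dir /media/2tb/hermes before the Terraform apply from the workstation.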

9. Repo changes

New: hermes.tf (download template + container resource + bind mounts), hermes-variables.tf, scripts/hermes-bootstrap.sh.
Modified: terraform.tfvars + terraform.tfvars.example (hermes vars), outputs.tf (VMID / IP), README.md (install steps), .gitignore (ensure .env / secrets excluded).

10. Values to fill at setup time

litellm master/virtual key and the exact model name litellm exposes.
Messaging bot tokens (Telegram / Discord / Slack as chosen).
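
The exact model name can usually be read from the gateway itself, since an OpenAI-compatible proxy serves a model list at /v1/models. The sketch below assumes curl is available and that the key is exported as OPENAI_API_KEY; both function names are hypothetical, and the grep-based parsing is a rough stand-in for a real JSON parser (it would also pick up any other "id" fields in the response).

```shell
#!/bin/sh
# Sketch: list the model names litellm exposes, for use as model.default in config.yaml.
LITELLM_BASE_URL=${LITELLM_BASE_URL:-http://10.1.10.22:4000}

# extract_model_ids: read an OpenAI-style model list on stdin, print one id per line.
extract_model_ids() {
    grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
        sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# list_litellm_models: query the gateway; needs network access and a valid key.
list_litellm_models() {
    curl -fsS -H "Authorization: Bearer $OPENAI_API_KEY" \
        "$LITELLM_BASE_URL/v1/models" | extract_model_ids
}
```

Running list_litellm_models from inside the hermes container also doubles as a quick reachability check for 10.1.10.22:4000.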

11. Out of scope / future

Docker sandbox backend (DinD) for stronger subagent isolation — deferred; start local.
Static IP instead of DHCP — deferred (DHCP matches litellm).
Dashboard/gateway-API exposure with auth — only if a non-messaging use appears.
terraform import of existing 115/700 mount points — tracked separately in nfs-lxc-sharing-redesign.

12. Rollback

Run terraform destroy with -target set to the hermes container resource, or run pct destroy 118 on the host (the container must be stopped first).
Bind-mount host dirs (/mnt/pve/hdd/hermes, /media/2tb/hermes) remain unless manually removed.

13. Verification (post-deploy)

LXC 118 running; pct config 118 shows mp0/mp1 and the features nesting=1 and keyctl=1.
Inside container: /data and /fast writable by container root; docker ps shows hermes healthy.
Hermes can call litellm: a test prompt routes through 10.1.10.22:4000 and returns.
A messaging connector responds end-to-end; agent-written file appears under /mnt/pve/hdd/hermes on the host.
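
The first check can be scripted roughly as follows. This is a sketch that reads pct config output from stdin and assumes the usual "key: value" layout with the bind-mount source first and mp=<path> on the same line; has and verify_pct_config are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: verify that `pct config 118` output contains the settings from this spec.

# has KEY NEEDLE...: true if the line starting with "KEY:" contains every NEEDLE.
has() {
    line=$(printf '%s\n' "$cfg" | grep -- "^$1:") || return 1
    shift
    for needle in "$@"; do
        case $line in
            *"$needle"*) ;;
            *) return 1 ;;
        esac
    done
}

# verify_pct_config: read the config on stdin, report each failed check, return 1 on any failure.
verify_pct_config() {
    cfg=$(cat)
    rc=0
    has mp0 /mnt/pve/hdd/hermes mp=/data || { echo "FAIL mp0"; rc=1; }
    has mp1 /media/2tb/hermes mp=/fast   || { echo "FAIL mp1"; rc=1; }
    has features nesting=1 keyctl=1      || { echo "FAIL features"; rc=1; }
    has unprivileged 1                   || { echo "FAIL unprivileged"; rc=1; }
    return $rc
}
```

On the host console this would be used as pct config 118 | verify_pct_config; no output and a zero exit status mean all four checks passed.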
