From 8938c486dc793d767180b3259ad056e14b66a678 Mon Sep 17 00:00:00 2001
From: 21in7 <skawksmsduzo@gmail.com>
Date: Thu, 18 Jun 2026 23:34:53 +0900
Subject: [PATCH] docs: add Hermes Agent LXC design spec

Design for deploying Nous Research Hermes Agent as an unprivileged Docker
LXC (#118) on node1, using litellm (10.1.10.22:4000) as the OpenAI-compatible
LLM gateway. Messaging-connector use (outbound-only, no inbound ports).
Large workspace via direct host bind mounts (hdd /data + 2tb /fast),
aligned with the Plan A same-host bind-mount decision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 .../2026-06-18-hermes-agent-lxc-design.md     | 165 ++++++++++++++++++
 1 file changed, 165 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md
diff --git a/docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md b/docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md
new file mode 100644
index 0000000..eb5d942
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md
@@ -0,0 +1,165 @@
+# Hermes Agent LXC — Design Spec
+
+- **Date:** 2026-06-18
+- **Author:** gihyeon (with Claude Code)
+- **Status:** Approved design → ready for implementation plan
+- **Repo:** `proxmox-iac` (Terraform / bpg/proxmox provider)
+
+## 1. Goal
+
+Deploy [Hermes Agent](https://hermes-agent.nousresearch.com/) (Nous Research,
+open-source MIT agent platform) as a new container on **node1 (`gihyeon`)**, using
+the existing **litellm** LXC as its LLM gateway. Primary use is **messaging
+connectors** (Telegram / Discord / Slack). The agent must be able to store code
+and generated files on the host's large disks via direct bind mounts.
+
+## 2. Context (verified 2026-06-18 via Proxmox API)
+
+### litellm LXC (existing)
+| Item | Value |
+|---|---|
+| VMID / host | `117` / `gihyeon` (node1) |
+| Spec | 2 core / 2GB RAM / 4GB disk (`hdd`) |
+| Network | SDN vnet `intra01`, IP `10.1.10.22/24` (DHCP) |
+| Endpoint | LiteLLM proxy, default port `4000` → `http://10.1.10.22:4000` |
+| Type | unprivileged LXC, Debian, community-script install, `nesting=1` |
+
+### node1 (`gihyeon`) headroom
+- CPU 12 threads / RAM 64GB (~32GB free)
+- Storage: `local-lvm` 93GB free (SSD/LVM-thin), `hdd` 10TB free, `media` 1.3TB free
+- intra01 has internet egress (litellm was installed from the internet and shows outbound traffic)
+
+### Storage host paths
+| Proxmox storage | Host path | Disk | Free |
+|---|---|---|---|
+| `media` | `/media/2tb` | nvme (SSD) | 1.3TB |
+| `hdd` | `/mnt/pve/hdd` | bulk | 10TB |
+
+### Hermes Agent facts (from official docs)
+- Two install paths: **Docker image** `nousresearch/hermes-agent` (compose provided) or native `install.sh` (uv/python3.11/node/ripgrep/ffmpeg).
+- LLM connection: supports **OpenAI-compatible `base_url`** → `provider: custom`, `base_url: <litellm>`. Config in `~/.hermes/config.yaml`, secrets in `~/.hermes/.env`.
+- Ports: `8642` (gateway API, OpenAI-compatible), `9119` (web dashboard). **Neither required for messaging-only use.**
+- Resources: min 1C/1GB, **recommended 2C/2–4GB / 2GB+ disk**. Browser tools want `--shm-size=1g`.
+- **Not privileged by default.** Subagent sandbox backends: local / Docker / SSH / Singularity / Modal. Docker sandbox needs `/var/run/docker.sock` (DinD) — **not used here**; we start with `sandbox=local`.
+- Single data mount inside the image: `/opt/data` (maps to host `~/.hermes`): config, sessions, memories, skills, logs, credentials.
+
+## 3. Decisions
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Deployment form | **Docker LXC (unprivileged)** | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
+| Provisioning | **Terraform** (this repo) | Infra-as-code; mirrors `pbs.tf` pattern. In-container install is a scripted console step. |
+| Primary interface | **Messaging connectors** | Outbound-only → **zero inbound ports exposed.** |
+| Subagent sandbox | **local** | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation needed. |
+| Large workspace | **Direct host bind mount (both disks)** | Aligns with the user's **Plan A** (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory. |
+
+## 4. Architecture
+
+```
+[Messaging platforms]          node1 (gihyeon) / intra01 (10.1.10.0/24)
+ Telegram/Discord  ──outbound──▶  ┌────────────────────────────────┐
+ /Slack ...                       │  hermes LXC #118 (unpriv+Docker)│
+                                  │   └ nousresearch/hermes-agent   │
+                                  │      (compose, sandbox=local)   │
+                                  │   /data  ◀─ bind /mnt/pve/hdd/hermes
+                                  │   /fast  ◀─ bind /media/2tb/hermes
+                                  └──────────┬─────────────────────┘
+                                             │ LLM (OpenAI-compatible)
+                                             ▼
+                                  litellm LXC #117 (10.1.10.22:4000)
+                                             │ routes to upstream providers
+                                             ▼
+                                   Anthropic / OpenAI / local / ...
+```
+
+## 5. Container spec (Terraform, bpg provider)
+
+| Field | Value |
+|---|---|
+| VMID | `118` (adjacent to litellm `117`, AI group) |
+| Node | `gihyeon` |
+| Type | unprivileged LXC, Debian 12 |
+| Features | `nesting = 1`, `keyctl = 1` (required for Docker) |
+| CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
+| rootfs | 24 GB on `local-lvm` |
+| Network | `eth0` on bridge `intra01`, IPv4 DHCP |
+| Options | `start_on_boot = true`, tags `ai;agent;terraform` |
+| Hostname | `hermes` |
+
+### Bind mounts (large workspace)
+| mount | Host path | Container path | Purpose |
+|---|---|---|---|
+| `mp0` | `/mnt/pve/hdd/hermes` | `/data` | 14TB bulk: code, artifacts, downloads |
+| `mp1` | `/media/2tb/hermes` | `/fast` | SSD: fast workspace / builds |
+
+bpg `mount_point` blocks use an absolute host path as `volume` to create a bind
+mount. Both container paths are passed into the Hermes Docker container as
+volumes so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
+small/fast config + memory + sqlite) stays on rootfs (SSD), **not** on the bulk disk.
+
+### Unprivileged UID mapping (critical)
+Unlike jellyfin (115) and tos-api (700), which are *privileged* (root maps to
+root, so no permission issues), hermes is **unprivileged**, so its root maps to
+host UID `100000`. The bind-mount host directories must be owned by the mapped
+root. A dedicated subdirectory per disk (`…/hermes`) is therefore `chown`ed to
+`100000:100000`, so **only that subtree is remapped** (isolation preserved), not
+the whole disk.
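+
+The mapping can be sanity-checked on the host before `apply`. The sketch below
+is illustrative only: `host_id` and `owned_by_container_root` are hypothetical
+helper names, the `100000` offset is the value stated above (confirm it against
+the host's actual ID map), and `stat -c` assumes GNU coreutils.
+
+```sh
+# Offset of the unprivileged ID map; 100000 is the value this spec assumes.
+IDMAP_OFFSET=100000
+
+# Translate a container UID/GID into the host UID/GID it appears as.
+host_id() {
+    echo $(( $1 + IDMAP_OFFSET ))
+}
+
+# Succeed only if the directory is owned by the host IDs of container root.
+owned_by_container_root() {
+    [ "$(stat -c '%u:%g' "$1")" = "$(host_id 0):$(host_id 0)" ]
+}
+```
+
+For example, `owned_by_container_root /mnt/pve/hdd/hermes || echo "fix ownership"`
+flags a directory that container root would not be able to write to.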
+
+## 6. Networking & security
+- On `intra01` (same subnet as litellm) → reaches `10.1.10.22:4000` directly.
+- Messaging connectors poll outbound → **no inbound port forwarding / no firewall opening.**
+- Dashboard (`9119`) and gateway API (`8642`) **not exposed**. If first-time setup needs the dashboard, use it transiently via console / temporary port-forward, or `HERMES_DASHBOARD_INSECURE=1` on the trusted net.
+- Secrets (litellm key, bot tokens) live only in the container's `~/.hermes/.env`; **never committed**.
+
+## 7. Software stack & LLM connection
+- Docker + docker-compose-plugin installed in the LXC.
+- `nousresearch/hermes-agent` run via compose (`gateway run`), `restart: unless-stopped`.
+- `~/.hermes/config.yaml`:
+  ```yaml
+  model:
+    default: <model name exposed by litellm>
+    provider: custom
+    base_url: http://10.1.10.22:4000/v1
+  ```
+- `~/.hermes/.env`: litellm API key (`OPENAI_API_KEY`), messaging bot tokens.
+- Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
+
+## 8. Provisioning sequence (order matters)
+1. **Host prep** (node1 web console, once): bind-mount targets must exist before `apply`.
+   ```sh
+   mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
+   chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
+   ```
+2. **Terraform apply** (from workstation): creates LXC #118 + bind mounts.
+3. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh` —
+   install Docker + compose plugin → write `docker-compose.yml` + `config.yaml`
+   pointing at litellm → fill `.env` (litellm key, bot tokens) → `hermes setup`
+   → `gateway run`.
+
+> In-container / host shell work is performed by the user via the **PVE web
+> console** (per `proxmox-access` memory — host SSH intentionally unused).
+
+## 9. Repo changes
+- **New:** `hermes.tf` (download template + container resource + bind mounts),
+  `hermes-variables.tf`, `scripts/hermes-bootstrap.sh`.
+- **Modified:** `terraform.tfvars` + `terraform.tfvars.example` (hermes vars),
+  `outputs.tf` (VMID / IP), `README.md` (install steps), `.gitignore` (ensure `.env` / secrets excluded).
+
+## 10. Values to fill at setup time
+- litellm master/virtual key and the exact **model name** litellm exposes.
+- Messaging bot tokens (Telegram / Discord / Slack as chosen).
+
+## 11. Out of scope / future
+- Docker sandbox backend (DinD) for stronger subagent isolation — deferred; start `local`.
+- Static IP instead of DHCP — deferred (DHCP matches litellm).
+- Dashboard/gateway-API exposure with auth — only if a non-messaging use appears.
+- `terraform import` of existing 115/700 mount points — tracked separately in `nfs-lxc-sharing-redesign`.
+
+## 12. Rollback
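+- `terraform destroy -target=<resource address>` for the hermes container, or `pct destroy 118` on the host (the container must be stopped first, and Terraform state then needs reconciling).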
+- Bind-mount host dirs (`/mnt/pve/hdd/hermes`, `/media/2tb/hermes`) remain unless manually removed.
+
+## 13. Verification (post-deploy)
+- LXC 118 running; `pct config 118` shows mp0/mp1 + `nesting=1`.
+- Inside container: `/data` and `/fast` writable by container root; `docker ps` shows hermes healthy.
+- Hermes can call litellm: a test prompt routes through `10.1.10.22:4000` and returns a completion.
+- A messaging connector responds end-to-end; agent-written file appears under `/mnt/pve/hdd/hermes` on the host.
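+
+Most of the checks above can be scripted. The following is a minimal sketch,
+not part of the planned `scripts/hermes-bootstrap.sh`: `check`, `verify_host`
+and `verify_container` are hypothetical names, `curl` is assumed to be
+installed in the container, and the `/v1/models` route is assumed to be served
+by the litellm proxy as part of its OpenAI-compatible API.
+
+```sh
+# Run a command quietly and print PASS or FAIL with a label.
+check() {
+    label=$1
+    shift
+    if "$@" >/dev/null 2>&1; then
+        echo "PASS $label"
+    else
+        echo "FAIL $label"
+        return 1
+    fi
+}
+
+# Host-side checks (run in the node1 shell).
+verify_host() {
+    rc=0
+    check "LXC 118 running" sh -c 'pct status 118 | grep -q running' || rc=1
+    check "mp0 present"     sh -c 'pct config 118 | grep -q "^mp0:"' || rc=1
+    check "mp1 present"     sh -c 'pct config 118 | grep -q "^mp1:"' || rc=1
+    check "nesting enabled" sh -c 'pct config 118 | grep -q "nesting=1"' || rc=1
+    return $rc
+}
+
+# Container-side checks (run in the LXC console).
+verify_container() {
+    rc=0
+    check "/data writable"   test -w /data || rc=1
+    check "/fast writable"   test -w /fast || rc=1
+    check "docker reachable" docker ps || rc=1
+    # Assumes OPENAI_API_KEY holds the litellm key from ~/.hermes/.env.
+    check "litellm reachable" curl -fsS \
+        -H "Authorization: Bearer $OPENAI_API_KEY" \
+        http://10.1.10.22:4000/v1/models || rc=1
+    return $rc
+}
+```
+
+Each function returns non-zero if any of its checks failed. The end-to-end
+messaging test and the Hermes health status still need a manual look, since
+`docker ps` succeeding only shows that the Docker daemon is reachable.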