docs: add Hermes Agent LXC implementation plan + spec amendments

Plan: 10 tasks splitting workstation Terraform (token-safe container skeleton)
from PVE-console host ops (features nesting/keyctl + bind mounts via pct set,
which the API token cannot do) and in-container Docker/hermes bootstrap.

Spec amended for the discovered API-token limitation: bind mounts AND container
features require root@pam/SSH, so both are applied via console pct set rather
than Terraform; terraform import tracked as follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
21in7
2026-06-18 23:42:27 +09:00
parent 8938c486dc
commit 92851a384f
2 changed files with 564 additions and 14 deletions


@@ -0,0 +1,537 @@
# Hermes Agent LXC Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Deploy Nous Research Hermes Agent as an unprivileged Docker LXC (#118) on node1 (`gihyeon`), using the existing litellm LXC (`10.1.10.22:4000`) as its OpenAI-compatible LLM gateway, with large-disk bind mounts for the agent workspace.
**Architecture:** Terraform creates a token-safe LXC skeleton (rootfs, network, cpu/mem). Host-security settings the API token cannot set — container `features` (nesting/keyctl) and bind mounts — are applied once via the PVE web console with `pct set`. A bootstrap script then installs rootful Docker and runs the official `nousresearch/hermes-agent` image via compose, pointed at litellm, with `sandbox=local` and messaging connectors.
**Tech Stack:** Terraform (bpg/proxmox provider), Proxmox VE 9.1 LXC, Docker + docker-compose, Hermes Agent (Nous Research).
**Spec:** [docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md](../specs/2026-06-18-hermes-agent-lxc-design.md)
**Execution split:**
- **Workstation (Terraform):** Tasks 1-5, 8, 9 (docs steps): run `terraform` against the API token.
- **PVE web console (user runs, pastes output back):** Tasks 6, 7, the script run in Task 9, and the console checks in Task 10: host/in-container ops (per `proxmox-access`: host SSH is intentionally unused).
---
## File Structure
| File | Responsibility |
|---|---|
| `hermes-variables.tf` (new) | All Hermes LXC input variables with defaults |
| `hermes.tf` (new) | Debian 12 template download (gihyeon) + token-safe container resource (no features, no mounts) |
| `terraform.tfvars` (modify) | Set Hermes values for this homelab |
| `terraform.tfvars.example` (modify) | Document Hermes values for other users |
| `outputs.tf` (modify) | Expose Hermes VMID + hostname |
| `scripts/hermes-bootstrap.sh` (new) | In-container Docker install + compose + Hermes config (placeholders for secrets); host prep and `pct set` (features+mounts) remain console steps (Tasks 6-7) |
| `README.md` (modify) | Document the 4-phase deploy flow |
---
## Task 1: Hermes input variables
**Files:**
- Create: `hermes-variables.tf`
- [ ] **Step 1: Write `hermes-variables.tf`**
```hcl
variable "hermes_vmid" {
  description = "VMID for the Hermes Agent LXC"
  type        = number
  default     = 118
}

variable "hermes_hostname" {
  description = "Hostname for the Hermes Agent LXC"
  type        = string
  default     = "hermes"
}

variable "hermes_node" {
  description = "Proxmox node to host the Hermes Agent LXC"
  type        = string
  default     = "gihyeon"
}

variable "hermes_cores" {
  description = "CPU cores for the Hermes Agent LXC"
  type        = number
  default     = 2
}

variable "hermes_memory" {
  description = "Dedicated memory (MB) for the Hermes Agent LXC"
  type        = number
  default     = 4096
}

variable "hermes_swap" {
  description = "Swap (MB) for the Hermes Agent LXC"
  type        = number
  default     = 512
}

variable "hermes_disk_size" {
  description = "Root filesystem size (GB) for the Hermes Agent LXC"
  type        = number
  default     = 24
}

variable "hermes_datastore" {
  description = "Datastore for the Hermes Agent LXC root filesystem"
  type        = string
  default     = "local-lvm"
}

variable "hermes_network_bridge" {
  description = "Network bridge (SDN VNET) for the Hermes Agent LXC"
  type        = string
  default     = "intra01"
}
```
- [ ] **Step 2: Format + validate**
Run: `terraform fmt hermes-variables.tf && terraform validate`
Expected: `Success! The configuration is valid.` (Terraform does not flag declared-but-unused variables, so this passes before `hermes.tf` exists in Task 2; the goal here is no HCL syntax error in this file.)
- [ ] **Step 3: Commit**
```bash
git add hermes-variables.tf
git commit -m "feat: add Hermes Agent LXC variables"
```
---
## Task 2: Hermes container resource (token-safe skeleton)
**Files:**
- Create: `hermes.tf`
Reuses the existing `var.dns_servers` (defined in `pbs-variables.tf`).
- [ ] **Step 1: Write `hermes.tf`**
```hcl
# Download Debian 12 LXC template to gihyeon (node1).
resource "proxmox_virtual_environment_download_file" "debian12_template_gihyeon" {
  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.hermes_node
  url          = "http://download.proxmox.com/images/system/debian-12-standard_12.12-1_amd64.tar.zst"
}

# Hermes Agent LXC: token-safe skeleton.
# IMPORTANT: container `features` (nesting/keyctl) and bind mounts are NOT set
# here. The Proxmox API token cannot set host-security settings; they are applied
# once via the PVE web console with `pct set` (see scripts/hermes-bootstrap.sh
# and docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md).
resource "proxmox_virtual_environment_container" "hermes" {
  description   = "Hermes Agent (Nous Research) - Managed by Terraform"
  node_name     = var.hermes_node
  vm_id         = var.hermes_vmid
  start_on_boot = true
  unprivileged  = true
  tags          = ["ai", "agent", "terraform"]

  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.debian12_template_gihyeon.id
    type             = "debian"
  }

  cpu {
    cores = var.hermes_cores
  }

  memory {
    dedicated = var.hermes_memory
    swap      = var.hermes_swap
  }

  disk {
    datastore_id = var.hermes_datastore
    size         = var.hermes_disk_size
  }

  network_interface {
    name   = "eth0"
    bridge = var.hermes_network_bridge
  }

  initialization {
    hostname = var.hermes_hostname

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    dns {
      servers = var.dns_servers
    }
  }
}
```
- [ ] **Step 2: Format + validate**
Run: `terraform fmt hermes.tf && terraform validate`
Expected: `Success! The configuration is valid.`
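Since the plan relies on `hermes.tf` staying free of host-restricted settings, a small text check before `terraform apply` may help catch a regression. The following is a minimal sketch, not part of the plan: the function name `tf_is_token_safe` is hypothetical, and it only matches `features` and `mount_point` block openers line by line instead of parsing HCL.

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply guard (an assumption, not part of the original plan).
# Returns 0 when the given .tf file contains no `features {` or `mount_point {`
# block opener outside full-line comments, and 1 otherwise.
tf_is_token_safe() {
  awk '
    /^[[:space:]]*#/ { next }                                        # skip full-line comments
    /^[[:space:]]*(features|mount_point)[[:space:]]*[{]/ { found = 1 }
    END { exit found ? 1 : 0 }
  ' "$1"
}

# Usage (assumes hermes.tf is in the current directory):
#   tf_is_token_safe hermes.tf || echo "hermes.tf sets host-restricted blocks"
```

A line-based match is deliberately crude; it would miss blocks generated by `dynamic` expressions, so it complements rather than replaces the review in Task 5.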
- [ ] **Step 3: Commit**
```bash
git add hermes.tf
git commit -m "feat: add Hermes Agent LXC container resource"
```
---
## Task 3: tfvars values
**Files:**
- Modify: `terraform.tfvars`
- Modify: `terraform.tfvars.example`
Defaults in `hermes-variables.tf` already match this homelab, so tfvars only needs an explicit override block for clarity/discoverability.
- [ ] **Step 1: Append to `terraform.tfvars`**
Add after the existing DNS line:
```hcl
# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid = 118
hermes_node = "gihyeon"
hermes_network_bridge = "intra01"
```
- [ ] **Step 2: Append the same block to `terraform.tfvars.example`**
```hcl
# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid = 118
hermes_node = "gihyeon"
hermes_network_bridge = "intra01"
```
- [ ] **Step 3: Validate**
Run: `terraform fmt && terraform validate`
Expected: `Success! The configuration is valid.`
- [ ] **Step 4: Commit**
```bash
git add terraform.tfvars terraform.tfvars.example
git commit -m "feat: set Hermes Agent LXC tfvars"
```
---
## Task 4: Outputs
**Files:**
- Modify: `outputs.tf`
- [ ] **Step 1: Append to `outputs.tf`**
```hcl
output "hermes_container_id" {
description = "Hermes Agent LXC container ID"
value = proxmox_virtual_environment_container.hermes.vm_id
}
output "hermes_hostname" {
description = "Hermes Agent LXC hostname (IP is DHCP-assigned; discover via PVE/API)"
value = var.hermes_hostname
}
```
- [ ] **Step 2: Validate**
Run: `terraform validate`
Expected: `Success! The configuration is valid.`
- [ ] **Step 3: Commit**
```bash
git add outputs.tf
git commit -m "feat: add Hermes Agent LXC outputs"
```
---
## Task 5: Plan + apply the container (workstation)
**Files:** none (infra apply)
- [ ] **Step 1: Review the plan**
Run: `terraform plan`
Expected: plan shows `2 to add`: `proxmox_virtual_environment_download_file.debian12_template_gihyeon` and `proxmox_virtual_environment_container.hermes`. **0 to change, 0 to destroy.** Confirm it does NOT touch `proxmox_virtual_environment_container.pbs`.
- [ ] **Step 2: Apply**
Run: `terraform apply`
Expected: `Apply complete! Resources: 2 added, 0 changed, 0 destroyed.` Outputs include `hermes_container_id = 118`.
> If apply errors with a permission/`root@pam`-only message on any container attribute, STOP — it means an attribute in `hermes.tf` is host-restricted. The skeleton here is intentionally limited to attributes the PBS container already created successfully via the same token, so this is not expected.
- [ ] **Step 3: Confirm via API (read-only)**
Run:
```bash
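# PVE_TOKEN_SECRET is an assumed variable name; export it beforehand so the token secret is never committed.
# The token ID is single-quoted because `!` triggers history expansion inside double quotes in interactive shells.
curl -sk -H 'Authorization: PVEAPIToken=root@pam!terrform='"${PVE_TOKEN_SECRET}" \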
"https://192.168.50.87:8006/api2/json/nodes/gihyeon/lxc/118/status/current" \
| python3 -c "import json,sys; d=json.load(sys.stdin)['data']; print(d['name'], d['status'])"
```
Expected: `hermes running` (or `stopped` — the container may not auto-start before features/mounts; Task 7 reboots it).
- [ ] **Step 4: Commit state**
```bash
git add terraform.tfstate terraform.tfstate.backup
git commit -m "chore: apply Hermes Agent LXC (state)"
```
---
## Task 6: Host prep — create + chown bind-mount targets (PVE console)
**Run in the node1 (`gihyeon`) shell via PVE web console. Paste output back.**
- [ ] **Step 1: Create the workspace dirs and chown to the unprivileged-mapped root**
```sh
mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
ls -lnd /mnt/pve/hdd/hermes /media/2tb/hermes
```
Expected: both dirs exist and `ls -lnd` shows owner/group `100000 100000`.
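The `100000:100000` owner comes from the ID shift applied to unprivileged containers: container root (UID/GID 0) appears on the host as 100000. A minimal sketch of that arithmetic, assuming the default Proxmox mapping (offset 100000, range 65536) and no custom `lxc.idmap` entries; the helper name `host_id_for` is hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: default unprivileged idmap, container IDs 0..65535 -> host 100000..165535.
ID_OFFSET=100000
ID_RANGE=65536

# Print the host-side UID/GID that a container-side UID/GID maps to.
host_id_for() {
  local cid="$1"
  if [ "$cid" -lt 0 ] || [ "$cid" -ge "$ID_RANGE" ]; then
    echo "container id $cid is outside the mapped range" >&2
    return 1
  fi
  echo $((ID_OFFSET + cid))
}

host_id_for 0      # container root -> 100000
host_id_for 1000   # container UID 1000 -> 101000
```

If a process inside the container later writes as a non-root user, the same shift determines which host owner the files show up with in `ls -ln`.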
---
## Task 7: Apply features + bind mounts, reboot (PVE console)
**Run in the node1 (`gihyeon`) shell via PVE web console. Paste output back.**
- [ ] **Step 1: Set features (Docker) and the two bind mounts**
```sh
pct set 118 -features nesting=1,keyctl=1 \
-mp0 /mnt/pve/hdd/hermes,mp=/data \
-mp1 /media/2tb/hermes,mp=/fast
pct reboot 118
```
Expected: no error output from `pct set`; container reboots.
- [ ] **Step 2: Verify config + writable mounts**
```sh
pct config 118 | grep -E 'features|mp0|mp1'
pct exec 118 -- sh -c 'touch /data/.w /fast/.w && ls -l /data/.w /fast/.w && rm /data/.w /fast/.w && echo MOUNTS_OK'
```
Expected: `features: keyctl=1,nesting=1`, `mp0: /mnt/pve/hdd/hermes,mp=/data`, `mp1: /media/2tb/hermes,mp=/fast`, and `MOUNTS_OK` (proves the unprivileged container's root can write to both bind mounts).
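The `grep` check above can be turned into a pass/fail test that does not depend on the order in which `pct config` prints the feature flags. A minimal sketch, assuming the output format shown in the expected result; the function name `check_hermes_config` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `pct config <vmid>` output on stdin and verifies
# the features and bind mounts this plan expects. Prints CONFIG_OK on success.
check_hermes_config() {
  local cfg features
  cfg="$(cat)"
  # Extract the value of the `features:` line, e.g. "keyctl=1,nesting=1".
  features="$(sed -n 's/^features:[[:space:]]*//p' <<<"$cfg")"

  case ",${features}," in
    *,nesting=1,*) ;;
    *) echo "missing nesting=1" >&2; return 1 ;;
  esac
  case ",${features}," in
    *,keyctl=1,*) ;;
    *) echo "missing keyctl=1" >&2; return 1 ;;
  esac

  grep -q '^mp0: /mnt/pve/hdd/hermes,mp=/data' <<<"$cfg" \
    || { echo "mp0 missing or wrong" >&2; return 1; }
  grep -q '^mp1: /media/2tb/hermes,mp=/fast' <<<"$cfg" \
    || { echo "mp1 missing or wrong" >&2; return 1; }

  echo CONFIG_OK
}

# Usage on the node1 shell:
#   pct config 118 | check_hermes_config
```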
---
## Task 8: Bootstrap script (workstation authoring)
**Files:**
- Create: `scripts/hermes-bootstrap.sh`
This script is authored and committed on the workstation, then **run inside the LXC console** in Task 9. It contains NO real secrets — only placeholders the operator edits in-container.
- [ ] **Step 1: Write `scripts/hermes-bootstrap.sh`**
```bash
#!/usr/bin/env bash
# Hermes Agent bootstrap — run INSIDE the hermes LXC (#118) console, once.
# Prereqs (already done): features nesting/keyctl set, /data and /fast bind mounts present.
set -euo pipefail
LITELLM_BASE_URL="http://10.1.10.22:4000/v1" # litellm gateway (#117)
HERMES_DATA="/opt/hermes" # ~/.hermes equivalent on rootfs (fast)
COMPOSE_DIR="/opt/hermes-stack"
echo "==> 1/5 Install rootful Docker + compose plugin"
apt-get update
apt-get install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
. /etc/os-release
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian ${VERSION_CODENAME} stable" \
> /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
systemctl enable --now docker
docker run --rm hello-world >/dev/null && echo " docker OK"
echo "==> 2/5 Prepare data + workspace dirs"
mkdir -p "${HERMES_DATA}" "${COMPOSE_DIR}"
# /data (hdd, bulk) and /fast (2tb ssd) are the bind mounts from the LXC.
mkdir -p /data/workspace /fast/workspace
echo "==> 3/5 Write docker-compose.yml"
cat > "${COMPOSE_DIR}/docker-compose.yml" <<EOF
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    shm_size: "1g" # browser tools (Playwright/Chromium)
    volumes:
      - ${HERMES_DATA}:/opt/data # config, memory, skills, sessions (rootfs/SSD)
      - /data:/data # bulk workspace (hdd 14TB)
      - /fast:/fast # fast workspace (2tb SSD)
    env_file:
      - ${COMPOSE_DIR}/.env
    deploy:
      resources:
        limits:
          memory: 3G
          cpus: "2.0"
EOF
echo "==> 4/5 Write .env (EDIT secrets before 'gateway run')"
if [ ! -f "${COMPOSE_DIR}/.env" ]; then
cat > "${COMPOSE_DIR}/.env" <<EOF
# --- litellm gateway (OpenAI-compatible) ---
OPENAI_BASE_URL=${LITELLM_BASE_URL}
OPENAI_API_KEY=REPLACE_WITH_LITELLM_KEY
# --- messaging connectors (fill the ones you use) ---
TELEGRAM_BOT_TOKEN=
DISCORD_BOT_TOKEN=
SLACK_BOT_TOKEN=
EOF
chmod 600 "${COMPOSE_DIR}/.env"
echo " wrote ${COMPOSE_DIR}/.env — edit OPENAI_API_KEY + bot tokens now."
fi
echo "==> 5/5 First-time interactive setup (model -> litellm, sandbox=local, connectors)"
echo " Run setup, then start the gateway:"
echo " cd ${COMPOSE_DIR}"
echo " docker compose run --rm hermes setup # pick provider=custom, base_url=${LITELLM_BASE_URL}, sandbox=local"
echo " docker compose up -d # start 'gateway run'"
echo " docker compose logs -f hermes"
echo "Done. (config.yaml lives under ${HERMES_DATA}; secrets stay in ${COMPOSE_DIR}/.env)"
```
- [ ] **Step 2: Lint the script**
Run: `shellcheck scripts/hermes-bootstrap.sh` (if `shellcheck` is unavailable, run `bash -n scripts/hermes-bootstrap.sh`)
Expected: no errors (info/style notes acceptable). `bash -n` prints nothing on success.
- [ ] **Step 3: Mark executable + commit**
```bash
chmod +x scripts/hermes-bootstrap.sh
git add scripts/hermes-bootstrap.sh
git commit -m "feat: add Hermes Agent in-container bootstrap script"
```
---
## Task 9: Run bootstrap + finalize (PVE console for run, workstation for docs)
**Files:**
- Modify: `README.md`
- [ ] **Step 1: Get the script into the LXC and run it (PVE console)**
The script lives in the repo on the workstation. Copy its contents into the container in one of two ways: in the LXC's web-console shell, open an editor (`nano /root/hermes-bootstrap.sh`) and paste the file; or, from the node1 host shell, run `pct exec 118 -- tee /root/hermes-bootstrap.sh`, paste, and press Ctrl-D. Then run it from the node1 host shell (inside the LXC console, run `bash /root/hermes-bootstrap.sh` directly instead):
```sh
pct exec 118 -- bash /root/hermes-bootstrap.sh
```
Expected: script reaches `Done.` with `docker OK`. Then, inside the container, edit `/opt/hermes-stack/.env` (litellm key + bot tokens) and run the `docker compose run --rm hermes setup` / `up -d` lines it printed.
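Because the bootstrap script writes `.env` with a `REPLACE_WITH_LITELLM_KEY` placeholder, it is easy to start the gateway before the file has been edited. A minimal pre-flight sketch, assuming the `.env` layout written by the script; the function name `env_is_ready` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fails while the .env file still contains a
# REPLACE_WITH_ placeholder or has an empty OPENAI_API_KEY.
env_is_ready() {
  local file="$1"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  if grep -q 'REPLACE_WITH_' "$file"; then
    echo "placeholder still present in $file" >&2
    return 1
  fi
  if ! grep -Eq '^OPENAI_API_KEY=.+' "$file"; then
    echo "OPENAI_API_KEY is empty or missing in $file" >&2
    return 1
  fi
  return 0
}

# Usage inside the container:
#   env_is_ready /opt/hermes-stack/.env && docker compose up -d
```

The bot tokens are not checked because the plan leaves unused connectors blank on purpose.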
- [ ] **Step 2: Update `README.md` structure table**
Add these rows after the `pbs-variables.tf` row:
```markdown
| `hermes.tf` | Hermes Agent LXC container definition (token-safe skeleton) |
| `hermes-variables.tf` | Hermes-related variables |
| `scripts/hermes-bootstrap.sh` | Hermes in-container install script |
```
- [ ] **Step 3: Append a deploy-flow section to `README.md`**
```markdown
## Hermes Agent (LXC #118)
Nous Research Hermes Agent, using litellm (#117, `10.1.10.22:4000`) as its LLM gateway.
Deployment has 4 steps (bind mounts and features cannot be set with the API token → console `pct set`):
1. Host prep (node1 console): `mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes && chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes`
2. `terraform apply` (creates the container)
3. node1 console: `pct set 118 -features nesting=1,keyctl=1 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast && pct reboot 118`
4. LXC console: run `scripts/hermes-bootstrap.sh` → fill in `/opt/hermes-stack/.env`, then `docker compose run --rm hermes setup` → `docker compose up -d`
> Secrets (litellm key, bot tokens) live only in the container's `/opt/hermes-stack/.env` and are never committed to the repo.
> TODO: the hermes `mp0/mp1` mounts are not in TF state → catch up later with `terraform import`.
```
- [ ] **Step 4: Commit docs**
```bash
git add README.md
git commit -m "docs: document Hermes Agent deploy flow"
```
---
## Task 10: End-to-end verification
**Files:** none
- [ ] **Step 1: Container + Docker health (node1 console)**
```sh
pct exec 118 -- docker ps --format '{{.Names}} {{.Status}}'
```
Expected: `hermes Up ...` (healthy/running).
- [ ] **Step 2: LLM path through litellm (node1 console)**
```sh
# The whole pipeline is quoted so that $(...) expands inside the container, where .env lives.
pct exec 118 -- sh -c 'curl -s http://10.1.10.22:4000/v1/models -H "Authorization: Bearer $(grep "^OPENAI_API_KEY=" /opt/hermes-stack/.env | cut -d= -f2-)" | head -c 400'
```
Expected: a JSON model list from litellm (proves hermes's network path + key reach the gateway). Note the model id(s) — set Hermes `model.default` to one of these during `setup`.
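The key lookup in this step reads one value out of `.env`. A prefix-strip is a slightly more forgiving way to do that than splitting on `=`, since it returns the value intact even if the key itself contains `=` characters. A minimal sketch; the function name `env_value` is hypothetical, and it assumes plain `KEY=value` lines without quoting or `export` prefixes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first `KEY=value` line in FILE.
# Everything after the first `=` is returned verbatim.
env_value() {
  local key="$1" file="$2"
  awk -v k="$key" '
    index($0, k "=") == 1 {            # line starts with KEY=
      print substr($0, length(k) + 2)  # text after the first "="
      exit
    }
  ' "$file"
}

# Usage inside the container:
#   curl -s http://10.1.10.22:4000/v1/models \
#     -H "Authorization: Bearer $(env_value OPENAI_API_KEY /opt/hermes-stack/.env)"
```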
- [ ] **Step 3: Workspace persistence on the big disk (node1 console)**
```sh
pct exec 118 -- sh -c 'echo hi > /data/workspace/_probe.txt'
cat /mnt/pve/hdd/hermes/workspace/_probe.txt && rm /mnt/pve/hdd/hermes/workspace/_probe.txt
```
Expected: `hi` printed from the **host** path — proves the agent's `/data` writes land on `/mnt/pve/hdd/hermes` (14TB disk).
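The probe works because each container path under a bind mount corresponds to exactly one host path. A minimal sketch of that translation, using the two mounts defined in Task 7; the function name `host_path_for` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a path inside LXC #118 to the host path that
# backs it, based on the plan's bind mounts:
#   /data -> /mnt/pve/hdd/hermes      /fast -> /media/2tb/hermes
host_path_for() {
  local p="$1"
  case "$p" in
    /data|/data/*) printf '%s\n' "/mnt/pve/hdd/hermes${p#/data}" ;;
    /fast|/fast/*) printf '%s\n' "/media/2tb/hermes${p#/fast}" ;;
    *) echo "not on a bind mount: $p" >&2; return 1 ;;
  esac
}

host_path_for /data/workspace/_probe.txt   # -> /mnt/pve/hdd/hermes/workspace/_probe.txt
```

Paths outside `/data` and `/fast` (for example `/opt/hermes`) live on the container rootfs and have no direct host-side equivalent in this layout, so the helper rejects them.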
- [ ] **Step 4: Messaging connector end-to-end (manual)**
Send a test message from the configured platform (e.g. Telegram) to the bot; confirm Hermes replies. Check `docker compose logs -f hermes` for the round-trip.
- [ ] **Step 5: Final commit (if any uncommitted state/docs)**
```bash
git add -A && git commit -m "chore: Hermes Agent LXC deploy verified" || echo "nothing to commit"
```
---
## Notes / Follow-ups
- **TF import:** add the `mp0/mp1` bind mounts to TF state later via `terraform import` once a root@pam/SSH path is available (same outstanding task as 115/700 in `nfs-lxc-sharing-redesign`).
- **Sandbox:** start `local`; revisit Docker sandbox backend (DinD) only if subagent isolation is needed.
- **Memory:** after deploy, record the hermes LXC + the API-token-can't-bind-mount/features constraint in project memory.


@@ -48,7 +48,7 @@ and generated files on the host's large disks via direct bind mounts.
 | Decision | Choice | Rationale |
 |---|---|---|
 | Deployment form | **Docker LXC (unprivileged)** | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
-| Provisioning | **Terraform** (this repo) | Infra-as-code; mirrors `pbs.tf` pattern. In-container install is a scripted console step. |
+| Provisioning | **Terraform (container only) + console for bind mounts** | TF mirrors `pbs.tf` for the container. **Bind mounts cannot be created via API token** (Proxmox restricts them to `root@pam`/SSH), so `mp0/mp1` are added via console `pct set` — same method already used for jellyfin(115)/tos-api(700). `terraform import` of the mounts is a follow-up. |
 | Primary interface | **Messaging connectors** | Outbound-only → **zero inbound ports exposed.** |
 | Subagent sandbox | **local** | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation needed. |
 | Large workspace | **Direct host bind mount (both disks)** | Aligns with the user's **Plan A** (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory. |
@@ -79,7 +79,7 @@ and generated files on the host's large disks via direct bind mounts.
 | VMID | `118` (adjacent to litellm `117`, AI group) |
 | Node | `gihyeon` |
 | Type | unprivileged LXC, Debian 12 |
-| Features | `nesting = 1`, `keyctl = 1` (required for Docker) |
+| Features | `nesting = 1`, `keyctl = 1` (required for Docker); **set via console `pct set`**, not TF (API token can't set host-security features) |
 | CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
 | rootfs | 24 GB on `local-lvm` |
 | Network | `eth0` on bridge `intra01`, IPv4 DHCP |
@@ -92,10 +92,13 @@ and generated files on the host's large disks via direct bind mounts.
 | `mp0` | `/mnt/pve/hdd/hermes` | `/data` | 14TB bulk: code, artifacts, downloads |
 | `mp1` | `/media/2tb/hermes` | `/fast` | SSD: fast workspace / builds |
-bpg `mount_point` blocks use an absolute host path as `volume` to create a bind
-mount. Both container paths are passed into the Hermes Docker container as
-volumes so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
+**Bind mounts are NOT in Terraform.** The Proxmox API token cannot create bind
+mounts (root@pam/SSH only), so `mp0/mp1` are added in the console with
+`pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast`.
+Both container paths are then passed into the Hermes Docker container as volumes
+so the agent's outputs land on the large disks. `~/.hermes` (`/opt/data`,
 small/fast config + memory + sqlite) stays on rootfs (SSD), **not** on the bulk disk.
+A `terraform import` of these mount points is tracked as a follow-up (same as 115/700).
 ### Unprivileged UID mapping (critical)
 Unlike jellyfin(115)/tos-api(700) — which are *privileged* (root→root, no perms
@@ -124,23 +127,32 @@ subtree is remapped** (isolation preserved), not the whole disk.
 - Messaging extras (Telegram/Discord/Slack) enabled in the gateway image.
 ## 8. Provisioning sequence (order matters)
-1. **Host prep** (node1 web console, once): bind-mount targets must exist before `apply`.
+1. **Host prep** (node1 web console, once): create + chown bind-mount targets.
 ```sh
 mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
 chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
 ```
-2. **Terraform apply** (from workstation): creates LXC #118 + bind mounts.
-3. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh` —
-install Docker + compose plugin → write `docker-compose.yml` + `config.yaml`
-pointing at litellm → fill `.env` (litellm key, bot tokens) → `hermes setup`
-→ `gateway run`.
+2. **Terraform apply** (from workstation): creates LXC #118 token-safe skeleton
+(rootfs, network, cpu/mem, unprivileged, onboot). **No features, no bind mounts**
+(API-token can't set host-security settings).
+3. **Apply features + bind mounts** (node1 console, once): use `pct set`:
+```sh
+pct set 118 -features nesting=1,keyctl=1 \
+-mp0 /mnt/pve/hdd/hermes,mp=/data \
+-mp1 /media/2tb/hermes,mp=/fast
+pct reboot 118
+```
+4. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh` —
+install Docker (rootful) + compose plugin → write `docker-compose.yml` +
+`config.yaml` pointing at litellm → fill `.env` (litellm key, bot tokens) →
+`hermes setup` → `gateway run`.
 > In-container / host shell work is performed by the user via the **PVE web
 > console** (per `proxmox-access` memory — host SSH intentionally unused).
 ## 9. Repo changes
-- **New:** `hermes.tf` (download template + container resource + bind mounts),
-`hermes-variables.tf`, `scripts/hermes-bootstrap.sh`.
+- **New:** `hermes.tf` (container resource — **no bind mounts**),
+`hermes-variables.tf`, `scripts/hermes-bootstrap.sh` (host prep + `pct set` mounts + Docker/hermes install).
 - **Modified:** `terraform.tfvars` + `terraform.tfvars.example` (hermes vars),
 `outputs.tf` (VMID / IP), `README.md` (install steps), `gitignore` (ensure `.env` / secrets excluded).
@@ -152,7 +164,8 @@ subtree is remapped** (isolation preserved), not the whole disk.
 - Docker sandbox backend (DinD) for stronger subagent isolation — deferred; start `local`.
 - Static IP instead of DHCP — deferred (DHCP matches litellm).
 - Dashboard/gateway-API exposure with auth — only if a non-messaging use appears.
-- `terraform import` of existing 115/700 mount points — tracked separately in `nfs-lxc-sharing-redesign`.
+- `terraform import` of the hermes `mp0/mp1` bind mounts into TF state — follow-up (same pattern as 115/700 in `nfs-lxc-sharing-redesign`).
+- Use **rootful** Docker in the LXC (not rootless): Hermes' gateway↔dashboard talk over localhost in one container, so a single netns is required. The ZFS overlay2→vfs caveat from public writeups does not apply here (storage is LVM-thin/ext4/dir, not ZFS).
 ## 12. Rollback
 - `terraform destroy -target` the hermes container, or `pct destroy 118`.