Hermes Agent LXC Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Deploy Nous Research Hermes Agent as an unprivileged Docker LXC (#118) on node1 (gihyeon), using the existing litellm LXC (10.1.10.22:4000) as its OpenAI-compatible LLM gateway, with large-disk bind mounts for the agent workspace.
Architecture: Terraform creates the LXC, including features { nesting/keyctl }. The API token CAN set these on an unprivileged CT, and enabling nesting at create time avoids the systemd-252 "enable nesting" warning that otherwise fails the apply. The only host-level setting the API token cannot apply is bind mounts (host paths require root@pam), so mp0/mp1 are added once from the PVE web console with pct set. A bootstrap script then installs rootful Docker and runs the official nousresearch/hermes-agent image via compose, pointed at litellm, with sandbox=local and messaging connectors.
Tech Stack: Terraform (bpg/proxmox provider), Proxmox VE 9.1 LXC, Docker + docker-compose, Hermes Agent (Nous Research).
Spec: docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md
Execution split:
- Workstation (Terraform): Tasks 1–5, 8, 9 (run terraform against the API token).
- PVE web console (user runs, pastes output back): Tasks 6, 7 (host/in-container ops; per proxmox-access, host SSH is intentionally unused).
File Structure
| File | Responsibility |
|---|---|
| hermes-variables.tf (new) | All Hermes LXC input variables with defaults |
| hermes.tf (new) | Debian 12 template download (gihyeon) + token-safe container resource (features nesting/keyctl set; no bind mounts) |
| terraform.tfvars (modify) | Set Hermes values for this homelab |
| terraform.tfvars.example (modify) | Document Hermes values for other users |
| outputs.tf (modify) | Expose Hermes VMID + hostname |
| scripts/hermes-bootstrap.sh (new) | In-container bootstrap: Docker install + compose file + .env with placeholders for secrets (host prep and the pct set bind mounts are handled in Tasks 6 and 7) |
| README.md (modify) | Document the 4-phase deploy flow |
Task 1: Hermes input variables
Files:
- Create: hermes-variables.tf
- Step 1: Write hermes-variables.tf
variable "hermes_vmid" {
  description = "VMID for the Hermes Agent LXC"
  type        = number
  default     = 118
}

variable "hermes_hostname" {
  description = "Hostname for the Hermes Agent LXC"
  type        = string
  default     = "hermes"
}

variable "hermes_node" {
  description = "Proxmox node to host the Hermes Agent LXC"
  type        = string
  default     = "gihyeon"
}

variable "hermes_cores" {
  description = "CPU cores for the Hermes Agent LXC"
  type        = number
  default     = 2
}

variable "hermes_memory" {
  description = "Dedicated memory (MB) for the Hermes Agent LXC"
  type        = number
  default     = 4096
}

variable "hermes_swap" {
  description = "Swap (MB) for the Hermes Agent LXC"
  type        = number
  default     = 512
}

variable "hermes_disk_size" {
  description = "Root filesystem size (GB) for the Hermes Agent LXC"
  type        = number
  default     = 24
}

variable "hermes_datastore" {
  description = "Datastore for the Hermes Agent LXC root filesystem"
  type        = string
  default     = "local-lvm"
}

variable "hermes_network_bridge" {
  description = "Network bridge (SDN VNET) for the Hermes Agent LXC"
  type        = string
  default     = "intra01"
}
- Step 2: Format + validate
Run: terraform fmt hermes-variables.tf && terraform validate
Expected: Success! The configuration is valid. (validate may warn about the missing hermes.tf resource until Task 2 — that is fine; the goal here is no HCL syntax error in this file.)
- Step 3: Commit
git add hermes-variables.tf
git commit -m "feat: add Hermes Agent LXC variables"
Task 2: Hermes container resource (token-safe skeleton)
Files:
- Create: hermes.tf
Reuses the existing var.dns_servers (defined in pbs-variables.tf).
- Step 1: Write hermes.tf
# Download Debian 12 LXC template to gihyeon (node1).
resource "proxmox_virtual_environment_download_file" "debian12_template_gihyeon" {
  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.hermes_node
  url          = "http://download.proxmox.com/images/system/debian-12-standard_12.12-1_amd64.tar.zst"
}

# Hermes Agent LXC.
# `features` (nesting/keyctl) ARE set here: on an unprivileged container these need
# only VM.Allocate, which the API token has, so Terraform can set them. nesting is
# also required so the systemd-252 (Debian 12) create does not emit the "enable
# nesting" warning that Proxmox returns as TASK WARNINGS (which fails the apply).
# Bind mounts (mp0/mp1, host paths) genuinely DO require root@pam, so those are still
# added via the PVE web console with `pct set` (see scripts/hermes-bootstrap.sh and
# docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md).
resource "proxmox_virtual_environment_container" "hermes" {
  description   = "Hermes Agent (Nous Research) - Managed by Terraform"
  node_name     = var.hermes_node
  vm_id         = var.hermes_vmid
  start_on_boot = true
  unprivileged  = true
  tags          = ["ai", "agent", "terraform"]

  features {
    nesting = true
    keyctl  = true
  }

  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.debian12_template_gihyeon.id
    type             = "debian"
  }

  cpu {
    cores = var.hermes_cores
  }

  memory {
    dedicated = var.hermes_memory
    swap      = var.hermes_swap
  }

  disk {
    datastore_id = var.hermes_datastore
    size         = var.hermes_disk_size
  }

  network_interface {
    name   = "eth0"
    bridge = var.hermes_network_bridge
  }

  initialization {
    hostname = var.hermes_hostname

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    dns {
      servers = var.dns_servers
    }
  }
}
- Step 2: Format + validate
Run: terraform fmt hermes.tf && terraform validate
Expected: Success! The configuration is valid.
- Step 3: Commit
git add hermes.tf
git commit -m "feat: add Hermes Agent LXC container resource"
Task 3: tfvars values
Files:
- Modify: terraform.tfvars
- Modify: terraform.tfvars.example
Defaults in hermes-variables.tf already match this homelab, so tfvars only needs an explicit override block for clarity/discoverability.
- Step 1: Append to terraform.tfvars
Add after the existing DNS line:
# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid           = 118
hermes_node           = "gihyeon"
hermes_network_bridge = "intra01"
- Step 2: Append the same block to terraform.tfvars.example
# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid           = 118
hermes_node           = "gihyeon"
hermes_network_bridge = "intra01"
- Step 3: Validate
Run: terraform fmt && terraform validate
Expected: Success! The configuration is valid.
- Step 4: Commit
git add terraform.tfvars terraform.tfvars.example
git commit -m "feat: set Hermes Agent LXC tfvars"
Task 4: Outputs
Files:
- Modify: outputs.tf
- Step 1: Append to outputs.tf
output "hermes_container_id" {
  description = "Hermes Agent LXC container ID"
  value       = proxmox_virtual_environment_container.hermes.vm_id
}

output "hermes_hostname" {
  description = "Hermes Agent LXC hostname (IP is DHCP-assigned; discover via PVE/API)"
  value       = var.hermes_hostname
}
- Step 2: Validate
Run: terraform validate
Expected: Success! The configuration is valid.
- Step 3: Commit
git add outputs.tf
git commit -m "feat: add Hermes Agent LXC outputs"
Task 5: Plan + apply the container (workstation)
Files: none (infra apply)
- Step 1: Review the plan
Run: terraform plan
Expected: 2 to add — proxmox_virtual_environment_download_file.debian12_template_gihyeon and proxmox_virtual_environment_container.hermes.
⚠️ Known pre-existing drift: the plan ALSO shows 1 to change: proxmox_virtual_environment_container.pbs, disk size = 48 -> 16. The live PBS rootfs is 48 GB but pbs.tf declares 16 GB. A blanket apply would try to shrink the PBS disk (dangerous). Do NOT run an untargeted apply. Reconcile separately by setting size = 48 in pbs.tf to match reality (no infra change), or leave it and always target hermes.
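The drift warning above can also be checked mechanically before applying. The following is a minimal sketch, assuming the plan has been exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, whose JSON carries a `resource_changes` list with `address` and `change.actions` fields. The function name `unexpected_changes` and the sample data are hypothetical and only illustrate the shape of the check.

```python
# Resource addresses this plan intends to touch (taken from Task 5).
ALLOWED = {
    "proxmox_virtual_environment_download_file.debian12_template_gihyeon",
    "proxmox_virtual_environment_container.hermes",
}


def unexpected_changes(plan, allowed=ALLOWED):
    """Return (address, actions) for every planned change outside `allowed`."""
    found = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Entries that change nothing are not a risk, so skip them.
        if actions in (["no-op"], ["read"]):
            continue
        address = rc.get("address", "")
        if address not in allowed:
            found.append((address, actions))
    return found


# Hand-written sample shaped like the drift described above (not real output).
sample_plan = {
    "resource_changes": [
        {
            "address": "proxmox_virtual_environment_download_file.debian12_template_gihyeon",
            "change": {"actions": ["create"]},
        },
        {
            "address": "proxmox_virtual_environment_container.hermes",
            "change": {"actions": ["create"]},
        },
        {
            "address": "proxmox_virtual_environment_container.pbs",
            "change": {"actions": ["update"]},
        },
    ]
}

print(unexpected_changes(sample_plan))
```

A non-empty result means an untargeted apply would touch something other than the two Hermes resources, which is exactly the case the -target flags in Step 2 guard against.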
- Step 2: Apply (TARGETED to hermes only)
Run:
terraform apply \
-target=proxmox_virtual_environment_download_file.debian12_template_gihyeon \
-target=proxmox_virtual_environment_container.hermes
Expected: Apply complete! Resources: 2 added, 0 changed, 0 destroyed. Outputs include hermes_container_id = 118. The -target flags ensure the PBS disk drift is NOT touched.
If apply errors with a permission/root@pam-only message on any container attribute, STOP: it means an attribute in hermes.tf is host-restricted. The skeleton here is intentionally limited to attributes the PBS container already created successfully via the same token, so this is not expected.
- Step 3: Confirm via API (read-only)
Run:
curl -sk -H "Authorization: PVEAPIToken=root@pam!terrform=1408ded5-c7c4-4384-8b19-64178837fb8c" \
"https://192.168.50.87:8006/api2/json/nodes/gihyeon/lxc/118/status/current" \
| python3 -c "import json,sys; d=json.load(sys.stdin)['data']; print(d['name'], d['status'])"
Expected: hermes running (or stopped: the container may not auto-start before the bind mounts are added; Task 7 reboots it).
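The same read-only check can be expressed as two small helpers, which keeps the token secret out of the command line. This is a sketch only: the function names are hypothetical, the header format `PVEAPIToken=<user>@<realm>!<tokenname>=<secret>` follows the curl call above, and the sample body is hand-written to match the `data.name` / `data.status` fields that the one-liner reads.

```python
import json


def pve_auth_header(token_id, secret):
    """Build the Authorization header for a Proxmox API token.

    token_id has the form 'user@realm!tokenname'; secret is the token UUID.
    """
    return {"Authorization": f"PVEAPIToken={token_id}={secret}"}


def summarize_status(body):
    """Return 'name status' from a .../lxc/<vmid>/status/current response body."""
    data = json.loads(body)["data"]
    return f"{data['name']} {data['status']}"


# Hand-written sample body (assumption: only the two fields used above).
sample_body = '{"data": {"name": "hermes", "status": "running"}}'
print(summarize_status(sample_body))
print(pve_auth_header("user@pam!example", "<secret>"))
```

In practice the token id and secret would be read from the environment (for example with `os.environ`) rather than written into a script or shell history.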
- Step 4: Commit state
git add terraform.tfstate terraform.tfstate.backup
git commit -m "chore: apply Hermes Agent LXC (state)"
Task 6: Host prep — create + chown bind-mount targets (PVE console)
Run in the node1 (gihyeon) shell via PVE web console. Paste output back.
- Step 1: Create the workspace dirs and chown to the unprivileged-mapped root
mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
ls -lnd /mnt/pve/hdd/hermes /media/2tb/hermes
Expected: both dirs exist and ls -lnd shows owner/group 100000 100000.
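The value 100000 comes from the ID mapping of an unprivileged container: container IDs are shifted by a fixed offset on the host. A minimal sketch of that arithmetic, assuming the default Proxmox mapping (offset 100000, range 65536); a custom idmap on the container would change both constants.

```python
ID_OFFSET = 100000  # assumed default start of the unprivileged ID range on the host
ID_COUNT = 65536    # assumed default number of mapped IDs


def host_id(container_id, offset=ID_OFFSET, count=ID_COUNT):
    """Translate a uid/gid seen inside the container to the host-side uid/gid."""
    if not 0 <= container_id < count:
        raise ValueError(f"id {container_id} is outside the mapped range 0..{count - 1}")
    return offset + container_id


print(host_id(0))     # container root
print(host_id(1000))  # a typical first regular user inside the container
```

Container root (uid 0) therefore appears as uid 100000 on the host, which is why the bind-mount sources are chowned to 100000:100000.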
Task 7: Add bind mounts, reboot (PVE console)
Run in the node1 (gihyeon) shell via PVE web console. Paste output back.
NOTE: features (nesting/keyctl) are already set by Terraform (Task 2). The API token CAN set them on an unprivileged CT, and nesting at create time is required to avoid the "enable nesting" warning that fails the apply. Only bind mounts need the console (host-path mounts require root@pam).
- Step 1: Add the two bind mounts
pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data \
-mp1 /media/2tb/hermes,mp=/fast
pct reboot 118
Expected: no error output from pct set; container reboots.
- Step 2: Verify config + writable mounts
pct config 118 | grep -E 'features|mp0|mp1'
pct exec 118 -- sh -c 'touch /data/.w /fast/.w && ls -l /data/.w /fast/.w && rm /data/.w /fast/.w && echo MOUNTS_OK'
Expected: features: keyctl=1,nesting=1 (set by TF), mp0: /mnt/pve/hdd/hermes,mp=/data, mp1: /media/2tb/hermes,mp=/fast, and MOUNTS_OK (proves the unprivileged container's root can write to both bind mounts).
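When the `pct config` output is pasted back, the expected values can be checked with a small parser instead of by eye. The sketch below assumes the `key: value` line format shown in the expected output above; the helper names and the `volume` key used for the bare first element are hypothetical.

```python
def parse_pct_config(text):
    """Parse 'key: value' lines from `pct config` output into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def parse_options(value):
    """Split 'a=1,b=2' values; a bare first element is stored under 'volume'."""
    options = {}
    for index, part in enumerate(value.split(",")):
        part = part.strip()
        if not part:
            continue
        name, sep, val = part.partition("=")
        if sep:
            options[name] = val
        elif index == 0:
            options["volume"] = part
    return options


def missing_settings(config):
    """List the settings from Tasks 2 and 7 that are absent or different."""
    problems = []
    features = parse_options(config.get("features", ""))
    for flag in ("nesting", "keyctl"):
        if features.get(flag) != "1":
            problems.append(f"features.{flag}")
    expected = {
        "mp0": ("/mnt/pve/hdd/hermes", "/data"),
        "mp1": ("/media/2tb/hermes", "/fast"),
    }
    for key, (source, target) in expected.items():
        options = parse_options(config.get(key, ""))
        if options.get("volume") != source or options.get("mp") != target:
            problems.append(key)
    return problems


sample = """features: keyctl=1,nesting=1
mp0: /mnt/pve/hdd/hermes,mp=/data
mp1: /media/2tb/hermes,mp=/fast"""
print(missing_settings(parse_pct_config(sample)))
```

An empty list means the container config matches the plan; any entry names the setting to re-check.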
Task 8: Bootstrap script (workstation authoring)
Files:
- Create: scripts/hermes-bootstrap.sh
This script is authored and committed on the workstation, then run inside the LXC console in Task 9. It contains NO real secrets — only placeholders the operator edits in-container.
- Step 1: Write scripts/hermes-bootstrap.sh
#!/usr/bin/env bash
# Hermes Agent bootstrap — run INSIDE the hermes LXC (#118) console, once.
# Prereqs (already done): features nesting/keyctl set, /data and /fast bind mounts present.
set -euo pipefail
LITELLM_BASE_URL="http://10.1.10.22:4000/v1" # litellm gateway (#117)
HERMES_DATA="/opt/hermes" # ~/.hermes equivalent on rootfs (fast)
COMPOSE_DIR="/opt/hermes-stack"
echo "==> 1/5 Install rootful Docker + compose plugin"
apt-get update
apt-get install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
. /etc/os-release
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian ${VERSION_CODENAME} stable" \
> /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
systemctl enable --now docker
docker run --rm hello-world >/dev/null && echo " docker OK"
echo "==> 2/5 Prepare data + workspace dirs"
mkdir -p "${HERMES_DATA}" "${COMPOSE_DIR}"
# /data (hdd, bulk) and /fast (2tb ssd) are the bind mounts from the LXC.
mkdir -p /data/workspace /fast/workspace
echo "==> 3/5 Write docker-compose.yml"
cat > "${COMPOSE_DIR}/docker-compose.yml" <<EOF
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    shm_size: "1g"                 # browser tools (Playwright/Chromium)
    volumes:
      - ${HERMES_DATA}:/opt/data   # config, memory, skills, sessions (rootfs/SSD)
      - /data:/data                # bulk workspace (hdd 14TB)
      - /fast:/fast                # fast workspace (2tb SSD)
    env_file:
      - ${COMPOSE_DIR}/.env
    deploy:
      resources:
        limits:
          memory: 3G
          cpus: "2.0"
EOF
echo "==> 4/5 Write .env (EDIT secrets before 'gateway run')"
if [ ! -f "${COMPOSE_DIR}/.env" ]; then
  # Heredoc body and terminator stay at column 0 so .env gets no leading spaces.
  cat > "${COMPOSE_DIR}/.env" <<EOF
# --- litellm gateway (OpenAI-compatible) ---
OPENAI_BASE_URL=${LITELLM_BASE_URL}
OPENAI_API_KEY=REPLACE_WITH_LITELLM_KEY
# --- messaging connectors (fill the ones you use) ---
TELEGRAM_BOT_TOKEN=
DISCORD_BOT_TOKEN=
SLACK_BOT_TOKEN=
EOF
  chmod 600 "${COMPOSE_DIR}/.env"
  echo "    wrote ${COMPOSE_DIR}/.env — edit OPENAI_API_KEY + bot tokens now."
fi
echo "==> 5/5 First-time interactive setup (model -> litellm, sandbox=local, connectors)"
echo " Run setup, then start the gateway:"
echo " cd ${COMPOSE_DIR}"
echo " docker compose run --rm hermes setup # pick provider=custom, base_url=${LITELLM_BASE_URL}, sandbox=local"
echo " docker compose up -d # start 'gateway run'"
echo " docker compose logs -f hermes"
echo "Done. (config.yaml lives under ${HERMES_DATA}; secrets stay in ${COMPOSE_DIR}/.env)"
- Step 2: Lint the script
Run: shellcheck scripts/hermes-bootstrap.sh (if shellcheck is unavailable, run bash -n scripts/hermes-bootstrap.sh)
Expected: no errors (info/style notes acceptable). bash -n prints nothing on success.
- Step 3: Mark executable + commit
chmod +x scripts/hermes-bootstrap.sh
git add scripts/hermes-bootstrap.sh
git commit -m "feat: add Hermes Agent in-container bootstrap script"
Task 9: Run bootstrap + finalize (PVE console for run, workstation for docs)
Files:
- Modify: README.md
- Step 1: Get the script into the LXC and run it (LXC console)
The script lives in the repo on the workstation. Get its contents into the container — easiest via the LXC's web-console shell: open an editor (nano /root/hermes-bootstrap.sh) and paste the file, or pipe it through the host with pct exec 118 -- tee /root/hermes-bootstrap.sh while pasting. Then:
pct exec 118 -- bash /root/hermes-bootstrap.sh
Expected: script reaches Done. with docker OK. Then, inside the container, edit /opt/hermes-stack/.env (litellm key + bot tokens) and run the docker compose run --rm hermes setup / up -d lines it printed.
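Before starting the gateway it can help to confirm that the placeholder written by the bootstrap script has actually been replaced. A minimal sketch, assuming the `.env` layout the script writes; the helper names are hypothetical, and the bot tokens are treated as optional because the script says to fill only the ones in use.

```python
PLACEHOLDER_PREFIX = "REPLACE_WITH_"


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def unfilled(values, required=("OPENAI_BASE_URL", "OPENAI_API_KEY")):
    """Return required keys that are empty or still hold a placeholder."""
    return [
        key
        for key in required
        if not values.get(key) or values[key].startswith(PLACEHOLDER_PREFIX)
    ]


# The file as the bootstrap script first writes it, before any editing.
sample_env = """# --- litellm gateway (OpenAI-compatible) ---
OPENAI_BASE_URL=http://10.1.10.22:4000/v1
OPENAI_API_KEY=REPLACE_WITH_LITELLM_KEY
# --- messaging connectors (fill the ones you use) ---
TELEGRAM_BOT_TOKEN=
DISCORD_BOT_TOKEN=
SLACK_BOT_TOKEN=
"""
print(unfilled(parse_env(sample_env)))
```

For the untouched file this reports `OPENAI_API_KEY`, which is the one value that must be edited before `docker compose up -d`.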
- Step 2: Update README.md structure table
Add these rows after the pbs-variables.tf row:
| `hermes.tf` | Hermes Agent LXC container definition (token-safe skeleton) |
| `hermes-variables.tf` | Hermes-related variables |
| `scripts/hermes-bootstrap.sh` | Hermes in-container install script |
- Step 3: Append a deploy-flow section to README.md
## Hermes Agent (LXC #118)
Nous Research Hermes Agent, using litellm (#117, `10.1.10.22:4000`) as its LLM gateway.
Deployment takes 4 steps (bind mounts cannot be set by the API token, so they are added with `pct set` from the console; features are set by Terraform):
1. Host prep (node1 console): `mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes && chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes`
2. `terraform apply -target=proxmox_virtual_environment_download_file.debian12_template_gihyeon -target=proxmox_virtual_environment_container.hermes` (creates the container with nesting/keyctl; targeted so the PBS disk drift is not touched)
3. node1 console: `pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast && pct reboot 118`
4. LXC console: run `scripts/hermes-bootstrap.sh` → fill in `/opt/hermes-stack/.env`, then `docker compose run --rm hermes setup` → `docker compose up -d`
> Secrets (litellm key, bot tokens) live only in the container's `/opt/hermes-stack/.env` and are never committed to the repo.
> TODO: hermes `mp0/mp1` are not in TF state → catch up later with `terraform import`.
- Step 4: Commit docs
git add README.md
git commit -m "docs: document Hermes Agent deploy flow"
Task 10: End-to-end verification
Files: none
- Step 1: Container + Docker health (node1 console)
pct exec 118 -- docker ps --format '{{.Names}} {{.Status}}'
Expected: hermes Up ... (healthy/running).
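- Step 2: LLM path through litellm (node1 console)
pct exec 118 -- sh -c 'curl -s http://10.1.10.22:4000/v1/models -H "Authorization: Bearer $(grep ^OPENAI_API_KEY= /opt/hermes-stack/.env | cut -d= -f2-)" | head -c 400'
The sh -c wrapper matters: without it, the $(grep ...) substitution and the pipe run in the host shell, where /opt/hermes-stack/.env does not exist.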
Expected: a JSON model list from litellm (proves hermes's network path + key reach the gateway). Note the model id(s) — set Hermes model.default to one of these during setup.
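To pick a value for model.default, the model ids can be pulled out of the response rather than read from truncated JSON. This sketch assumes the gateway returns the usual OpenAI-compatible list shape (`{"object": "list", "data": [{"id": ...}]}`); the model names in the sample are placeholders, not the ids this litellm instance serves.

```python
import json


def model_ids(body):
    """Return the model ids from an OpenAI-compatible /v1/models response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Hand-written sample; the ids are placeholders.
sample_models = json.dumps(
    {
        "object": "list",
        "data": [
            {"id": "example-model-a", "object": "model"},
            {"id": "example-model-b", "object": "model"},
        ],
    }
)
print(model_ids(sample_models))
```

Note that the command above truncates the output with `head -c 400`, so the full response has to be captured (drop the `head`) before it can be parsed this way.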
- Step 3: Workspace persistence on the big disk (node1 console)
pct exec 118 -- sh -c 'echo hi > /data/workspace/_probe.txt'
cat /mnt/pve/hdd/hermes/workspace/_probe.txt && rm /mnt/pve/hdd/hermes/workspace/_probe.txt
Expected: hi printed from the host path — proves the agent's /data writes land on /mnt/pve/hdd/hermes (14TB disk).
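The probe works because each bind mount maps a container path prefix onto a host directory. A minimal sketch of that translation, using the two mounts from Task 7; the function name `host_path` is hypothetical.

```python
import posixpath

# Container mount point -> host source directory (from the pct set call in Task 7).
BIND_MOUNTS = {
    "/data": "/mnt/pve/hdd/hermes",
    "/fast": "/media/2tb/hermes",
}


def host_path(container_path, mounts=BIND_MOUNTS):
    """Return the host path backing a container path, or None if it is on the rootfs."""
    path = posixpath.normpath(container_path)
    for mount_point, source in mounts.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            relative = posixpath.relpath(path, mount_point)
            return source if relative == "." else posixpath.join(source, relative)
    return None


print(host_path("/data/workspace/_probe.txt"))
print(host_path("/fast/workspace"))
print(host_path("/opt/hermes"))
```

Paths under /opt/hermes return None here because that directory lives on the container rootfs, not on either bind mount.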
- Step 4: Messaging connector end-to-end (manual)
Send a test message from the configured platform (e.g. Telegram) to the bot; confirm Hermes replies. Check docker compose logs -f hermes for the round-trip.
- Step 5: Final commit (if any uncommitted state/docs)
git add -A && git commit -m "chore: Hermes Agent LXC deploy verified" || echo "nothing to commit"
Notes / Follow-ups
- TF import: add the mp0/mp1 bind mounts to TF state later via terraform import once a root@pam/SSH path is available (same outstanding task as 115/700 in nfs-lxc-sharing-redesign).
- Sandbox: start local; revisit Docker sandbox backend (DinD) only if subagent isolation is needed.
- Memory: after deploy, record the hermes LXC + the API-token-can't-bind-mount constraint (features can be set by the token) in project memory.