proxmox-iac/docs/superpowers/plans/2026-06-18-hermes-agent-lxc.md
21in7 f6dc709793 docs: features set in Terraform (token can); only bind mounts via console
Correct README/plan/spec after the apply-failure root cause: nesting/keyctl
are settable by the API token on an unprivileged CT and are required at create
to avoid the systemd-252 TASK WARNINGS that fails apply. Console step reduced
to bind mounts only. README apply uses -target (PBS disk drift).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 00:18:23 +09:00


Hermes Agent LXC Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Deploy Nous Research Hermes Agent as an unprivileged Docker LXC (#118) on node1 (gihyeon), using the existing litellm LXC (10.1.10.22:4000) as its OpenAI-compatible LLM gateway, with large-disk bind mounts for the agent workspace.

Architecture: Terraform creates the LXC including features { nesting/keyctl } (the token CAN set these on an unprivileged CT, and nesting at create time avoids the systemd-252 "enable nesting" warning that otherwise fails the apply). The only host setting the API token cannot do is bind mounts (host paths require root@pam), so mp0/mp1 are added once via the PVE web console with pct set. A bootstrap script then installs rootful Docker and runs the official nousresearch/hermes-agent image via compose, pointed at litellm, with sandbox=local and messaging connectors.

Tech Stack: Terraform (bpg/proxmox provider), Proxmox VE 9.1 LXC, Docker + docker-compose, Hermes Agent (Nous Research).

Spec: docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md

Execution split:

  • Workstation (Terraform): Tasks 1–5, 8, 9; run terraform against the API token.
  • PVE web console (user runs, pastes output back): Tasks 6, 7 — host/in-container ops (per proxmox-access: host SSH is intentionally unused).

File Structure

| File | Responsibility |
| --- | --- |
| hermes-variables.tf (new) | All Hermes LXC input variables with defaults |
| hermes.tf (new) | Debian 12 template download (gihyeon) + token-safe container resource (features nesting/keyctl set; no bind mounts) |
| terraform.tfvars (modify) | Set Hermes values for this homelab |
| terraform.tfvars.example (modify) | Document Hermes values for other users |
| outputs.tf (modify) | Expose Hermes VMID + hostname |
| scripts/hermes-bootstrap.sh (new) | In-container Docker install + compose file + Hermes .env (placeholders for secrets) |
| README.md (modify) | Document the 4-phase deploy flow |

Task 1: Hermes input variables

Files:

  • Create: hermes-variables.tf

  • Step 1: Write hermes-variables.tf

variable "hermes_vmid" {
  description = "VMID for the Hermes Agent LXC"
  type        = number
  default     = 118
}

variable "hermes_hostname" {
  description = "Hostname for the Hermes Agent LXC"
  type        = string
  default     = "hermes"
}

variable "hermes_node" {
  description = "Proxmox node to host the Hermes Agent LXC"
  type        = string
  default     = "gihyeon"
}

variable "hermes_cores" {
  description = "CPU cores for the Hermes Agent LXC"
  type        = number
  default     = 2
}

variable "hermes_memory" {
  description = "Dedicated memory (MB) for the Hermes Agent LXC"
  type        = number
  default     = 4096
}

variable "hermes_swap" {
  description = "Swap (MB) for the Hermes Agent LXC"
  type        = number
  default     = 512
}

variable "hermes_disk_size" {
  description = "Root filesystem size (GB) for the Hermes Agent LXC"
  type        = number
  default     = 24
}

variable "hermes_datastore" {
  description = "Datastore for the Hermes Agent LXC root filesystem"
  type        = string
  default     = "local-lvm"
}

variable "hermes_network_bridge" {
  description = "Network bridge (SDN VNET) for the Hermes Agent LXC"
  type        = string
  default     = "intra01"
}
  • Step 2: Format + validate

Run: terraform fmt hermes-variables.tf && terraform validate Expected: Success! The configuration is valid. (validate may warn about the missing hermes.tf resource until Task 2 — that is fine; the goal here is no HCL syntax error in this file.)

  • Step 3: Commit
git add hermes-variables.tf
git commit -m "feat: add Hermes Agent LXC variables"

Task 2: Hermes container resource (token-safe skeleton)

Files:

  • Create: hermes.tf

Reuses the existing var.dns_servers (defined in pbs-variables.tf).

  • Step 1: Write hermes.tf
# Download Debian 12 LXC template to gihyeon (node1).
resource "proxmox_virtual_environment_download_file" "debian12_template_gihyeon" {
  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.hermes_node
  url          = "http://download.proxmox.com/images/system/debian-12-standard_12.12-1_amd64.tar.zst"
}

# Hermes Agent LXC.
# `features` (nesting/keyctl) ARE set here: on an unprivileged container these need
# only VM.Allocate, which the API token has, so Terraform can set them. nesting is
# also required so the systemd-252 (Debian 12) create does not emit the "enable
# nesting" warning that Proxmox returns as TASK WARNINGS (which fails the apply).
# Bind mounts (mp0/mp1, host paths) genuinely DO require root@pam, so those are still
# added via the PVE web console with `pct set` (see scripts/hermes-bootstrap.sh and
# docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md).
resource "proxmox_virtual_environment_container" "hermes" {
  description   = "Hermes Agent (Nous Research) - Managed by Terraform"
  node_name     = var.hermes_node
  vm_id         = var.hermes_vmid
  start_on_boot = true
  unprivileged  = true
  tags          = ["ai", "agent", "terraform"]

  features {
    nesting = true
    keyctl  = true
  }

  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.debian12_template_gihyeon.id
    type             = "debian"
  }

  cpu {
    cores = var.hermes_cores
  }

  memory {
    dedicated = var.hermes_memory
    swap      = var.hermes_swap
  }

  disk {
    datastore_id = var.hermes_datastore
    size         = var.hermes_disk_size
  }

  network_interface {
    name   = "eth0"
    bridge = var.hermes_network_bridge
  }

  initialization {
    hostname = var.hermes_hostname

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    dns {
      servers = var.dns_servers
    }
  }
}
  • Step 2: Format + validate

Run: terraform fmt hermes.tf && terraform validate Expected: Success! The configuration is valid.

  • Step 3: Commit
git add hermes.tf
git commit -m "feat: add Hermes Agent LXC container resource"

Task 3: tfvars values

Files:

  • Modify: terraform.tfvars
  • Modify: terraform.tfvars.example

Defaults in hermes-variables.tf already match this homelab, so tfvars only needs an explicit override block for clarity/discoverability.

  • Step 1: Append to terraform.tfvars

Add after the existing DNS line:


# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid           = 118
hermes_node           = "gihyeon"
hermes_network_bridge = "intra01"
  • Step 2: Append the same block to terraform.tfvars.example

# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid           = 118
hermes_node           = "gihyeon"
hermes_network_bridge = "intra01"
  • Step 3: Validate

Run: terraform fmt && terraform validate Expected: Success! The configuration is valid.

  • Step 4: Commit
git add terraform.tfvars terraform.tfvars.example
git commit -m "feat: set Hermes Agent LXC tfvars"

Task 4: Outputs

Files:

  • Modify: outputs.tf

  • Step 1: Append to outputs.tf


output "hermes_container_id" {
  description = "Hermes Agent LXC container ID"
  value       = proxmox_virtual_environment_container.hermes.vm_id
}

output "hermes_hostname" {
  description = "Hermes Agent LXC hostname (IP is DHCP-assigned; discover via PVE/API)"
  value       = var.hermes_hostname
}
  • Step 2: Validate

Run: terraform validate Expected: Success! The configuration is valid.

  • Step 3: Commit
git add outputs.tf
git commit -m "feat: add Hermes Agent LXC outputs"

Task 5: Plan + apply the container (workstation)

Files: none (infra apply)

  • Step 1: Review the plan

Run: terraform plan Expected: 2 to add: proxmox_virtual_environment_download_file.debian12_template_gihyeon and proxmox_virtual_environment_container.hermes.

⚠️ Known pre-existing drift: the plan ALSO shows 1 to change: proxmox_virtual_environment_container.pbs disk size = 48 -> 16. The live PBS rootfs is 48GB but pbs.tf declares 16GB. A blanket apply would try to shrink the PBS disk (dangerous). Do NOT run an untargeted apply. Either reconcile separately by setting size = 48 in pbs.tf to match reality (no infra change), or leave it and always target hermes.
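One way to make the "no untargeted apply" rule mechanical is to scan the plan text for resource addresses other than the two Hermes resources. The helper below is a sketch, not part of the repo: the function name unexpected_changes and the ALLOWED pattern are hypothetical, and it assumes the human-readable plan format in which each planned action is announced by a line such as "# <address> will be created".

```shell
# Hypothetical allowlist: only the two resources this plan is supposed to add.
ALLOWED='^proxmox_virtual_environment_(download_file\.debian12_template_gihyeon|container\.hermes)$'

unexpected_changes() {
  # Reads plan text on stdin and prints every resource address that is
  # announced with "will be ..." or "must be ..." and is not in ALLOWED.
  sed -nE 's/^[[:space:]]*# ([^ ]+) (will be|must be) .*/\1/p' \
    | grep -Ev "$ALLOWED" || true
}
```

Piping terraform plan -no-color into unexpected_changes should print proxmox_virtual_environment_container.pbs while the drift exists, and nothing once pbs.tf is reconciled.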

  • Step 2: Apply (TARGETED to hermes only)

Run:

terraform apply \
  -target=proxmox_virtual_environment_download_file.debian12_template_gihyeon \
  -target=proxmox_virtual_environment_container.hermes

Expected: Apply complete! Resources: 2 added, 0 changed, 0 destroyed. Outputs include hermes_container_id = 118. The -target flags ensure the PBS disk drift is NOT touched.

If apply errors with a permission/root@pam-only message on any container attribute, STOP — it means an attribute in hermes.tf is host-restricted. The skeleton here is intentionally limited to attributes the PBS container already created successfully via the same token, so this is not expected.

  • Step 3: Confirm via API (read-only)

Run:

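# PVE_API_TOKEN (assumed variable name) holds the full token string USER@REALM!TOKENID=SECRET.
# Export it with single quotes so the shell does not history-expand the "!", and keep the secret out of the repo.
curl -sk -H "Authorization: PVEAPIToken=${PVE_API_TOKEN}" \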
  "https://192.168.50.87:8006/api2/json/nodes/gihyeon/lxc/118/status/current" \
  | python3 -c "import json,sys; d=json.load(sys.stdin)['data']; print(d['name'], d['status'])"

Expected: hermes running (or stopped; the container may not have started yet, and Task 7 reboots it after the bind mounts are added).

  • Step 4: Commit state
git add terraform.tfstate terraform.tfstate.backup
git commit -m "chore: apply Hermes Agent LXC (state)"

Task 6: Host prep — create + chown bind-mount targets (PVE console)

Run in the node1 (gihyeon) shell via PVE web console. Paste output back.

  • Step 1: Create the workspace dirs and chown to the unprivileged-mapped root
mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
ls -lnd /mnt/pve/hdd/hermes /media/2tb/hermes

Expected: both dirs exist and ls -lnd shows owner/group 100000 100000.
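The 100000 owner comes from the unprivileged ID mapping: container root (uid 0) appears on the host as uid 100000. The sketch below shows that arithmetic, assuming the default Proxmox mapping (offset 100000, 65536 IDs) and no custom lxc.idmap entries; host_id is a hypothetical helper name.

```shell
host_id() {
  # $1: uid or gid as seen inside the container
  # $2: host-side offset (assumed default: 100000)
  base=${2:-100000}
  if [ "$1" -lt 0 ] || [ "$1" -ge 65536 ]; then
    echo "id $1 is outside the default 65536-id mapped range" >&2
    return 1
  fi
  echo $((base + $1))
}
```

For example, host_id 0 gives the owner used in the chown above. A non-root user inside the container would need the correspondingly shifted owner on the host path.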


Task 7: Add bind mounts, reboot (PVE console)

Run in the node1 (gihyeon) shell via PVE web console. Paste output back.

NOTE: features (nesting/keyctl) are already set by Terraform (Task 2) — the API token CAN set them on an unprivileged CT, and nesting at create time is required to avoid the "enable nesting" warning that fails the apply. Only bind mounts need the console (host-path mounts require root@pam).

  • Step 1: Add the two bind mounts
pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data \
            -mp1 /media/2tb/hermes,mp=/fast
pct reboot 118

Expected: no error output from pct set; container reboots.

  • Step 2: Verify config + writable mounts
pct config 118 | grep -E 'features|mp0|mp1'
pct exec 118 -- sh -c 'touch /data/.w /fast/.w && ls -l /data/.w /fast/.w && rm /data/.w /fast/.w && echo MOUNTS_OK'

Expected: features: keyctl=1,nesting=1 (set by TF), mp0: /mnt/pve/hdd/hermes,mp=/data, mp1: /media/2tb/hermes,mp=/fast, and MOUNTS_OK (proves the unprivileged container's root can write to both bind mounts).
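The visual check of the pct config output can also be scripted. This is a minimal sketch with a hypothetical function name (check_ct_config); it assumes pct config prints one "key: value" line per setting, as in the expected output above, and does not depend on the order of the features flags.

```shell
check_ct_config() {
  # Reads `pct config` output on stdin; returns non-zero if any expected
  # feature flag or bind mount line is missing.
  cfg=$(cat)
  rc=0
  for pattern in \
    '^features:.*keyctl=1' \
    '^features:.*nesting=1' \
    '^mp0: /mnt/pve/hdd/hermes,mp=/data' \
    '^mp1: /media/2tb/hermes,mp=/fast'
  do
    if ! printf '%s\n' "$cfg" | grep -Eq "$pattern"; then
      echo "missing: $pattern" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

On node1 it would be used as pct config 118 | check_ct_config, which prints nothing and returns 0 when all four lines are present.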


Task 8: Bootstrap script (workstation authoring)

Files:

  • Create: scripts/hermes-bootstrap.sh

This script is authored and committed on the workstation, then run inside the LXC console in Task 9. It contains NO real secrets — only placeholders the operator edits in-container.

  • Step 1: Write scripts/hermes-bootstrap.sh
#!/usr/bin/env bash
# Hermes Agent bootstrap — run INSIDE the hermes LXC (#118) console, once.
# Prereqs (already done): features nesting/keyctl set, /data and /fast bind mounts present.
set -euo pipefail

LITELLM_BASE_URL="http://10.1.10.22:4000/v1"   # litellm gateway (#117)
HERMES_DATA="/opt/hermes"                       # ~/.hermes equivalent on rootfs (fast)
COMPOSE_DIR="/opt/hermes-stack"

echo "==> 1/5 Install rootful Docker + compose plugin"
apt-get update
apt-get install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
. /etc/os-release
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian ${VERSION_CODENAME} stable" \
  > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
systemctl enable --now docker
docker run --rm hello-world >/dev/null && echo "    docker OK"

echo "==> 2/5 Prepare data + workspace dirs"
mkdir -p "${HERMES_DATA}" "${COMPOSE_DIR}"
# /data (hdd, bulk) and /fast (2tb ssd) are the bind mounts from the LXC.
mkdir -p /data/workspace /fast/workspace

echo "==> 3/5 Write docker-compose.yml"
cat > "${COMPOSE_DIR}/docker-compose.yml" <<EOF
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    shm_size: "1g"            # browser tools (Playwright/Chromium)
    volumes:
      - ${HERMES_DATA}:/opt/data   # config, memory, skills, sessions (rootfs/SSD)
      - /data:/data                # bulk workspace (hdd 14TB)
      - /fast:/fast                # fast workspace (2tb SSD)
    env_file:
      - ${COMPOSE_DIR}/.env
    deploy:
      resources:
        limits:
          memory: 3G
          cpus: "2.0"
EOF

echo "==> 4/5 Write .env (EDIT secrets before 'gateway run')"
if [ ! -f "${COMPOSE_DIR}/.env" ]; then
  cat > "${COMPOSE_DIR}/.env" <<EOF
# --- litellm gateway (OpenAI-compatible) ---
OPENAI_BASE_URL=${LITELLM_BASE_URL}
OPENAI_API_KEY=REPLACE_WITH_LITELLM_KEY
# --- messaging connectors (fill the ones you use) ---
TELEGRAM_BOT_TOKEN=
DISCORD_BOT_TOKEN=
SLACK_BOT_TOKEN=
EOF
  chmod 600 "${COMPOSE_DIR}/.env"
  echo "    wrote ${COMPOSE_DIR}/.env — edit OPENAI_API_KEY + bot tokens now."
fi

echo "==> 5/5 First-time interactive setup (model -> litellm, sandbox=local, connectors)"
echo "    Run setup, then start the gateway:"
echo "      cd ${COMPOSE_DIR}"
echo "      docker compose run --rm hermes setup     # pick provider=custom, base_url=${LITELLM_BASE_URL}, sandbox=local"
echo "      docker compose up -d                     # start 'gateway run'"
echo "      docker compose logs -f hermes"
echo "Done. (config.yaml lives under ${HERMES_DATA}; secrets stay in ${COMPOSE_DIR}/.env)"
  • Step 2: Lint the script

Run: shellcheck scripts/hermes-bootstrap.sh (if shellcheck is unavailable, run bash -n scripts/hermes-bootstrap.sh) Expected: no errors (info/style notes acceptable). bash -n prints nothing on success.

  • Step 3: Mark executable + commit
chmod +x scripts/hermes-bootstrap.sh
git add scripts/hermes-bootstrap.sh
git commit -m "feat: add Hermes Agent in-container bootstrap script"

Task 9: Run bootstrap + finalize (PVE console for run, workstation for docs)

Files:

  • Modify: README.md

  • Step 1: Get the script into the LXC and run it (LXC console)

The script lives in the repo on the workstation. Get its contents into the container either from the LXC's web-console shell (open an editor with nano /root/hermes-bootstrap.sh and paste the file) or from the node1 shell (run pct exec 118 -- tee /root/hermes-bootstrap.sh, paste, then press Ctrl-D). Then run it from the node1 shell (inside the LXC console, run bash /root/hermes-bootstrap.sh instead):

pct exec 118 -- bash /root/hermes-bootstrap.sh

Expected: script reaches Done. with docker OK. Then, inside the container, edit /opt/hermes-stack/.env (litellm key + bot tokens) and run the docker compose run --rm hermes setup / up -d lines it printed.
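Because the bootstrap script writes a placeholder key, it is easy to start the gateway before the .env file has been edited. The guard below is a sketch under that assumption; env_ready is a hypothetical helper that checks only OPENAI_API_KEY and treats an empty value or any REPLACE_WITH_ prefix as "not set".

```shell
env_ready() {
  # $1: path to the .env file written by the bootstrap script
  key=$(grep -E '^OPENAI_API_KEY=' "$1" | head -n1 | cut -d= -f2-)
  case "$key" in
    ''|REPLACE_WITH_*)
      echo "OPENAI_API_KEY is not set in $1" >&2
      return 1
      ;;
  esac
}
```

Inside the container this could gate the start, for example env_ready /opt/hermes-stack/.env && docker compose up -d.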

  • Step 2: Update README.md structure table

Add these rows after the pbs-variables.tf row:

| `hermes.tf` | Hermes Agent LXC container definition (token-safe skeleton) |
| `hermes-variables.tf` | Hermes-related variables |
| `scripts/hermes-bootstrap.sh` | Hermes in-container install script |
  • Step 3: Append a deploy-flow section to README.md
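## Hermes Agent (LXC #118)

Nous Research Hermes Agent, using litellm (#117, `10.1.10.22:4000`) as its LLM gateway.
Deployment takes 4 steps (features are set by Terraform; bind mounts cannot be set with the API token, so they are added from the console with `pct set`):

1. Host prep (node1 console): `mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes && chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes`
2. `terraform apply -target=proxmox_virtual_environment_download_file.debian12_template_gihyeon -target=proxmox_virtual_environment_container.hermes` (creates the container with nesting/keyctl; targeted so the PBS disk drift is not touched)
3. node1 console: `pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast && pct reboot 118`
4. LXC console: run `scripts/hermes-bootstrap.sh` → fill in `/opt/hermes-stack/.env` → `docker compose run --rm hermes setup` → `docker compose up -d`

> Secrets (litellm key, bot tokens) live only in the container's `/opt/hermes-stack/.env` and are never committed to the repo.
> TODO: hermes `mp0/mp1` are not in TF state → catch up later with `terraform import`.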
  • Step 4: Commit docs
git add README.md
git commit -m "docs: document Hermes Agent deploy flow"

Task 10: End-to-end verification

Files: none

  • Step 1: Container + Docker health (node1 console)
pct exec 118 -- docker ps --format '{{.Names}} {{.Status}}'

Expected: hermes Up ... (healthy/running).
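The same check can be turned into a pass/fail test. The sketch below assumes the "name status" line format produced by the --format string above and treats only a status beginning with Up as healthy; container_up is a hypothetical helper name.

```shell
container_up() {
  # $1: container name; stdin: lines of "<name> <status...>"
  # Exits 0 only if a line for that name has "Up" as its first status word.
  awk -v n="$1" '$1 == n && $2 == "Up" { found = 1 } END { exit !found }'
}
```

On node1 it would be fed by the command from Step 1, piped as ... | container_up hermes. A container stuck in a restart loop reports Restarting rather than Up and fails the check.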

  • Step 2: LLM path through litellm (node1 console)
pct exec 118 -- sh -c 'curl -s http://10.1.10.22:4000/v1/models -H "Authorization: Bearer $(grep "^OPENAI_API_KEY=" /opt/hermes-stack/.env | cut -d= -f2-)" | head -c 400'

Expected: a JSON model list from litellm (proves hermes's network path + key reach the gateway). Note the model id(s) — set Hermes model.default to one of these during setup.
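To pull just the model IDs out of that response without extra tooling, a small text filter is enough. This sketch assumes the usual OpenAI-compatible list shape, a data array of objects that each carry an "id" string, and that no other "id" keys appear in the payload; model_ids is a hypothetical helper, and jq would be the more robust choice where it is installed.

```shell
model_ids() {
  # Reads the /v1/models JSON on stdin and prints one model id per line.
  grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*"([^"]*)"$/\1/'
}
```

It needs the full response on stdin, so it replaces the head -c 400 truncation rather than following it.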

  • Step 3: Workspace persistence on the big disk (node1 console)
pct exec 118 -- sh -c 'echo hi > /data/workspace/_probe.txt'
cat /mnt/pve/hdd/hermes/workspace/_probe.txt && rm /mnt/pve/hdd/hermes/workspace/_probe.txt

Expected: hi printed from the host path — proves the agent's /data writes land on /mnt/pve/hdd/hermes (14TB disk).

  • Step 4: Messaging connector end-to-end (manual)

Send a test message from the configured platform (e.g. Telegram) to the bot; confirm Hermes replies. Check docker compose logs -f hermes for the round-trip.

  • Step 5: Final commit (if any uncommitted state/docs)
git add -A && git commit -m "chore: Hermes Agent LXC deploy verified" || echo "nothing to commit"

Notes / Follow-ups

  • TF import: add the mp0/mp1 bind mounts to TF state later via terraform import once a root@pam/SSH path is available (same outstanding task as 115/700 in nfs-lxc-sharing-redesign).
  • Sandbox: start local; revisit Docker sandbox backend (DinD) only if subagent isolation is needed.
  • Memory: after deploy, record the hermes LXC and the API-token constraint in project memory: the token cannot add bind mounts, but it can set features (nesting/keyctl) on an unprivileged CT.