proxmox-iac/docs/superpowers/plans/2026-06-18-hermes-agent-lxc.md
21in7 29fd340208 docs: plan Task 5 uses targeted apply; flag pre-existing PBS disk drift
terraform plan revealed proxmox_virtual_environment_container.pbs has disk
drift (live 48G vs code 16G). A blanket apply would shrink it, so the hermes
apply must be -targeted. Recorded in the plan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 00:01:26 +09:00


Hermes Agent LXC Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Deploy Nous Research Hermes Agent as an unprivileged Docker LXC (#118) on node1 (gihyeon), using the existing litellm LXC (10.1.10.22:4000) as its OpenAI-compatible LLM gateway, with large-disk bind mounts for the agent workspace.

Architecture: Terraform creates a token-safe LXC skeleton (rootfs, network, cpu/mem). Host-security settings the API token cannot set — container features (nesting/keyctl) and bind mounts — are applied once via the PVE web console with pct set. A bootstrap script then installs rootful Docker and runs the official nousresearch/hermes-agent image via compose, pointed at litellm, with sandbox=local and messaging connectors.

Tech Stack: Terraform (bpg/proxmox provider), Proxmox VE 9.1 LXC, Docker + docker-compose, Hermes Agent (Nous Research).

Spec: docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md

Execution split:

  • Workstation (Terraform): Tasks 1-5, 8, 9: run terraform against the API token.
  • PVE web console (user runs, pastes output back): Tasks 6, 7, plus the bootstrap run in Task 9 and the Task 10 checks. These are host/in-container ops (per proxmox-access: host SSH is intentionally unused).

File Structure

| File | Responsibility |
|---|---|
| hermes-variables.tf (new) | All Hermes LXC input variables with defaults |
| hermes.tf (new) | Debian 12 template download (gihyeon) + token-safe container resource (no features, no mounts) |
| terraform.tfvars (modify) | Set Hermes values for this homelab |
| terraform.tfvars.example (modify) | Document Hermes values for other users |
| outputs.tf (modify) | Expose Hermes VMID + hostname |
| scripts/hermes-bootstrap.sh (new) | In-container bootstrap: Docker install + compose + Hermes .env (placeholders for secrets). Host prep and pct set (features + mounts) are run by hand in Tasks 6 and 7 |
| README.md (modify) | Document the 4-phase deploy flow |

Task 1: Hermes input variables

Files:

  • Create: hermes-variables.tf

  • Step 1: Write hermes-variables.tf

variable "hermes_vmid" {
  description = "VMID for the Hermes Agent LXC"
  type        = number
  default     = 118
}

variable "hermes_hostname" {
  description = "Hostname for the Hermes Agent LXC"
  type        = string
  default     = "hermes"
}

variable "hermes_node" {
  description = "Proxmox node to host the Hermes Agent LXC"
  type        = string
  default     = "gihyeon"
}

variable "hermes_cores" {
  description = "CPU cores for the Hermes Agent LXC"
  type        = number
  default     = 2
}

variable "hermes_memory" {
  description = "Dedicated memory (MB) for the Hermes Agent LXC"
  type        = number
  default     = 4096
}

variable "hermes_swap" {
  description = "Swap (MB) for the Hermes Agent LXC"
  type        = number
  default     = 512
}

variable "hermes_disk_size" {
  description = "Root filesystem size (GB) for the Hermes Agent LXC"
  type        = number
  default     = 24
}

variable "hermes_datastore" {
  description = "Datastore for the Hermes Agent LXC root filesystem"
  type        = string
  default     = "local-lvm"
}

variable "hermes_network_bridge" {
  description = "Network bridge (SDN VNET) for the Hermes Agent LXC"
  type        = string
  default     = "intra01"
}
  • Step 2: Format + validate

Run: terraform fmt hermes-variables.tf && terraform validate
Expected: Success! The configuration is valid. (Variables that nothing references yet are still valid, so this should pass before hermes.tf exists in Task 2; the goal here is no HCL syntax error in this file.)

  • Step 3: Commit
git add hermes-variables.tf
git commit -m "feat: add Hermes Agent LXC variables"

Task 2: Hermes container resource (token-safe skeleton)

Files:

  • Create: hermes.tf

Reuses the existing var.dns_servers (defined in pbs-variables.tf).

  • Step 1: Write hermes.tf
# Download Debian 12 LXC template to gihyeon (node1).
resource "proxmox_virtual_environment_download_file" "debian12_template_gihyeon" {
  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.hermes_node
  url          = "http://download.proxmox.com/images/system/debian-12-standard_12.12-1_amd64.tar.zst"
}

# Hermes Agent LXC — token-safe skeleton.
# IMPORTANT: container `features` (nesting/keyctl) and bind mounts are NOT set
# here. The Proxmox API token cannot set host-security settings; they are applied
# once via the PVE web console with `pct set` (see scripts/hermes-bootstrap.sh
# and docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md).
resource "proxmox_virtual_environment_container" "hermes" {
  description   = "Hermes Agent (Nous Research) - Managed by Terraform"
  node_name     = var.hermes_node
  vm_id         = var.hermes_vmid
  start_on_boot = true
  unprivileged  = true
  tags          = ["ai", "agent", "terraform"]

  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.debian12_template_gihyeon.id
    type             = "debian"
  }

  cpu {
    cores = var.hermes_cores
  }

  memory {
    dedicated = var.hermes_memory
    swap      = var.hermes_swap
  }

  disk {
    datastore_id = var.hermes_datastore
    size         = var.hermes_disk_size
  }

  network_interface {
    name   = "eth0"
    bridge = var.hermes_network_bridge
  }

  initialization {
    hostname = var.hermes_hostname

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    dns {
      servers = var.dns_servers
    }
  }
}
  • Step 2: Format + validate

Run: terraform fmt hermes.tf && terraform validate
Expected: Success! The configuration is valid.

  • Step 3: Commit
git add hermes.tf
git commit -m "feat: add Hermes Agent LXC container resource"

Task 3: tfvars values

Files:

  • Modify: terraform.tfvars
  • Modify: terraform.tfvars.example

Defaults in hermes-variables.tf already match this homelab, so tfvars only needs an explicit override block for clarity/discoverability.

  • Step 1: Append to terraform.tfvars

Add after the existing DNS line:


# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid           = 118
hermes_node           = "gihyeon"
hermes_network_bridge = "intra01"
  • Step 2: Append the same block to terraform.tfvars.example

# Hermes Agent LXC settings (node1 / intra01)
hermes_vmid           = 118
hermes_node           = "gihyeon"
hermes_network_bridge = "intra01"
  • Step 3: Validate

Run: terraform fmt && terraform validate
Expected: Success! The configuration is valid.

  • Step 4: Commit
git add terraform.tfvars terraform.tfvars.example
git commit -m "feat: set Hermes Agent LXC tfvars"

Task 4: Outputs

Files:

  • Modify: outputs.tf

  • Step 1: Append to outputs.tf


output "hermes_container_id" {
  description = "Hermes Agent LXC container ID"
  value       = proxmox_virtual_environment_container.hermes.vm_id
}

output "hermes_hostname" {
  description = "Hermes Agent LXC hostname (IP is DHCP-assigned; discover via PVE/API)"
  value       = var.hermes_hostname
}
  • Step 2: Validate

Run: terraform validate
Expected: Success! The configuration is valid.

  • Step 3: Commit
git add outputs.tf
git commit -m "feat: add Hermes Agent LXC outputs"

Task 5: Plan + apply the container (workstation)

Files: none (infra apply)

  • Step 1: Review the plan

Run: terraform plan
Expected: 2 to add: proxmox_virtual_environment_download_file.debian12_template_gihyeon and proxmox_virtual_environment_container.hermes.

⚠️ Known pre-existing drift: the plan ALSO shows 1 to change: proxmox_virtual_environment_container.pbs disk size = 48 -> 16. The live PBS rootfs is 48GB but pbs.tf declares 16GB, so a blanket apply would try to shrink the PBS disk (dangerous). Do NOT run an untargeted apply. Either reconcile separately by setting size = 48 in pbs.tf to match reality (no infra change), or leave it and always target hermes.
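
The drift warning above can be turned into a mechanical pre-apply check. The following is a minimal sketch, not part of the repo: the function name unexpected_plan_changes is hypothetical, and it assumes the human-readable plan text keeps its usual "# <address> will be ..." / "must be replaced" header lines.

```shell
# List resource addresses a plan would touch that are NOT on an allow-list.
# Reads `terraform plan -no-color` text on stdin; allowed addresses are the arguments.
# ASSUMPTION: plan headers look like "  # <address> will be created" or
# "  # <address> must be replaced".
unexpected_plan_changes() {
  sed -n -E 's/^ *# ([^ ]+) (will be|must be) .*/\1/p' |
    while IFS= read -r addr; do
      allowed=no
      for ok in "$@"; do
        if [ "$addr" = "$ok" ]; then
          allowed=yes
        fi
      done
      if [ "$allowed" = no ]; then
        printf '%s\n' "$addr"
      fi
    done
}
```

Piping terraform plan -no-color into unexpected_plan_changes with the two hermes addresses as arguments should print the PBS container address while the drift exists, and nothing once it is reconciled.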

  • Step 2: Apply (TARGETED to hermes only)

Run:

terraform apply \
  -target=proxmox_virtual_environment_download_file.debian12_template_gihyeon \
  -target=proxmox_virtual_environment_container.hermes

Expected: Apply complete! Resources: 2 added, 0 changed, 0 destroyed. Outputs include hermes_container_id = 118. The -target flags ensure the PBS disk drift is NOT touched.

If apply errors with a permission/root@pam-only message on any container attribute, STOP — it means an attribute in hermes.tf is host-restricted. The skeleton here is intentionally limited to attributes the PBS container already created successfully via the same token, so this is not expected.

  • Step 3: Confirm via API (read-only)

Run:

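# PVE_TOKEN_SECRET is a placeholder env var (not defined elsewhere in this plan): export the
# token secret in your shell rather than committing it. Single quotes keep an interactive
# bash from treating "!terrform" as a history expansion.
curl -sk -H 'Authorization: PVEAPIToken=root@pam!terrform='"${PVE_TOKEN_SECRET}" \
  "https://192.168.50.87:8006/api2/json/nodes/gihyeon/lxc/118/status/current" \
  | python3 -c "import json,sys; d=json.load(sys.stdin)['data']; print(d['name'], d['status'])"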

Expected: hermes running (or stopped — the container may not auto-start before features/mounts; Task 7 reboots it).

  • Step 4: Commit state
git add terraform.tfstate terraform.tfstate.backup
git commit -m "chore: apply Hermes Agent LXC (state)"

Task 6: Host prep — create + chown bind-mount targets (PVE console)

Run in the node1 (gihyeon) shell via PVE web console. Paste output back.

  • Step 1: Create the workspace dirs and chown to the unprivileged-mapped root
mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
ls -lnd /mnt/pve/hdd/hermes /media/2tb/hermes

Expected: both dirs exist and ls -lnd shows owner/group 100000 100000.
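
The 100000:100000 owner follows from how an unprivileged LXC maps ids: container uid/gid N appears on the host as offset + N. A small sketch of that arithmetic, assuming the default Proxmox offset of 100000 and no custom lxc.idmap (the helper name host_id_for is hypothetical):

```shell
# Map a container-side uid/gid to the host-side id for an unprivileged LXC.
# ASSUMPTION: default Proxmox idmap offset of 100000 (no custom lxc.idmap entries).
IDMAP_OFFSET=100000

host_id_for() {
  printf '%s\n' "$(( IDMAP_OFFSET + $1 ))"
}
```

With this mapping, host_id_for 0 gives the id that container root needs on the host directories. A non-root container user such as uid 1000 would instead need host owner 101000.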


Task 7: Apply features + bind mounts, reboot (PVE console)

Run in the node1 (gihyeon) shell via PVE web console. Paste output back.

  • Step 1: Set features (Docker) and the two bind mounts
pct set 118 -features nesting=1,keyctl=1 \
  -mp0 /mnt/pve/hdd/hermes,mp=/data \
  -mp1 /media/2tb/hermes,mp=/fast
pct reboot 118

Expected: no error output from pct set; container reboots.

  • Step 2: Verify config + writable mounts
pct config 118 | grep -E 'features|mp0|mp1'
pct exec 118 -- sh -c 'touch /data/.w /fast/.w && ls -l /data/.w /fast/.w && rm /data/.w /fast/.w && echo MOUNTS_OK'

Expected: features: keyctl=1,nesting=1, mp0: /mnt/pve/hdd/hermes,mp=/data, mp1: /media/2tb/hermes,mp=/fast, and MOUNTS_OK (proves the unprivileged container's root can write to both bind mounts).
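
The config check can also be scripted instead of read by eye. This is a sketch under the assumption that pct config prints one "key: value" line per setting, as in the expected output above; pct_config_ok is a hypothetical helper name.

```shell
# Succeeds only if `pct config` text on stdin has both features and both mounts.
# The feature keys are checked one at a time because pct may print them in a
# different order than they were passed to `pct set`.
pct_config_ok() {
  cfg=$(cat)
  for pattern in \
    '^features:.*nesting=1' \
    '^features:.*keyctl=1' \
    '^mp0: /mnt/pve/hdd/hermes,mp=/data' \
    '^mp1: /media/2tb/hermes,mp=/fast'
  do
    if ! printf '%s\n' "$cfg" | grep -Eq "$pattern"; then
      echo "missing: $pattern" >&2
      return 1
    fi
  done
}
```

On node1 this would be used as pct config 118 | pct_config_ok && echo CONFIG_OK.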


Task 8: Bootstrap script (workstation authoring)

Files:

  • Create: scripts/hermes-bootstrap.sh

This script is authored and committed on the workstation, then run inside the LXC console in Task 9. It contains NO real secrets — only placeholders the operator edits in-container.

  • Step 1: Write scripts/hermes-bootstrap.sh
#!/usr/bin/env bash
# Hermes Agent bootstrap — run INSIDE the hermes LXC (#118) console, once.
# Prereqs (already done): features nesting/keyctl set, /data and /fast bind mounts present.
set -euo pipefail

LITELLM_BASE_URL="http://10.1.10.22:4000/v1"   # litellm gateway (#117)
HERMES_DATA="/opt/hermes"                       # ~/.hermes equivalent on rootfs (fast)
COMPOSE_DIR="/opt/hermes-stack"

echo "==> 1/5 Install rootful Docker + compose plugin"
apt-get update
apt-get install -y ca-certificates curl gnupg
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
. /etc/os-release
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian ${VERSION_CODENAME} stable" \
  > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
systemctl enable --now docker
docker run --rm hello-world >/dev/null && echo "    docker OK"

echo "==> 2/5 Prepare data + workspace dirs"
mkdir -p "${HERMES_DATA}" "${COMPOSE_DIR}"
# /data (hdd, bulk) and /fast (2tb ssd) are the bind mounts from the LXC.
mkdir -p /data/workspace /fast/workspace

echo "==> 3/5 Write docker-compose.yml"
cat > "${COMPOSE_DIR}/docker-compose.yml" <<EOF
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    shm_size: "1g"            # browser tools (Playwright/Chromium)
    volumes:
      - ${HERMES_DATA}:/opt/data   # config, memory, skills, sessions (rootfs/SSD)
      - /data:/data                # bulk workspace (hdd 14TB)
      - /fast:/fast                # fast workspace (2tb SSD)
    env_file:
      - ${COMPOSE_DIR}/.env
    deploy:
      resources:
        limits:
          memory: 3G
          cpus: "2.0"
EOF

echo "==> 4/5 Write .env (EDIT secrets before 'gateway run')"
if [ ! -f "${COMPOSE_DIR}/.env" ]; then
  cat > "${COMPOSE_DIR}/.env" <<EOF
# --- litellm gateway (OpenAI-compatible) ---
OPENAI_BASE_URL=${LITELLM_BASE_URL}
OPENAI_API_KEY=REPLACE_WITH_LITELLM_KEY
# --- messaging connectors (fill the ones you use) ---
TELEGRAM_BOT_TOKEN=
DISCORD_BOT_TOKEN=
SLACK_BOT_TOKEN=
EOF
  chmod 600 "${COMPOSE_DIR}/.env"
  echo "    wrote ${COMPOSE_DIR}/.env — edit OPENAI_API_KEY + bot tokens now."
fi

echo "==> 5/5 First-time interactive setup (model -> litellm, sandbox=local, connectors)"
echo "    Run setup, then start the gateway:"
echo "      cd ${COMPOSE_DIR}"
echo "      docker compose run --rm hermes setup     # pick provider=custom, base_url=${LITELLM_BASE_URL}, sandbox=local"
echo "      docker compose up -d                     # start 'gateway run'"
echo "      docker compose logs -f hermes"
echo "Done. (config.yaml lives under ${HERMES_DATA}; secrets stay in ${COMPOSE_DIR}/.env)"
  • Step 2: Lint the script

Run: shellcheck scripts/hermes-bootstrap.sh (if shellcheck is unavailable, run bash -n scripts/hermes-bootstrap.sh)
Expected: no errors (info/style notes acceptable). bash -n prints nothing on success.

  • Step 3: Mark executable + commit
chmod +x scripts/hermes-bootstrap.sh
git add scripts/hermes-bootstrap.sh
git commit -m "feat: add Hermes Agent in-container bootstrap script"

Task 9: Run bootstrap + finalize (PVE console for run, workstation for docs)

Files:

  • Modify: README.md

  • Step 1: Get the script into the LXC and run it (LXC console)

The script lives in the repo on the workstation, so its contents must be pasted into the container. Either open the LXC's web-console shell, run nano /root/hermes-bootstrap.sh, and paste the file; or, from the node1 shell, run pct exec 118 -- tee /root/hermes-bootstrap.sh, paste, and end input with Ctrl-D. Then run it from the node1 shell (inside the LXC console, drop the pct exec 118 -- prefix):

pct exec 118 -- bash /root/hermes-bootstrap.sh

Expected: script reaches Done. with docker OK. Then, inside the container, edit /opt/hermes-stack/.env (litellm key + bot tokens) and run the docker compose run --rm hermes setup / up -d lines it printed.
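
Before starting the gateway it helps to confirm the .env was actually edited. The sketch below lists keys that are still empty or still hold the REPLACE_WITH_ placeholder written by the bootstrap script; unfilled_env_keys is a hypothetical helper, not something the script provides.

```shell
# Print the names of .env keys that are empty or still hold a REPLACE_WITH_ placeholder.
# Comment lines and blank lines are skipped.
unfilled_env_keys() {
  awk -F= '
    /^[ \t]*(#|$)/ { next }
    $2 == "" || $2 ~ /^REPLACE_WITH_/ { print $1 }
  ' "$1"
}
```

Run inside the container as unfilled_env_keys /opt/hermes-stack/.env. Empty bot tokens for connectors you do not use are expected in the output, but OPENAI_API_KEY should not appear.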

  • Step 2: Update README.md structure table

Add these rows after the pbs-variables.tf row:

| `hermes.tf` | Hermes Agent LXC container definition (token-safe skeleton) |
| `hermes-variables.tf` | Hermes-related variables |
| `scripts/hermes-bootstrap.sh` | Hermes in-container install script |
  • Step 3: Append a deploy-flow section to README.md
## Hermes Agent (LXC #118)

Nous Research Hermes Agent, using litellm (#117, `10.1.10.22:4000`) as its LLM gateway.
Deployment has 4 phases (bind mounts and features cannot be set with the API token → use `pct set` in the console):

1. Host prep (node1 console): `mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes && chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes`
2. `terraform apply -target=proxmox_virtual_environment_download_file.debian12_template_gihyeon -target=proxmox_virtual_environment_container.hermes` (creates the container; targeted so the PBS disk drift is not touched)
3. node1 console: `pct set 118 -features nesting=1,keyctl=1 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast && pct reboot 118`
4. LXC console: run `scripts/hermes-bootstrap.sh` → fill in `/opt/hermes-stack/.env` → `docker compose run --rm hermes setup` → `docker compose up -d`

> Secrets (litellm key, bot tokens) live only in the container's `/opt/hermes-stack/.env` and are never committed to the repo.
> TODO: hermes `mp0/mp1` are not in TF state → catch up later with `terraform import`.
  • Step 4: Commit docs
git add README.md
git commit -m "docs: document Hermes Agent deploy flow"

Task 10: End-to-end verification

Files: none

  • Step 1: Container + Docker health (node1 console)
pct exec 118 -- docker ps --format '{{.Names}} {{.Status}}'

Expected: hermes Up ... (healthy/running).
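
The same check can be made to fail loudly instead of relying on reading the status column. A sketch, assuming the "name status" line format produced by the --format string above; container_is_up is a hypothetical helper name.

```shell
# Succeeds if `docker ps --format '{{.Names}} {{.Status}}'` text on stdin
# has a line for the named container whose status begins with "Up".
container_is_up() {
  grep -Eq "^$1 Up( |\$)"
}
```

Usage on node1: pct exec 118 -- docker ps --format '{{.Names}} {{.Status}}' | container_is_up hermes && echo HERMES_UP. A container stuck in a restart loop reports "Restarting ..." and fails the check.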

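  • Step 2: LLM path through litellm (node1 console)
# sh -c keeps the $(...) key lookup inside the container; unquoted, it would expand on node1, where the .env does not exist.
pct exec 118 -- sh -c 'curl -s http://10.1.10.22:4000/v1/models -H "Authorization: Bearer $(grep "^OPENAI_API_KEY=" /opt/hermes-stack/.env | cut -d= -f2-)"' | head -c 400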

Expected: a JSON model list from litellm (proves hermes's network path + key reach the gateway). Note the model id(s) — set Hermes model.default to one of these during setup.
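
To pull just the model ids out of that response, a text-level filter is usually enough. This sketch assumes the OpenAI-style list shape (a "data" array of objects with an "id" field) and ids without escaped quotes; model_ids is a hypothetical helper, and a real JSON parser is the safer choice if either assumption fails.

```shell
# Print one model id per line from an OpenAI-style /v1/models JSON body on stdin.
# ASSUMPTION: ids contain no escaped quotes and "id" appears only on model objects.
model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

Drop the head -c 400 truncation when piping the curl output into model_ids, so that the JSON is not cut mid-string.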

  • Step 3: Workspace persistence on the big disk (node1 console)
pct exec 118 -- sh -c 'echo hi > /data/workspace/_probe.txt'
cat /mnt/pve/hdd/hermes/workspace/_probe.txt && rm /mnt/pve/hdd/hermes/workspace/_probe.txt

Expected: hi printed from the host path — proves the agent's /data writes land on /mnt/pve/hdd/hermes (14TB disk).

  • Step 4: Messaging connector end-to-end (manual)

Send a test message from the configured platform (e.g. Telegram) to the bot; confirm Hermes replies. Check docker compose logs -f hermes for the round-trip.

  • Step 5: Final commit (if any uncommitted state/docs)
git add -A && git commit -m "chore: Hermes Agent LXC deploy verified" || echo "nothing to commit"

Notes / Follow-ups

  • TF import: add the mp0/mp1 bind mounts to TF state later via terraform import once a root@pam/SSH path is available (same outstanding task as 115/700 in nfs-lxc-sharing-redesign).
  • Sandbox: start local; revisit Docker sandbox backend (DinD) only if subagent isolation is needed.
  • Memory: after deploy, record the hermes LXC + the API-token-can't-bind-mount/features constraint in project memory.