Deploy Hermes Agent LXC (#118) on gihyeon + IaC hygiene #1

Merged
gihyeon merged 11 commits from hermes-agent-lxc into main 2026-06-19 11:04:17 +09:00
3 changed files with 33 additions and 24 deletions
Showing only changes of commit f6dc709793


@@ -47,12 +47,13 @@ terraform apply
## Hermes Agent (LXC #118)
Nous Research Hermes Agent, using litellm (#117, `10.1.10.22:4000`) as its LLM gateway.
Four deployment steps (bind mounts and features cannot be set with an API token → use console `pct set`):
Four deployment steps. `features` (nesting/keyctl) are **set by TF** (token OK); **only the bind mounts (`mp0/mp1`) use console `pct set`** (host-path mounts require root@pam):
1. Host prep (node1 console): `mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes && chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes`
2. `terraform apply` (creates the container)
3. node1 console: `pct set 118 -features nesting=1,keyctl=1 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast && pct reboot 118`
2. `terraform apply -target=proxmox_virtual_environment_download_file.debian12_template_gihyeon -target=proxmox_virtual_environment_container.hermes` (creates the container, including `nesting`/`keyctl`; `-target` avoids the PBS disk drift)
3. node1 console (bind mounts only): `pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data -mp1 /media/2tb/hermes,mp=/fast && pct reboot 118`
4. Copy the script into the LXC and run it: on the host (node1), `pct push 118 scripts/hermes-bootstrap.sh /root/hermes-bootstrap.sh --perms 0755` (or paste it in with the LXC console editor) → in the LXC console, `bash /root/hermes-bootstrap.sh` → fill in `/opt/hermes-stack/.env`, then `docker compose run --rm hermes setup` → `docker compose up -d`
> Secrets (litellm key, bot tokens) live only in the container's `/opt/hermes-stack/.env` and are never committed to the repo.
> Why `-target`?: the disk in `pbs.tf` is declared as 16G although the real disk is 48G, so an unfiltered apply would try to shrink the PBS disk.
> TODO: hermes `mp0/mp1` are not in TF state → catch up later with `terraform import`.
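
The targeted apply is easy to mistype, so it may help to generate it from the two resource addresses. The following is only a sketch, assuming a POSIX shell on the workstation; the function name `hermes_apply_cmd` is hypothetical and not part of this repo. It prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): print the targeted apply command
# from step 2. It only echoes the command; nothing is applied here.
hermes_apply_cmd() {
    cmd="terraform apply"
    # The two resource addresses named in step 2 of the README.
    for addr in \
        proxmox_virtual_environment_download_file.debian12_template_gihyeon \
        proxmox_virtual_environment_container.hermes
    do
        cmd="$cmd -target=$addr"
    done
    printf '%s\n' "$cmd"
}
```

Calling `hermes_apply_cmd` prints a single line that can be pasted into the terminal once it looks right. Keeping the addresses in one place means the unfiltered `terraform apply`, which would touch the PBS disk, is never the default.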


@@ -4,7 +4,7 @@
**Goal:** Deploy Nous Research Hermes Agent as an unprivileged Docker LXC (#118) on node1 (`gihyeon`), using the existing litellm LXC (`10.1.10.22:4000`) as its OpenAI-compatible LLM gateway, with large-disk bind mounts for the agent workspace.
**Architecture:** Terraform creates a token-safe LXC skeleton (rootfs, network, cpu/mem). Host-security settings the API token cannot set — container `features` (nesting/keyctl) and bind mounts — are applied once via the PVE web console with `pct set`. A bootstrap script then installs rootful Docker and runs the official `nousresearch/hermes-agent` image via compose, pointed at litellm, with `sandbox=local` and messaging connectors.
**Architecture:** Terraform creates the LXC including `features { nesting/keyctl }` (the token CAN set these on an unprivileged CT, and nesting at create time avoids the systemd-252 "enable nesting" warning that otherwise fails the apply). The only host setting the API token cannot do is **bind mounts** (host paths require root@pam), so `mp0/mp1` are added once via the PVE web console with `pct set`. A bootstrap script then installs rootful Docker and runs the official `nousresearch/hermes-agent` image via compose, pointed at litellm, with `sandbox=local` and messaging connectors.
**Tech Stack:** Terraform (bpg/proxmox provider), Proxmox VE 9.1 LXC, Docker + docker-compose, Hermes Agent (Nous Research).
@@ -125,11 +125,14 @@ resource "proxmox_virtual_environment_download_file" "debian12_template_gihyeon"
  url = "http://download.proxmox.com/images/system/debian-12-standard_12.12-1_amd64.tar.zst"
}
# Hermes Agent LXC — token-safe skeleton.
# IMPORTANT: container `features` (nesting/keyctl) and bind mounts are NOT set
# here. The Proxmox API token cannot set host-security settings; they are applied
# once via the PVE web console with `pct set` (see scripts/hermes-bootstrap.sh
# and docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md).
# Hermes Agent LXC.
# `features` (nesting/keyctl) ARE set here: on an unprivileged container these need
# only VM.Allocate, which the API token has, so Terraform can set them. nesting is
# also required so the systemd-252 (Debian 12) create does not emit the "enable
# nesting" warning that Proxmox returns as TASK WARNINGS (which fails the apply).
# Bind mounts (mp0/mp1, host paths) genuinely DO require root@pam, so those are still
# added via the PVE web console with `pct set` (see scripts/hermes-bootstrap.sh and
# docs/superpowers/specs/2026-06-18-hermes-agent-lxc-design.md).
resource "proxmox_virtual_environment_container" "hermes" {
  description = "Hermes Agent (Nous Research) - Managed by Terraform"
  node_name = var.hermes_node
@@ -138,6 +141,11 @@ resource "proxmox_virtual_environment_container" "hermes" {
  unprivileged = true
  tags = ["ai", "agent", "terraform"]
  features {
    nesting = true
    keyctl = true
  }
  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.debian12_template_gihyeon.id
    type = "debian"
@@ -327,16 +335,17 @@ Expected: both dirs exist and `ls -lnd` shows owner/group `100000 100000`.
---
## Task 7: Apply features + bind mounts, reboot (PVE console)
## Task 7: Add bind mounts, reboot (PVE console)
**Run in the node1 (`gihyeon`) shell via PVE web console. Paste output back.**
- [ ] **Step 1: Set features (Docker) and the two bind mounts**
> NOTE: `features` (nesting/keyctl) are already set by Terraform (Task 2) — the API token CAN set them on an unprivileged CT, and `nesting` at create time is required to avoid the "enable nesting" warning that fails the apply. Only bind mounts need the console (host-path mounts require root@pam).
- [ ] **Step 1: Add the two bind mounts**
```sh
pct set 118 -features nesting=1,keyctl=1 \
-mp0 /mnt/pve/hdd/hermes,mp=/data \
-mp1 /media/2tb/hermes,mp=/fast
pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data \
-mp1 /media/2tb/hermes,mp=/fast
pct reboot 118
```
Expected: no error output from `pct set`; container reboots.
@@ -347,7 +356,7 @@ Expected: no error output from `pct set`; container reboots.
pct config 118 | grep -E 'features|mp0|mp1'
pct exec 118 -- sh -c 'touch /data/.w /fast/.w && ls -l /data/.w /fast/.w && rm /data/.w /fast/.w && echo MOUNTS_OK'
```
Expected: `features: keyctl=1,nesting=1`, `mp0: /mnt/pve/hdd/hermes,mp=/data`, `mp1: /media/2tb/hermes,mp=/fast`, and `MOUNTS_OK` (proves the unprivileged container's root can write to both bind mounts).
Expected: `features: keyctl=1,nesting=1` (set by TF), `mp0: /mnt/pve/hdd/hermes,mp=/data`, `mp1: /media/2tb/hermes,mp=/fast`, and `MOUNTS_OK` (proves the unprivileged container's root can write to both bind mounts).
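
The three expected config lines can also be checked mechanically rather than by eye. This is a minimal sketch, assuming a POSIX shell on node1 and that `pct config` prints the lines exactly as listed in the expectation above; the function name `check_hermes_config` is hypothetical. It reads the config on stdin, so the intended usage is `pct config 118 | check_hermes_config`, and it prints `CONFIG_OK` or the first missing line.

```shell
#!/bin/sh
# Hypothetical checker (not in the repo): reads `pct config 118` output on
# stdin and verifies the three lines this step expects.
check_hermes_config() {
    cfg=$(cat)
    for want in \
        'features: keyctl=1,nesting=1' \
        'mp0: /mnt/pve/hdd/hermes,mp=/data' \
        'mp1: /media/2tb/hermes,mp=/fast'
    do
        # -x: match the whole line, -F: fixed string, so '/' and '.' are literal.
        # Assumption: the config contains no extra options on these lines.
        if ! printf '%s\n' "$cfg" | grep -qxF "$want"; then
            echo "MISSING: $want"
            return 1
        fi
    done
    echo CONFIG_OK
}
```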
---


@@ -48,7 +48,7 @@ and generated files on the host's large disks via direct bind mounts.
| Decision | Choice | Rationale |
|---|---|---|
| Deployment form | **Docker LXC (unprivileged)** | Matches homelab convention (multiple docker LXCs: 101/104/119/124); low overhead; official image + clean upgrades; Hermes needs no privileged mode. |
| Provisioning | **Terraform (container only) + console for bind mounts** | TF mirrors `pbs.tf` for the container. **Bind mounts cannot be created via API token** (Proxmox restricts them to `root@pam`/SSH), so `mp0/mp1` are added via console `pct set` — same method already used for jellyfin(115)/tos-api(700). `terraform import` of the mounts is a follow-up. |
| Provisioning | **Terraform (container incl. features) + console for bind mounts** | TF mirrors `pbs.tf` and also sets `features { nesting/keyctl }` (token CAN do this on an unprivileged CT; nesting at create time avoids the systemd-252 "enable nesting" warning that fails the apply). **Only bind mounts** can't be done by the token (host paths require `root@pam`), so `mp0/mp1` are added via console `pct set` — same method already used for jellyfin(115)/tos-api(700). `terraform import` of the mounts is a follow-up. |
| Primary interface | **Messaging connectors** | Outbound-only → **zero inbound ports exposed.** |
| Subagent sandbox | **local** | Avoids Docker-in-Docker friction in an unprivileged LXC; revisit later if isolation needed. |
| Large workspace | **Direct host bind mount (both disks)** | Aligns with the user's **Plan A** (same-host LXC → host bind mount, not nfs LXC re-share). No network hop, no nfs-LXC SPOF. See `nfs-lxc-sharing-redesign` memory. |
@@ -79,7 +79,7 @@ and generated files on the host's large disks via direct bind mounts.
| VMID | `118` (adjacent to litellm `117`, AI group) |
| Node | `gihyeon` |
| Type | unprivileged LXC, Debian 12 |
| Features | `nesting = 1`, `keyctl = 1` (required for Docker) — **set via console `pct set`**, not TF (API token can't set host-security features) |
| Features | `nesting = 1`, `keyctl = 1` (required for Docker) — **set in Terraform** (token can set these on an unprivileged CT; nesting at create avoids the systemd-252 warning that fails the apply) |
| CPU / RAM | 2 cores / 4096 MB dedicated (+512 MB swap) |
| rootfs | 24 GB on `local-lvm` |
| Network | `eth0` on bridge `intra01`, IPv4 DHCP |
@@ -132,14 +132,13 @@ subtree is remapped** (isolation preserved), not the whole disk.
mkdir -p /mnt/pve/hdd/hermes /media/2tb/hermes
chown 100000:100000 /mnt/pve/hdd/hermes /media/2tb/hermes
```
2. **Terraform apply** (from workstation): creates LXC #118 token-safe skeleton
(rootfs, network, cpu/mem, unprivileged, onboot). **No features, no bind mounts**
(API-token can't set host-security settings).
3. **Apply features + bind mounts** (node1 console, once): use `pct set`:
2. **Terraform apply** (from workstation, `-target` hermes only): creates LXC #118
with rootfs, network, cpu/mem, unprivileged, onboot, **and `features { nesting/keyctl }`**.
No bind mounts (host paths need root@pam). `-target` avoids the pre-existing PBS disk drift.
3. **Add bind mounts** (node1 console, once): use `pct set` (mounts only — features already in TF):
```sh
pct set 118 -features nesting=1,keyctl=1 \
-mp0 /mnt/pve/hdd/hermes,mp=/data \
-mp1 /media/2tb/hermes,mp=/fast
pct set 118 -mp0 /mnt/pve/hdd/hermes,mp=/data \
-mp1 /media/2tb/hermes,mp=/fast
pct reboot 118
```
4. **Container bootstrap** (LXC console, once): `scripts/hermes-bootstrap.sh` —