Cohesix is an open-source, high-assurance control-plane operating system built on the formally verified seL4 microkernel. It keeps the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".
This walkthrough follows the as-built lifecycle control surfaces exposed by NineDoor and cohsh
for 0.9.0-beta. It includes hive-gateway (REST multiplexer), live GPU publish, PEFT flows,
host-sidecar telemetry, Multi-Hive federation via ticket relay, and Live Hive text overlays.
For host tool usage, interdependencies, and policy/mount details, see
HOST_TOOLS.md.
coh> indicates the cohsh prompt.

Conventions and invariants:
- Start hive-gateway first, keep it attached, and route all other tools through REST.
- The Queen TCP console listens on 127.0.0.1:31337.
- With policy gating enabled (see /policy/rules), writes to /queen/ctl require approvals queued in /actions/queue.
- /gpu/* appears only after gpu-bridge-host --publish runs; /host/* appears only after host-sidecar-bridge runs.
- Mock tools (--mock) do not talk to the VM; do not mix mock and live in the same session.
- hive-gateway is the sole console client; everything else must use REST (--rest-url).
- The gateway projects LS, CAT, and ECHO. It does not add new verbs or semantics.
- Console attach uses a role (queen); there is no per-request ticket.
- Federation reuses the host-ticket/v1 surfaces (/host/tickets/spec|status|deadletter); no new VM-side verbs are introduced.
- host-ticket-agent --relay forwards only manifest-allowlisted actions (ecosystem.host.federation.action_allowlist) to configured peers.
- Federated ticket lines carry source_hive, target_hive, relay_hop, and relay_correlation_id.
- source_hive and target_hive are pair-required; the local flow deduplicates by id + idempotency_key, the federated flow by id + idempotency_key + source_hive + target_hive.
- relay_hop is monotonic and bounded (1..=32); the first forwarded hop is 1.
- relay_correlation_id is deterministic and aligns with the federated idempotency key by default.
- Where --watch data lands: host-sidecar-bridge --watch continuously refreshes /host/* (for example /host/systemd/status, /host/nvidia/gpu/0/status).
- Read /host/* with cohsh (ls /host, cat /host/systemd/status), REST (/v1/fs/ls, /v1/fs/cat), or a coh mount.

Goal: run hive-gateway as the only console client and use REST for all tools.
./qemu/run.sh
COH_TCP_HOST=127.0.0.1 COH_TCP_PORT=31337 COH_AUTH_TOKEN=changeme \
HIVE_GATEWAY_REQUEST_AUTH_TOKEN=replace-with-real-token \
COH_ROLE=queen HIVE_GATEWAY_BIND=127.0.0.1:8080 \
./bin/hive-gateway
curl -sS http://127.0.0.1:8080/v1/meta/bounds | jq .
curl -sS 'http://127.0.0.1:8080/v1/fs/ls?path=/' | jq .
Run cohsh via REST (not TCP):
./bin/cohsh --transport rest --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" --role queen
./bin/gpu-bridge-host --publish --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" --interval-ms 1000
./bin/host-sidecar-bridge --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" --watch
SWARMUI_TRANSPORT=rest SWARMUI_REST_URL=http://127.0.0.1:8080 \
SWARMUI_REST_AUTH_TOKEN="$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" ./bin/swarmui
Evidence packs are deterministic, host-side exports sourced only from existing Cohesix surfaces (/proc, /log, /audit, /replay, telemetry). They are suitable for due diligence, incident review, and compliance artifacts without introducing new control-plane semantics.
Prerequisite: hive-gateway is running (see quickstart above).

./bin/coh evidence pack --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
--out ./out/evidence/live --with-telemetry
./bin/coh evidence timeline --in ./out/evidence/live
Validate the pack layout and emit a CI-friendly JSON summary:
python3 tools/cohesix-py/examples/ci_evidence_pack.py --pack ./out/evidence/live
Export normalized NDJSON for SIEM ingestion (Splunk/Elastic):
python3 tools/cohesix-py/examples/siem_export_ndjson.py --pack ./out/evidence/live \
--out ./out/evidence/live/siem.ndjson
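NDJSON is one JSON object per line, which is what makes the export stream-friendly for SIEM ingestion. A minimal consumer sketch (the event field names here are illustrative, not the exporter's actual schema):

```python
import io
import json

def iter_ndjson(stream):
    # Yield one parsed object per non-empty line; malformed lines fail fast.
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for open("./out/evidence/live/siem.ndjson").
sample = io.StringIO(
    '{"ts_ms": 0, "source": "/log/queen.log", "line": "lifecycle transition old=ONLINE new=DRAINING reason=cordon"}\n'
    '{"ts_ms": 5, "source": "/audit", "line": "..."}\n'
)
events = list(iter_ndjson(sample))
print(len(events))  # 2
```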
These scenarios use hive-gateway as the sole console client and route all host tools through REST. This keeps the console single-client while enabling multi-tool usage.
Boot the VM (./qemu/run.sh in the release bundle), then start the gateway:
COH_TCP_HOST=127.0.0.1 COH_TCP_PORT=31337 COH_AUTH_TOKEN=changeme \
COH_ROLE=queen HIVE_GATEWAY_BIND=127.0.0.1:8080 \
./bin/hive-gateway
./bin/gpu-bridge-host --publish --rest-url http://127.0.0.1:8080 --interval-ms 1000
./bin/host-sidecar-bridge --rest-url http://127.0.0.1:8080 --watch --provider systemd --provider nvidia
ssh -L 8080:127.0.0.1:8080 <gpu-host>
SWARMUI_TRANSPORT=rest SWARMUI_REST_URL=http://127.0.0.1:8080 ./bin/swarmui
Start hive-gateway on the queen host (Scenario A, step 2). From the remote host, forward the gateway port:
ssh -L 8080:127.0.0.1:8080 <queen-host>
./bin/gpu-bridge-host --publish --rest-url http://127.0.0.1:8080 --interval-ms 1000
./bin/host-sidecar-bridge --rest-url http://127.0.0.1:8080 --watch --provider systemd --provider nvidia
Point each tool at the forwarded gateway via --rest-url.

/gpu/* and /host/* are single namespaces. If two hosts publish simultaneously, the most recent write wins. For deterministic demos, stagger publishes (alternate every N seconds) or keep one publisher active at a time.

./bin/cas-tool pack --epoch 1 --input ./out/cas/payload --out-dir ./out/cas/1 --signing-key ./resources/fixtures/cas_signing_key.hex
./bin/cas-tool upload --bundle ./out/cas/1 --rest-url http://127.0.0.1:8080
./bin/coh mount --rest-url http://127.0.0.1:8080 --rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
--at /tmp/coh-mount-rest
Use REST to automate an operator script without taking the console.
Prerequisite: hive-gateway is running (Scenario A, step 2). Run a .coh script over REST:
./bin/cohsh --transport rest --rest-url http://127.0.0.1:8080 --role queen \
--script scripts/cohsh/boot_v0.coh
In a release bundle, replace the script path with your own .coh file.
Use this when source-hive operators need deterministic ticket execution on a remote hive.
On the source hive (for example hive-mac), run the gateway plus a relay-capable ticket agent:
COH_TCP_HOST=127.0.0.1 COH_TCP_PORT=31337 COH_AUTH_TOKEN="$COH_AUTH_TOKEN" \
HIVE_GATEWAY_REQUEST_AUTH_TOKEN="$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
COH_ROLE=queen HIVE_GATEWAY_BIND=127.0.0.1:8080 \
./bin/hive-gateway
./bin/host-ticket-agent --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
--relay --relay-wal out/host-ticket-agent/relay-wal.json
On the target hive (for example hive-jetson), run the gateway plus a ticket agent (no relay required):
COH_TCP_HOST=127.0.0.1 COH_TCP_PORT=31337 COH_AUTH_TOKEN="$COH_AUTH_TOKEN" \
HIVE_GATEWAY_REQUEST_AUTH_TOKEN="$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
COH_ROLE=queen HIVE_GATEWAY_BIND=127.0.0.1:8080 \
./bin/hive-gateway
./bin/host-ticket-agent --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN"
curl -sS -X POST http://127.0.0.1:8080/v1/fs/echo \
-H "Authorization: Bearer ${HIVE_GATEWAY_REQUEST_AUTH_TOKEN}" \
-H 'Content-Type: application/json' \
-d '{"path":"/host/tickets/spec","line":"{\"schema\":\"host-ticket/v1\",\"id\":\"fed-systemd-1\",\"idempotency_key\":\"fed-20260221-1\",\"action\":\"systemd.restart\",\"target\":\"/host/systemd/cohesix-agent.service/restart\",\"source_hive\":\"hive-mac\",\"target_hive\":\"hive-jetson\"}"}'
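The nested escaping in that curl payload is easy to get wrong by hand: the ticket is a JSON object embedded as a string inside the echo body. A sketch that builds the same /v1/fs/echo body programmatically (field values mirror the example above):

```python
import json

ticket = {
    "schema": "host-ticket/v1",
    "id": "fed-systemd-1",
    "idempotency_key": "fed-20260221-1",
    "action": "systemd.restart",
    "target": "/host/systemd/cohesix-agent.service/restart",
    "source_hive": "hive-mac",
    "target_hive": "hive-jetson",
}

# Serialize the ticket once, then embed it as the "line" string of the echo body;
# json.dumps handles the inner-quote escaping that the curl example does manually.
body = json.dumps({"path": "/host/tickets/spec", "line": json.dumps(ticket)})
print(body)
```

Post `body` as the request data with Content-Type: application/json, exactly as the curl example does.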
ssh -L 8081:127.0.0.1:8080 <jetson-host>
curl -sS 'http://127.0.0.1:8081/v1/fs/cat?path=/host/tickets/status&max_bytes=4096' | jq .
Expected: ticket result lines include source_hive, target_hive, and a positive relay_hop.
./bin/coh evidence pack --rest-url http://127.0.0.1:8081 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
--out ./out/evidence/federated --with-telemetry
./bin/coh evidence timeline --in ./out/evidence/federated
coh mount over TCP vs REST

coh mount exposes a host filesystem view over Secure9P namespaces. It is a projection of LS/CAT/ECHO with manifest-derived bounds and policy allowlists enforced.
Prerequisites:
- Linux: install FUSE 3 (sudo apt-get update && sudo apt-get install -y fuse3). Confirm fusermount3 exists.
- macOS: install MacFUSE and confirm /dev/macfuse0 exists (or /dev/osxfuse0 on older OSXFUSE). Cohesix bundles ship coh with FUSE enabled, but mounts fail until the MacFUSE runtime is active.

Transport selection:
- TCP mount: a single-client console connection; do not run it alongside hive-gateway concurrently.
- REST mount: goes through hive-gateway and supports multi-tool/multi-host usage while keeping the console single-client. REST writes require request-auth.

TCP mount:
./bin/coh mount --host 127.0.0.1 --port 31337 --auth-token "${COH_AUTH_TOKEN}" --at /tmp/coh-mount-tcp
If you need more than one host/tool concurrently, stop the TCP mount and use the gateway + REST mount instead.
Remote TCP mounts over high-latency links (for example AWS → Mac over an SSH reverse tunnel) are supported, but remain single-client; prefer hive-gateway + REST for remote multi-host operation.
Start the gateway first (Scenario A), then mount through REST:
./bin/coh mount --rest-url http://127.0.0.1:8080 --rest-auth-token "${HIVE_GATEWAY_REQUEST_AUTH_TOKEN}" \
--at /tmp/coh-mount-rest
Validate reads:
cat /tmp/coh-mount-rest/proc/lifecycle/state
head -n 5 /tmp/coh-mount-rest/log/queen.log
This is the safe, OS-owned “file transfer” surface: create telemetry segments via /queen/telemetry/<device>/ctl with a MIME type, then append records to the OS-named segment.
On Host A (for example Jetson):
MNT=/tmp/coh-mount-rest
DEV=jetson-xfer-1
printf '{"new":"segment","mime":"text/plain"}\n' >> "${MNT}/queen/telemetry/${DEV}/ctl"
printf "hello-from-jetson ts_ms=%s\n" "$(date +%s000)" >> "${MNT}/queen/telemetry/${DEV}/seg/seg-000001"
On Host B (for example g5g), confirm visibility:
MNT=/tmp/coh-mount-rest
DEV=jetson-xfer-1
cat "${MNT}/queen/telemetry/${DEV}/latest"
tail -n 5 "${MNT}/queen/telemetry/${DEV}/seg/seg-000001"
Repeat the same flow in the opposite direction with a different DEV value (for example g5g-xfer-1) and verify Host A can read it.
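Because the transfer surface is plain file I/O against the mount, the flow can be rehearsed without a live hive. A sketch using a temporary directory as a stand-in for the mount (paths mirror the example above; the seg-000001 name is taken from the example, and nothing here talks to a real hive):

```python
import json
import os
import tempfile

mnt = tempfile.mkdtemp()  # stand-in for /tmp/coh-mount-rest
dev = "jetson-xfer-1"
seg_dir = os.path.join(mnt, "queen", "telemetry", dev, "seg")
os.makedirs(seg_dir, exist_ok=True)

# Step 1: the ctl line that asks the OS to open a new segment with a MIME type.
ctl_line = json.dumps({"new": "segment", "mime": "text/plain"})

# Step 2: append records to the OS-named segment.
seg_path = os.path.join(seg_dir, "seg-000001")
with open(seg_path, "a") as seg:
    seg.write("hello-from-jetson ts_ms=1700000000000\n")

# Step 3: any other host with the same mount reads the record back.
with open(seg_path) as seg:
    print(seg.read().strip())  # hello-from-jetson ts_ms=1700000000000
```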
Choose one path depending on your transport:
TCP console: Attach and verify the root namespace is reachable:
./bin/cohsh --transport tcp --tcp-host 127.0.0.1 --tcp-port 31337 --role queen
coh> ping
coh> ls /
If policy gating is enabled, confirm rules and current pressure:
coh> cat /policy/rules
coh> cat /proc/pressure/policy
REST gateway:
curl -sS http://127.0.0.1:8080/v1/meta/bounds | jq .
curl -sS 'http://127.0.0.1:8080/v1/fs/ls?path=/' | jq .
coh> attach queen
Expected: OK ATTACH.
coh> cat /proc/lifecycle/state
coh> cat /proc/lifecycle/reason
coh> cat /proc/lifecycle/since
Example output:
state=ONLINE
reason=boot-complete
since_ms=0
coh> lifecycle cordon
coh> cat /proc/lifecycle/state
Expected:
state=DRAINING
A matching audit line appears in /log/queen.log:
lifecycle transition old=ONLINE new=DRAINING reason=cordon
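Audit lines use a flat key=value layout, so they parse without a regex. A minimal parser sketch (assumes values never contain spaces, which holds for the lifecycle lines shown here; the function name is illustrative):

```python
def parse_audit_line(line: str):
    # "lifecycle transition old=ONLINE new=DRAINING reason=cordon"
    # -> ("lifecycle transition", {"old": "ONLINE", "new": "DRAINING", "reason": "cordon"})
    prefix, fields = [], {}
    for tok in line.split():
        if "=" in tok:
            key, _, value = tok.partition("=")
            fields[key] = value
        else:
            prefix.append(tok)
    return " ".join(prefix), fields

event, fields = parse_audit_line(
    "lifecycle transition old=ONLINE new=DRAINING reason=cordon"
)
print(event, fields)
```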
Ensure there are no outstanding leases or active workers, then drain:
coh> lifecycle drain
coh> cat /proc/lifecycle/state
Expected:
state=QUIESCED
If leases remain, the command returns ERR and /log/queen.log reports:
lifecycle denied action=drain state=DRAINING reason=outstanding-leases leases=<n>
coh> lifecycle resume
coh> cat /proc/lifecycle/state
Expected:
state=ONLINE
Use reset to move back to BOOTING, then resume after maintenance:
coh> lifecycle reset
coh> cat /proc/lifecycle/state
coh> lifecycle resume
Expected:
state=BOOTING
state=ONLINE
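The lifecycle verbs above imply a small state machine. A sketch covering only the transitions this walkthrough exercises (the full transition table may allow more, and drain is refused with ERR while leases remain):

```python
# (current_state, verb) -> next_state, for the transitions demonstrated above.
TRANSITIONS = {
    ("ONLINE", "cordon"): "DRAINING",
    ("DRAINING", "drain"): "QUIESCED",   # refused while leases are outstanding
    ("QUIESCED", "resume"): "ONLINE",
    ("ONLINE", "reset"): "BOOTING",
    ("BOOTING", "resume"): "ONLINE",
}

def apply(state: str, verb: str) -> str:
    try:
        return TRANSITIONS[(state, verb)]
    except KeyError:
        # Mirrors the denied-action audit line shape.
        raise ValueError(f"lifecycle denied action={verb} state={state}")

# Replay the walkthrough: cordon, drain, resume, reset, resume.
state = "ONLINE"
for verb in ("cordon", "drain", "resume", "reset", "resume"):
    state = apply(state, verb)
print(state)  # ONLINE
```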
Telemetry ingest remains enabled in DRAINING.
coh> echo '{"new":"segment","mime":"text/plain"}' > /queen/telemetry/dev-1/ctl
coh> echo maintenance-event > /queen/telemetry/dev-1/seg/seg-000001
Writes should return OK and /queen/telemetry/dev-1/latest updates deterministically.
The VM does not expose /gpu/models until the host GPU bridge publishes it.
Run the publish on the host (same machine that can reach the Queen TCP console):
./bin/gpu-bridge-host --publish --tcp-host 127.0.0.1 --tcp-port 31337 --auth-token changeme \
--interval-ms 1000 --registry /home/models/peft_registry
Validate in cohsh (quit SwarmUI first if it is running):
coh> ls /gpu/models
coh> cat /gpu/telemetry/schema.json
Expected: OK LS on /gpu/models and a readable schema file.
This is the non-mock flow that requires /gpu/models to be published.
./bin/coh --host 127.0.0.1 --port 31337 peft import --publish \
--model lejepa-edge-v1 \
--from /home/models/lejepa/adapter \
--job job_0001 \
--export /home/models/lejepa/export \
--registry /home/models/peft_registry
./bin/coh --host 127.0.0.1 --port 31337 peft activate \
--model lejepa-edge-v1 --registry /home/models/peft_registry
Confirm pointer and availability (in cohsh):
coh> ls /gpu/models/available
coh> cat /gpu/models/active
Rollback if needed:
./bin/coh --host 127.0.0.1 --port 31337 peft rollback --registry /home/models/peft_registry
Host-sidecar telemetry (/host/*)

Publish host-side providers into the VM (systemd, k8s, docker, nvidia).
./bin/host-sidecar-bridge --tcp-host 127.0.0.1 --tcp-port 31337 --auth-token changeme --watch \
--provider systemd --provider k8s --provider docker --provider nvidia
Validate in cohsh:
coh> ls /host
coh> cat /host/systemd/status
coh> cat /host/nvidia/gpu/0/status
Expected: bounded status lines; state=unknown when a provider is unavailable.
SwarmUI is read-only and must not run concurrently with cohsh.
After quitting cohsh, launch SwarmUI:
./bin/swarmui
If /worker is empty, approve and spawn a heartbeat first, then re-run ls /worker:

echo {"id":"spawn-1","target":"/queen/ctl","decision":"approve"} > /actions/queue
spawn heartbeat ticks=100
./bin/cohsh --transport tcp --tcp-host 127.0.0.1 --tcp-port 31337 --role queen <<'COH'
attach queen
ls /worker
# Replace worker-1 with the actual worker id from the ls output.
echo heartbeat-demo > /worker/worker-1/telemetry
cat /worker/worker-1/telemetry
COH
Use this flow when operating PEFT/LoRA lifecycles and GPU leases with deterministic receipts and evidence.
Dry-run a built-in LLM-focused playbook:
cohesix-playbook --playbook mac-private-peft-grid --dry-run --mock
cohesix-playbook --playbook mixed-closed-loop-ai-factory --dry-run --mock
Expected: reports under out/examples/playbooks/<playbook-id>/ with no live writes.
Start hive-gateway as the sole console client:
COH_TCP_HOST=127.0.0.1 COH_TCP_PORT=31337 COH_AUTH_TOKEN=changeme \
HIVE_GATEWAY_REQUEST_AUTH_TOKEN=replace-with-real-token \
COH_ROLE=queen HIVE_GATEWAY_BIND=127.0.0.1:8080 \
./bin/hive-gateway
Run host-ticket-agent through the same gateway:
./bin/host-ticket-agent --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN"
Submit ticket lines to /host/tickets/spec (schema host-ticket/v1) from a REST client:
curl -sS -X POST http://127.0.0.1:8080/v1/fs/echo \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer ${HIVE_GATEWAY_REQUEST_AUTH_TOKEN}" \
-d '{"path":"/host/tickets/spec","line":"{\"schema\":\"host-ticket/v1\",\"id\":\"llmops-gpu-lease-1\",\"idempotency_key\":\"llmops-run-20260218\",\"action\":\"gpu.lease.grant\",\"args\":{\"gpu_id\":\"GPU-0\",\"mem_mb\":4096,\"streams\":1,\"ttl_s\":600}}"}'
curl -sS -X POST http://127.0.0.1:8080/v1/fs/echo \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer ${HIVE_GATEWAY_REQUEST_AUTH_TOKEN}" \
-d '{"path":"/host/tickets/spec","line":"{\"schema\":\"host-ticket/v1\",\"id\":\"llmops-peft-activate-1\",\"idempotency_key\":\"llmops-run-20260218\",\"action\":\"peft.activate\",\"args\":{\"model\":\"lejepa-edge-v1\"}}"}'
Notes:
- Keep id unique per request.
- Reuse the same idempotency_key across retries of the same intended operation.
- Actions must be manifest-allowlisted (gpu.lease.*, peft.*, systemd.*, docker.*, k8s.*).

Check deterministic ticket outcomes:
curl -sS 'http://127.0.0.1:8080/v1/fs/cat?path=/host/tickets/status&max_bytes=2048' | jq .
curl -sS 'http://127.0.0.1:8080/v1/fs/cat?path=/host/tickets/deadletter&max_bytes=2048' | jq .
Expected lifecycle states: claimed, running, succeeded (or failed/expired in deadletter).
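Status and deadletter reads return bounded text, one ticket result line per entry. A sketch that folds status lines into the latest state per ticket id (the flat key=value line layout is an assumption, matching the audit-line style used elsewhere in this walkthrough):

```python
def latest_states(status_text: str) -> dict:
    # Keep the last observed state per ticket id; later lines win.
    states = {}
    for line in status_text.splitlines():
        fields = dict(tok.partition("=")[::2] for tok in line.split() if "=" in tok)
        if "id" in fields and "state" in fields:
            states[fields["id"]] = fields["state"]
    return states

# Illustrative status output for the two tickets submitted above.
sample = (
    "id=llmops-gpu-lease-1 state=claimed\n"
    "id=llmops-gpu-lease-1 state=running\n"
    "id=llmops-gpu-lease-1 state=succeeded\n"
    "id=llmops-peft-activate-1 state=claimed\n"
)
print(latest_states(sample))
```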
./bin/coh evidence pack --rest-url http://127.0.0.1:8080 \
--rest-auth-token "$HIVE_GATEWAY_REQUEST_AUTH_TOKEN" \
--out ./out/evidence/llmops --with-telemetry
./bin/coh evidence timeline --in ./out/evidence/llmops
python3 tools/cohesix-py/examples/ci_evidence_pack.py --pack ./out/evidence/llmops \
--out ./out/evidence/llmops/ci_summary.json
python3 tools/cohesix-py/examples/siem_export_ndjson.py --pack ./out/evidence/llmops \
--out ./out/evidence/llmops/siem.ndjson
Gateway overload is explicit (HTTP 429) and should be handled by caller-side pacing.
curl -sS http://127.0.0.1:8080/v1/meta/status | jq .
Inspect broker counters (control_waiters, telemetry_waiters, pool_exhausted, timeout_rejections) before increasing publish rates.
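Caller-side pacing can be a plain retry loop with exponential backoff on 429. A sketch with the HTTP call abstracted as a callable (the send function and delay values are illustrative; sleep is injected so the logic is testable without waiting):

```python
import time

def send_with_pacing(send, max_attempts=5, base_delay_s=0.25, sleep=time.sleep):
    """Call send() until it returns a non-429 status, backing off exponentially."""
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        # 429 is explicit overload: wait 0.25s, 0.5s, 1s, ... then retry.
        sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("gateway still overloaded after retries")

# Simulated gateway: overloaded twice, then accepts the request.
responses = iter([429, 429, 200])
waits = []
status = send_with_pacing(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [0.25, 0.5]
```

In a real client, send() would POST to /v1/fs/echo and return the HTTP status code.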
Glossary:
- Multi-Hive Federation: host-side ticket relay between independent hives using existing host-ticket/v1 schemas and namespaces.
- Source Hive (source_hive): hive where the ticket intent is authored and initially queued.
- Target Hive (target_hive): remote hive selected for relay execution.
- Relay Hop (relay_hop): monotonic relay counter (1..=32) attached to federated ticket lines.
- Relay Correlation ID (relay_correlation_id): deterministic cross-hive correlation token, usually id:idempotency_key:source_hive:target_hive.
- Federated Idempotency Key: dedup identity id + idempotency_key + source_hive + target_hive.

Troubleshooting:
- ERR ECHO reason=policy ... EPERM: queue an approval in /actions/queue, then retry the control write.
- ERR AUTH or connection refused: verify QEMU is running and the console port matches 127.0.0.1:31337.
- cohsh hangs or coh cannot connect: another console client is already attached. Confirm hive-gateway is running, bound to the expected address, and is the only console client.
- /gpu empty: run ./bin/gpu-bridge-host --publish ... (live) or --mock --list (mock).
- /host empty: run ./bin/host-sidecar-bridge --watch --provider ....