cohesix

Cohesix is an open-source high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".


Cohesix Interfaces (Queen/Worker, NineDoor, GPU Bridge)

The queen/worker verbs and /queen/ctl schema form the hive control API: one Queen instance uses these interfaces to control many workers over the shared Secure9P namespace.

This document is canonical for control-plane interfaces. Snippets marked coh-rtc are generated from configs/root_task.toml and must not be edited by hand. If code diverges from this document, update IR, regenerate artifacts, and then update docs/tests in the same change.

Related docs

At a glance

0. Stability & Versioning

Interface invariants:

Figure 1. Sequence diagram

sequenceDiagram
  autonumber

  participant Operator
  participant Cohsh as cohsh
  participant Console as root-task TCP console
  participant ND as NineDoor
  participant RT as root-task
  participant QCTL as /queen/ctl
  participant WT as /shard/<label>/worker/<id>/telemetry
  participant LOG as /log/queen.log
  participant GPUB as gpu-bridge-host
  participant GPU as /gpu/<id>/*

  %% =========================
  %% Protocol invariants
  %% =========================
  Note over ND: Secure9P only. Version 9P2000.L. Remove disabled. Msize max 8192.
  Note over ND: Paths are UTF-8. No NUL. Max component length 255 bytes.
  Note over QCTL: Append-only control file. One command per line.
  Note over Console: Line protocol. Max line length 256 bytes. ACK before side effects.
  Note over GPU: Provider-backed nodes. info read-only. ctl and job append-only.

  %% =========================
  %% A) TCP console attachment
  %% =========================
  Operator->>Cohsh: run cohsh with TCP transport
  Cohsh->>Console: ATTACH role ticket
  alt ticket and role valid
    Console-->>Cohsh: OK ATTACH
  else invalid or rate-limited
    Console-->>Cohsh: ERR ATTACH
  end

  %% Keepalive
  Cohsh->>Console: PING
  Console-->>Cohsh: PONG

  %% Tail logs over console
  Cohsh->>Console: TAIL path
  Console-->>Cohsh: OK TAIL
  loop log streaming
    Console-->>Cohsh: log line
  end
  Console-->>Cohsh: END

  %% =========================
  %% B) Secure9P session setup
  %% =========================
  Operator->>Cohsh: run cohsh in 9P mode
  Cohsh->>ND: TVERSION msize 8192
  ND-->>Cohsh: RVERSION
  Cohsh->>ND: TATTACH with ticket
  alt ticket valid
    ND-->>Cohsh: RATTACH
  else invalid
    ND-->>Cohsh: Rerror Permission
  end

  %% =========================
  %% C) Queen control via /queen/ctl
  %% =========================
  Cohsh->>ND: TWALK /queen/ctl
  ND-->>Cohsh: RWALK
  Cohsh->>ND: TOPEN /queen/ctl append
  ND-->>Cohsh: ROPEN

  Cohsh->>ND: TWRITE spawn heartbeat worker
  ND->>RT: validate command and permissions
  alt spawn allowed
    RT-->>ND: spawn OK
    ND-->>Cohsh: RWRITE
  else invalid or busy
    RT-->>ND: error
    ND-->>Cohsh: Rerror
  end

  %% =========================
  %% D) Worker telemetry
  %% =========================
  RT->>WT: append heartbeat record
  RT->>WT: append heartbeat record

  %% =========================
  %% E) GPU provider registration
  %% =========================
  GPUB->>ND: connect as Secure9P provider
  ND-->>GPUB: provider session ready
  GPUB->>GPU: publish info
  GPUB->>GPU: publish ctl
  GPUB->>GPU: publish job
  GPUB->>GPU: publish status

  %% =========================
  %% F) GPU lease request
  %% =========================
  Cohsh->>ND: TWRITE spawn gpu lease request
  ND->>RT: validate lease request
  alt provider available
    RT-->>ND: lease queued
    ND-->>Cohsh: RWRITE
    RT->>GPU: append lease to ctl
    RT->>LOG: append lease issued
    GPUB->>GPU: update status QUEUED
    GPUB->>GPU: update status RUNNING
  else provider unavailable
    RT-->>ND: error Busy
    ND-->>Cohsh: Rerror Busy
  end

  %% =========================
  %% G) GPU job execution
  %% =========================
  Cohsh->>ND: TWRITE append job
  ND-->>Cohsh: RWRITE
  GPUB->>GPU: update status OK or ERR
  RT->>WT: append job result

  %% =========================
  %% H) Tail logs via 9P
  %% =========================
  Cohsh->>ND: TWALK /log/queen.log
  ND-->>Cohsh: RWALK
  Cohsh->>ND: TOPEN read
  ND-->>Cohsh: ROPEN
  loop tail polling
    Cohsh->>ND: TREAD offset
    ND-->>Cohsh: RREAD
  end

1. NineDoor 9P Operations
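The path limits noted in Figure 1 (UTF-8, no NUL, components of at most 255 bytes) can be checked before a walk is attempted. The following is a minimal sketch, not the NineDoor implementation; `PathError`, `MAX_COMPONENT_BYTES`, and `validate_path` are hypothetical names introduced here for illustration.

```rust
/// Reasons a path can be rejected under the limits noted in Figure 1.
/// (Hypothetical type; not part of the Cohesix codebase as documented here.)
#[derive(Debug, PartialEq)]
pub enum PathError {
    /// The path contains a NUL byte.
    ContainsNul,
    /// A component exceeds the limit; carries the offending length in bytes.
    ComponentTooLong(usize),
}

/// Maximum component length in bytes, taken from the Figure 1 note.
pub const MAX_COMPONENT_BYTES: usize = 255;

/// Checks a path against the stated limits. A Rust `&str` is already valid
/// UTF-8, so only NUL bytes and component lengths need checking here.
pub fn validate_path(path: &str) -> Result<(), PathError> {
    if path.bytes().any(|b| b == 0) {
        return Err(PathError::ContainsNul);
    }
    for component in path.split('/') {
        // `len()` on a str counts bytes, not characters.
        if component.len() > MAX_COMPONENT_BYTES {
            return Err(PathError::ComponentTooLong(component.len()));
        }
    }
    Ok(())
}
```

Lengths are measured in bytes rather than characters, so a component made of multi-byte UTF-8 characters reaches the limit sooner than its character count suggests.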

2. Capability Ticket

pub struct Ticket(pub [u8; 32]);

pub struct TicketClaims {
    pub role: Role,
    pub budget: Budget,
    pub subject: Option<String>,
    pub mounts: MountSpec,
    pub issued_at_ms: u64,
}

3. Queen Control Surface

Path: /queen/ctl (append-only JSON lines)

{"spawn":"heartbeat","ticks":100,"budget":{"ttl_s":120,"ops":500}}
{"kill":"worker-7"}
{"bind":{"from":"/shard","to":"/shadow"}}
{"mount":{"service":"gpu-bridge","at":"/gpu"}}
{"spawn":"gpu","lease":{"gpu_id":"GPU-0","mem_mb":4096,"streams":2,"ttl_s":120}}
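A client might assemble these command lines with plain string formatting. The sketch below reproduces the heartbeat spawn and kill lines shown above; `SpawnBudget`, `spawn_heartbeat_line`, and `kill_line` are hypothetical helpers, and the restriction on worker id characters is an assumption made only to avoid JSON escaping in this example.

```rust
/// Budget fields as they appear in the example spawn line.
/// (Hypothetical helper type; the real `Budget` type is not shown in this document.)
pub struct SpawnBudget {
    pub ttl_s: u64,
    pub ops: u64,
}

/// Formats a heartbeat spawn command as one JSON line for /queen/ctl.
pub fn spawn_heartbeat_line(ticks: u64, budget: &SpawnBudget) -> String {
    format!(
        "{{\"spawn\":\"heartbeat\",\"ticks\":{},\"budget\":{{\"ttl_s\":{},\"ops\":{}}}}}",
        ticks, budget.ttl_s, budget.ops
    )
}

/// Formats a kill command. The id is interpolated verbatim, so this sketch
/// only accepts ids made of ASCII alphanumerics, '-' and '_' (an assumption).
pub fn kill_line(worker_id: &str) -> Option<String> {
    let safe = !worker_id.is_empty()
        && worker_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if safe {
        Some(format!("{{\"kill\":\"{}\"}}", worker_id))
    } else {
        None
    }
}
```

Each returned string is intended to be appended as a single line, matching the one-command-per-line rule for the control file.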

3a. Node Lifecycle Control

Path: /queen/lifecycle/ctl (append-only, queen-only)

cordon
drain
resume
quiesce
reset

Lifecycle observability (read-only)

3b. GPU Bridge Publish Channel

Path: /gpu/bridge/ctl (append-only, queen-only; lifecycle gate: host_publish)

Publish lines (one per append):

begin bytes=<payload_bytes> sha256=<hex>
b64:<base64_chunk>
...
end
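The begin/b64:/end framing above can be sketched as a function that turns a payload into publish lines. Several details are assumptions not stated in this document: that `bytes=` counts raw (pre-base64) payload bytes, that each `b64:` line is independently decodable standard base64, and that the chunk size is chosen by the caller. The SHA-256 digest is taken as an input because the standard library has no hash for it; `base64` and `publish_lines` are hypothetical names.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648 alphabet, '=' padding), written out by hand
/// so the sketch needs no external crates.
fn base64(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Builds the publish lines for one payload, one line per append.
/// `sha256_hex` is assumed to be computed by the caller.
pub fn publish_lines(payload: &[u8], sha256_hex: &str, chunk_bytes: usize) -> Vec<String> {
    // Rounding down to a multiple of 3 keeps padding confined to the last chunk.
    let chunk_bytes = (chunk_bytes.max(3) / 3) * 3;
    let mut lines = vec![format!("begin bytes={} sha256={}", payload.len(), sha256_hex)];
    for chunk in payload.chunks(chunk_bytes) {
        lines.push(format!("b64:{}", base64(chunk)));
    }
    lines.push("end".to_string());
    lines
}
```

A caller would append each returned line separately and then read the status path to learn whether the publish was accepted.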

Status path: /gpu/bridge/status (read-only)

3c. Scheduler Control

Path: /queen/schedule/ctl (append-only JSONL)

{"id":"sched-1","role":"worker-gpu","priority":2,"ticks":3,"budget_ms":120}

3d. Lease Control

Path: /queen/lease/ctl (append-only JSONL)

{"op":"grant","id":"lease-1","subject":"queen","resource":"gpu0","ttl_s":300,"priority":5}
{"op":"renew","id":"lease-1","ttl_s":600,"priority":6}
{"op":"preempt","id":"lease-1","reason":"timeout"}
{"op":"quota","subject":"queen","resource":"gpu0","max_active":4,"max_preemptions":8}

3e. Export Control

Path: /queen/export/ctl (append-only JSONL)

{"op":"open","id":"export-1","ttl_s":900}
{"op":"close","id":"export-1","reason":"window-complete"}

4. Worker Telemetry

Queen telemetry ingest (host push)

Queen LoRA export (read-only)

Telemetry ingest envelope (cohsh-telemetry-push/v1)

cohsh telemetry push emits UTF-8 JSON lines, one per append:

{"schema":"cohsh-telemetry-push/v1","seq":1,"mime":"text/plain","payload":"telemetry demo line 1"}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| schema | text | yes | Schema identifier; must be cohsh-telemetry-push/v1. |
| seq | uint | yes | Monotonic per-segment sequence number (starts at 1). |
| mime | text | yes | MIME type of the source payload (e.g. text/plain). |
| payload | text | yes | Opaque UTF-8 payload chunk; cohsh chunks to stay within max_record_bytes (4096). |
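The chunking and sequencing described by the envelope fields can be sketched as follows. This is not the cohsh implementation: `json_escape`, `chunk_utf8`, and `push_lines` are hypothetical names, and the byte budget is applied to the unescaped payload chunk as a simplifying assumption (a real client would also need headroom for the envelope and for escape expansion).

```rust
/// Escapes a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Splits `payload` into chunks of at most `max_chunk_bytes` bytes without
/// cutting through a UTF-8 character.
fn chunk_utf8(payload: &str, max_chunk_bytes: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut rest = payload;
    while !rest.is_empty() {
        let mut end = rest.len().min(max_chunk_bytes);
        while end > 0 && !rest.is_char_boundary(end) {
            end -= 1;
        }
        if end == 0 {
            // Budget smaller than one character: emit that character whole.
            end = rest.chars().next().map(char::len_utf8).unwrap_or(rest.len());
        }
        let (head, tail) = rest.split_at(end);
        chunks.push(head);
        rest = tail;
    }
    chunks
}

/// Emits one envelope line per chunk, with `seq` starting at 1.
pub fn push_lines(payload: &str, mime: &str, max_chunk_bytes: usize) -> Vec<String> {
    chunk_utf8(payload, max_chunk_bytes)
        .iter()
        .enumerate()
        .map(|(i, chunk)| {
            format!(
                "{{\"schema\":\"cohsh-telemetry-push/v1\",\"seq\":{},\"mime\":\"{}\",\"payload\":\"{}\"}}",
                i + 1,
                json_escape(mime),
                json_escape(chunk)
            )
        })
        .collect()
}
```

Splitting on character boundaries keeps every chunk valid UTF-8 on its own, which matters because each line is parsed independently.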

Telemetry reference-manifest envelope (coh-ref-c/v1)

For large host artifacts, cohsh telemetry push and the Python SDK emit reference-manifest lines instead of inline payload transfer:

{"schema":"coh-ref-c/v1","seq":1,"off":0,"len":16777216,"sha256":"QmFzZTY0RGlnZXN0Li4u"}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| schema | text | yes | Schema identifier; must be coh-ref-c/v1. |
| seq | uint | yes | Monotonic record sequence (starts at 1). |
| off | uint | yes | Referenced byte offset. Must be contiguous (off == prior off + len). |
| len | uint | yes | Referenced chunk bytes (>= 1). |
| sha256 | text | yes | Chunk digest token (bounded ASCII digest alphabet). |
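The seq, off, and len constraints in the table lend themselves to a small validator. The sketch below checks a parsed sequence of records; `RefRecord`, `ManifestError`, and `check_manifest` are hypothetical names, and reading "monotonic" as strictly increasing is an assumption. The digest field is left out because its alphabet is not specified here.

```rust
/// The numeric fields of one coh-ref-c/v1 record (hypothetical parsed form).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RefRecord {
    pub seq: u64,
    pub off: u64,
    pub len: u64,
}

/// Why a manifest was rejected; `index` is the position of the bad record.
#[derive(Debug, PartialEq)]
pub enum ManifestError {
    /// First seq is not 1, or a later seq does not increase.
    BadSeq { index: usize },
    /// len is zero.
    EmptyChunk { index: usize },
    /// off does not equal the prior off + len.
    Gap { index: usize, expected_off: u64 },
}

/// Verifies the ordering and contiguity rules across a record sequence.
pub fn check_manifest(records: &[RefRecord]) -> Result<(), ManifestError> {
    let mut prev: Option<RefRecord> = None;
    for (index, rec) in records.iter().enumerate() {
        if rec.len == 0 {
            return Err(ManifestError::EmptyChunk { index });
        }
        match prev {
            None => {
                if rec.seq != 1 {
                    return Err(ManifestError::BadSeq { index });
                }
            }
            Some(p) => {
                if rec.seq <= p.seq {
                    return Err(ManifestError::BadSeq { index });
                }
                // checked_add treats offset overflow as a contiguity failure.
                let expected = p.off.checked_add(p.len);
                if Some(rec.off) != expected {
                    return Err(ManifestError::Gap {
                        index,
                        expected_off: expected.unwrap_or(u64::MAX),
                    });
                }
            }
        }
        prev = Some(*rec);
    }
    Ok(())
}
```

The first record's offset is not constrained in this sketch, since the table only relates each offset to the one before it.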

Deterministic ingest rules:

Sharded worker namespace (generated)

Generated from configs/root_task.toml (sha256: afc015e7a9f9bea1625f43a291c485760b380eebedb622af15ebcc40f6ba2fc9).

Telemetry CBOR Frame v1 (generated)

| Field | CBOR type | Required | Description |
| --- | --- | --- | --- |
| schema | text | yes | Schema identifier; must be telemetry-frame/v1. |
| worker_id | text | yes | Worker identifier emitting the record. |
| role | text | yes | Worker role label (worker-heartbeat, worker-gpu). |
| seq | uint | yes | Monotonic frame sequence number. |
| emitted_ms | uint | yes | Unix epoch milliseconds captured by the worker. |
| payload | map | yes | Schema-specific payload map (e.g., heartbeat or GPU job data). |

Generated by coh-rtc (sha256: d1906bce668a4d73d95a8262734f1ec04a1480610ebfd9b6c3f3c8ad2e402b7e).

5. Sidecar Bus & LoRa Mounts

Sidecar namespaces are manifest-gated; mounts appear only when sidecars.*.enable = true and adapter labels are compiler-resolved (hash-prefixed on collision).

/bus/<adapter> (MODBUS/DNP3)

/lora/<adapter>

6. /proc Observability

/proc observability nodes (generated)

Generated by coh-rtc (sha256: 4ff0d485329b917eeaa1b604f8adfb28fd0a75924e7d55ac818d9359b81379b5).

6a. NineDoor UI Providers (Read-Only)

/proc UI summaries (text + CBOR)

Policy preflight (text + CBOR)

Update status (text + CBOR)

6b. SwarmUI Consumption (Host UI)

Live Hive (render-only PixiJS view)

7. GPU Bridge Files (host-mirrored)

| Path | Mode | Description |
| --- | --- | --- |
| /gpu/bridge/ctl | append-only | GPU bridge snapshot publish channel (begin/b64:/end). |
| /gpu/bridge/status | read-only | Publish status (state=idle\|receiving\|ok\|err). |
| /gpu/<id>/info | read-only | JSON metadata: vendor, model, memory, SMs, driver/runtime versions |
| /gpu/<id>/ctl | append-only | Lease management: LEASE, RELEASE, PRIORITY <n> |
| /gpu/<id>/lease | append-only | Lease/ticket log entries (gpu-lease/v1) with active/release state |
| /gpu/<id>/job | append-only | JSON job descriptors (validated hash, grid/block dims, optional payload_b64) |
| /gpu/<id>/status | read-only append stream | Job lifecycle entries (QUEUED/RUNNING/OK/ERR) |
| /gpu/models/available/<model_id>/manifest.toml | read-only | Host-authored model manifests; no uploads from the VM |
| /gpu/models/active | append-only pointer | Symlink-like pointer to the active model (atomic swap on host) |
| /gpu/telemetry/schema.json | read-only | Versioned schema descriptor (gpu-telemetry/v1) with field and size limits |
| /gpu/telemetry/* | host-only | Telemetry records remain host-side; only the schema is mirrored into the VM. |

GPU status breadcrumb schema (generated)

Generated by coh-rtc (sha256: 80eff6277e0b97c54fc8996ffc01a54ccff20b899bcd0e9f63c30de1afb02f80).

8. Host Sidecar Files (/host)

| Path | Mode | Description |
| --- | --- | --- |
| /host/systemd/<unit>/status | append-only | Host-published unit status snapshots (mock or live) |
| /host/systemd/<unit>/start | append-only | Control sink for start requests (queen-only) |
| /host/systemd/<unit>/stop | append-only | Control sink for stop requests (queen-only) |
| /host/systemd/<unit>/restart | append-only | Control sink for restart requests (queen-only) |
| /host/k8s/node/<name>/cordon | append-only | Control sink for cordon requests (queen-only) |
| /host/k8s/node/<name>/drain | append-only | Control sink for drain requests (queen-only) |
| /host/docker/status | append-only | Host-published Docker status snapshot (mock or live) |
| /host/docker/restart | append-only | Control sink for restart requests (queen-only) |
| /host/docker/stop | append-only | Control sink for stop requests (queen-only) |
| /host/nvidia/gpu/<id>/status | append-only | Host-published GPU status snapshots (mock or live) |
| /host/nvidia/gpu/<id>/power_cap | append-only | Control sink for power-cap changes (queen-only) |
| /host/nvidia/gpu/<id>/thermal | append-only | Host-published thermal snapshots (mock or live) |
| /host/tickets/spec | append-only JSONL | Host control ticket requests (host-ticket/v1) |
| /host/tickets/status | append-only JSONL | Host control ticket lifecycle receipts (host-ticket-result/v1) |
| /host/tickets/deadletter | append-only JSONL | Terminal failure/expiry receipts (host-ticket-result/v1) |
| /host/tickets/spec.snapshot | read-only | Bounded snapshot view of /host/tickets/spec |
| /host/tickets/status.snapshot | read-only | Bounded snapshot view of /host/tickets/status |
| /host/tickets/deadletter.snapshot | read-only | Bounded snapshot view of /host/tickets/deadletter |

Line formats (append-only snapshots; values are sanitized and lines capped at 256 bytes):

9. CAS Updates & Models

CAS update surfaces (generated)

Generated by coh-rtc (sha256: 1bd13b5ce9da8c2e5442e87cfca3e95daa90ee3fbba7de30e21855f19a3ae8a5).

10. PolicyFS & Actions (/policy, /actions)

| Path | Mode | Description |
| --- | --- | --- |
| /policy/ctl | append-only | Policy control JSONL commands (validated UTF-8, manifest-bounded) |
| /policy/rules | read-only | Manifest-derived policy rules snapshot |
| /actions/queue | append-only | JSONL approvals/denials (id, target, decision) |
| /actions/<id>/status | read-only | Status snapshot (queued\|consumed) |

Policy control (/policy/ctl) JSONL:

{"op":"apply","id":"rev-2026-02-03","sha256":"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"}
{"op":"rollback","id":"rev-2026-02-03"}

11. AuditFS & ReplayFS (/audit, /replay)

| Path | Mode | Description |
| --- | --- | --- |
| /audit/journal | append-only | JSONL audit journal of Cohesix control actions (bounded by manifest) |
| /audit/decisions | append-only | Policy approvals/denials (policy-action, policy-gate) with role/ticket metadata |
| /audit/export | read-only | Snapshot of retention bounds (journal_base, journal_next, decisions_base, decisions_next) plus replay flags |
| /replay/ctl | append-only | Replay command JSON ({"from":<cursor>}) |
| /replay/status | read-only | Replay status (idle/ok/err) with deterministic sequence_fnv1a |
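The sequence_fnv1a value suggests an FNV-1a digest over the replayed sequence. The sketch below shows the standard 64-bit FNV-1a algorithm and one way to fold a list of lines into a single order-sensitive digest. The hash width, the bytes that are actually hashed, and the newline separator are all assumptions; this document does not specify them, and `fnv1a64` and `sequence_digest` are hypothetical names.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV64_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV64_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: for each byte, XOR it into the hash, then multiply by the prime.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(FNV64_OFFSET_BASIS, |hash, &b| (hash ^ b as u64).wrapping_mul(FNV64_PRIME))
}

/// Folds a sequence of lines into one digest. Each line is followed by a
/// newline byte so that line boundaries affect the result (an assumption).
pub fn sequence_digest<'a>(lines: impl IntoIterator<Item = &'a str>) -> u64 {
    let mut hash = FNV64_OFFSET_BASIS;
    for line in lines {
        for &b in line.as_bytes().iter().chain(b"\n") {
            hash = (hash ^ b as u64).wrapping_mul(FNV64_PRIME);
        }
    }
    hash
}
```

Because the digest depends only on the input bytes and their order, two replays of the same journal window produce the same value, which is the property a deterministic replay check relies on.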

12. Root Task RPC (internal trait)

pub trait RootTaskControl {
    fn spawn(&self, role: Role, spec: WorkerSpec) -> Result<WorkerId, SpawnError>;
    fn kill(&self, id: WorkerId) -> Result<(), KillError>;
    fn bind(&self, session: SessionId, from: &str, to: &str) -> Result<(), NamespaceError>;
    fn mount(&self, session: SessionId, service: &str, at: &str) -> Result<(), NamespaceError>;
}
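For host-side tests, the trait can be exercised against an in-memory stand-in. The sketch below is illustrative only: `Role`, `WorkerSpec`, `WorkerId`, `SessionId`, and the error enums are hypothetical placeholder definitions (the real types are not shown in this document), and `MockRootTask` with its worker-slot limit is an invented test double.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

// Hypothetical stand-ins for types the trait references.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role { Queen, WorkerHeartbeat, WorkerGpu }
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerSpec { pub ticks: u64 }
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct WorkerId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SessionId(pub u32);
#[derive(Debug, PartialEq)]
pub enum SpawnError { Busy }
#[derive(Debug, PartialEq)]
pub enum KillError { NotFound }
#[derive(Debug, PartialEq)]
pub enum NamespaceError { Invalid }

// Trait signatures copied from the section above.
pub trait RootTaskControl {
    fn spawn(&self, role: Role, spec: WorkerSpec) -> Result<WorkerId, SpawnError>;
    fn kill(&self, id: WorkerId) -> Result<(), KillError>;
    fn bind(&self, session: SessionId, from: &str, to: &str) -> Result<(), NamespaceError>;
    fn mount(&self, session: SessionId, service: &str, at: &str) -> Result<(), NamespaceError>;
}

/// In-memory test double with a fixed number of worker slots.
pub struct MockRootTask {
    max_workers: usize,
    next_id: RefCell<u32>,
    workers: RefCell<BTreeMap<WorkerId, Role>>,
}

impl MockRootTask {
    pub fn new(max_workers: usize) -> Self {
        MockRootTask { max_workers, next_id: RefCell::new(1), workers: RefCell::new(BTreeMap::new()) }
    }
}

impl RootTaskControl for MockRootTask {
    fn spawn(&self, role: Role, _spec: WorkerSpec) -> Result<WorkerId, SpawnError> {
        let mut workers = self.workers.borrow_mut();
        if workers.len() >= self.max_workers {
            return Err(SpawnError::Busy); // no free worker slot
        }
        let mut next = self.next_id.borrow_mut();
        let id = WorkerId(*next);
        *next += 1;
        workers.insert(id, role);
        Ok(id)
    }
    fn kill(&self, id: WorkerId) -> Result<(), KillError> {
        self.workers.borrow_mut().remove(&id).map(|_| ()).ok_or(KillError::NotFound)
    }
    fn bind(&self, _session: SessionId, from: &str, to: &str) -> Result<(), NamespaceError> {
        // The mock only checks that both paths are absolute.
        if from.starts_with('/') && to.starts_with('/') { Ok(()) } else { Err(NamespaceError::Invalid) }
    }
    fn mount(&self, _session: SessionId, service: &str, at: &str) -> Result<(), NamespaceError> {
        if !service.is_empty() && at.starts_with('/') { Ok(()) } else { Err(NamespaceError::Invalid) }
    }
}
```

Interior mutability through `RefCell` is used because every trait method takes `&self`; a shared, multi-threaded implementation would need a lock instead.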

13. CLI (cohsh) Protocol

14. Error Surface

| Error | Meaning |
| --- | --- |
| Permission | Role not permitted to access path or mode |
| NotFound | Path or worker ID missing |
| Busy | Resource in use (GPU lease, worker slot) |
| Invalid | JSON parse failure or malformed 9P frame |
| TooBig | Frame exceeds negotiated msize |
| Closed | Fid used after clunk or revoked ticket |
| RateLimited | Console authentication locked out due to repeated failures |
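A client could mirror this table as an enum so that error names are matched exhaustively instead of compared as strings. The sketch below is hypothetical: `ErrorKind` and `UnknownError` are illustrative names, and treating the table's labels as the exact strings seen on the wire is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// One variant per row of the error table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Permission,
    NotFound,
    Busy,
    Invalid,
    TooBig,
    Closed,
    RateLimited,
}

/// Returned when a name does not match any row of the table.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownError;

impl ErrorKind {
    /// The table label for this error.
    pub fn name(self) -> &'static str {
        match self {
            ErrorKind::Permission => "Permission",
            ErrorKind::NotFound => "NotFound",
            ErrorKind::Busy => "Busy",
            ErrorKind::Invalid => "Invalid",
            ErrorKind::TooBig => "TooBig",
            ErrorKind::Closed => "Closed",
            ErrorKind::RateLimited => "RateLimited",
        }
    }
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.name())
    }
}

impl FromStr for ErrorKind {
    type Err = UnknownError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        const ALL: [ErrorKind; 7] = [
            ErrorKind::Permission,
            ErrorKind::NotFound,
            ErrorKind::Busy,
            ErrorKind::Invalid,
            ErrorKind::TooBig,
            ErrorKind::Closed,
            ErrorKind::RateLimited,
        ];
        // Exact, case-sensitive match against the table labels.
        ALL.iter().copied().find(|k| k.name() == s).ok_or(UnknownError)
    }
}
```

Keeping the label in one `name` method means `Display` and `FromStr` cannot drift apart when a row is added.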

15. Documentation Hooks