Cohesix is an open-source high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".
Author: Lukas Bower, October 15, 2025
Revision: February 14, 2026
This document enumerates concrete, high-value use cases for Cohesix across sectors. It preserves technical specifics while adding business context so stakeholders can quickly assess fit, risk, and required integrations.
Cohesix is a control-plane operating system for secure orchestration and telemetry of edge GPU nodes. It exposes a Secure9P file namespace as the only control surface and keeps heavy ecosystems (Kubernetes, CUDA/NVML, OT protocols, model registries) on the host and outside the VM’s trusted computing base. Milestone 25c now adds a world-class Python orchestration surface with typed control APIs, host-provider adapters, and ready-to-run playbooks for Mac, Jetson, and mixed 1k-worker fleets. For business stakeholders, this means smaller audit scope, safer multi-tenant GPU sharing, and faster integration into existing Python tooling.
Cohesix is:
Cohesix is not:
A Cohesix hive runs inside an seL4 VM on aarch64. The Queen (root-task + NineDoor) exposes /queen, /proc, /log, and sharded worker telemetry under /shard/<label>/worker/<id>/telemetry, with optional /gpu, /host, /policy, /audit, /replay, /updates, and /models namespaces when enabled by the manifest. Workers run as separate roles with bounded budgets. External ecosystems live on the host; host-side bridges publish /host/* and /gpu/* views into the namespace. QEMU aarch64/virt is the reference dev/CI environment, and UEFI ARM64 hardware is the target deployment profile.
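The sharded telemetry layout above can be made concrete with a small path helper. This is a minimal sketch, not part of the Cohesix SDK: `WorkerTelemetryPath` and `parse_telemetry_path` are hypothetical names, and the only thing taken from the text is the `/shard/<label>/worker/<id>/telemetry` shape.

```python
from dataclasses import dataclass

TELEMETRY_LEAF = "telemetry"


@dataclass(frozen=True)
class WorkerTelemetryPath:
    """Hypothetical helper for the /shard/<label>/worker/<id>/telemetry layout."""

    label: str
    worker_id: str

    def render(self) -> str:
        # Build the namespace path exactly in the shape described in the text.
        return f"/shard/{self.label}/worker/{self.worker_id}/{TELEMETRY_LEAF}"


def parse_telemetry_path(path: str) -> WorkerTelemetryPath:
    """Split a telemetry path into shard label and worker id, rejecting other shapes."""
    parts = path.strip("/").split("/")
    if (
        len(parts) != 5
        or parts[0] != "shard"
        or parts[2] != "worker"
        or parts[4] != TELEMETRY_LEAF
    ):
        raise ValueError(f"not a worker telemetry path: {path!r}")
    label, worker_id = parts[1], parts[3]
    if not label or not worker_id:
        raise ValueError(f"empty shard label or worker id in {path!r}")
    return WorkerTelemetryPath(label, worker_id)
```

Rejecting anything that is not exactly this shape keeps a client from treating an unrelated path (for example one under /queen) as telemetry.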
The Python SDK (tools/cohesix-py) is now a first-class operator path for high-scale automation while preserving Cohesix protocol boundaries.
Control APIs (typed, bounded, non-authoritative):
CohesixOrchestrator provides typed requests for:
- /actions/queue
- /queen/schedule/ctl
- /queen/lease/ctl
- /queen/export/ctl

/proc snapshot helpers provide bounded reads for scheduler and lease observability.

Integration adapters (host-side only):
- systemd service state probes.
- (kubectl fallback).
- (pynvml first, nvidia-smi fallback).
- (torch, transformers, peft, accelerate, bitsandbytes).

Playbook UX (frictionless integration):
- cohesix-playbook --list returns a deterministic catalog.
- cohesix-playbook --playbook <id> --dry-run --mock validates plans with no control writes.
- out/examples/playbooks/<playbook-id>/.

Built-in world-class playbooks (1k-worker oriented):
- mac-release-factory
- mac-private-peft-grid
- mac-endpoint-compliance
- jetson-traffic-safety
- jetson-manufacturing-safety
- jetson-critical-infra
- mixed-closed-loop-ai-factory
- mixed-medical-edge-ai
- mixed-logistics-digital-twin

| Fleet type | Business program | Python playbook id |
| --- | --- | --- |
| Mac | Global app release factory | mac-release-factory |
| Mac | Private PEFT/LoRA grid | mac-private-peft-grid |
| Mac | Endpoint compliance orchestration | mac-endpoint-compliance |
| Jetson | Traffic safety mesh | jetson-traffic-safety |
| Jetson | Manufacturing safety + QA | jetson-manufacturing-safety |
| Jetson | Critical infrastructure sensing | jetson-critical-infra |
| Mixed | Closed-loop edge AI factory | mixed-closed-loop-ai-factory |
| Mixed | Medical edge AI governance | mixed-medical-edge-ai |
| Mixed | Logistics digital twin operations | mixed-logistics-digital-twin |
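As a rough illustration of how the catalog above might be used from a script, the sketch below mirrors the table in a dictionary and builds a dry-run command line. The `cohesix-playbook` flags (`--playbook`, `--dry-run`, `--mock`) come from the text; `PLAYBOOKS` and `dry_run_argv` are hypothetical names, and the function only builds the argument list without invoking the CLI.

```python
from typing import Dict, List

# Playbook ids copied from the table above, grouped by fleet type.
PLAYBOOKS: Dict[str, List[str]] = {
    "Mac": [
        "mac-release-factory",
        "mac-private-peft-grid",
        "mac-endpoint-compliance",
    ],
    "Jetson": [
        "jetson-traffic-safety",
        "jetson-manufacturing-safety",
        "jetson-critical-infra",
    ],
    "Mixed": [
        "mixed-closed-loop-ai-factory",
        "mixed-medical-edge-ai",
        "mixed-logistics-digital-twin",
    ],
}


def dry_run_argv(playbook_id: str) -> List[str]:
    """Build the argv for a plan validation run that performs no control writes."""
    known = {pid for ids in PLAYBOOKS.values() for pid in ids}
    if playbook_id not in known:
        raise KeyError(f"unknown playbook id: {playbook_id}")
    return ["cohesix-playbook", "--playbook", playbook_id, "--dry-run", "--mock"]
```

A caller could pass the returned list to `subprocess.run(argv, check=True)` on a host where the CLI is installed.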
Business outcome: Segmented OT control with auditable change authority and minimal downtime. Why Cohesix: seL4 isolation, file-shaped control plane, deterministic telemetry and logs. Integration: MODBUS/CAN sidecars; host-side uplink for telemetry export. Constraints: deterministic timing, safety certification paths.
Business outcome: Hardened OT/IT boundary with predictable behavior during incidents. Why Cohesix: deterministic scheduling, minimal attack surface, append-only audit logs. Integration: DNP3/IEC-104 adapters; signed config bundles; GPS/PTP time sources. Constraints: NERC/CIP, IEC 61850, local change control.
Business outcome: Privacy-respecting analytics and faster model rollouts with lower WAN cost.
Why Cohesix: host-side GPU stack, model pointers via /gpu/models/active, bounded telemetry.
Integration: content-addressed model updates; local summarization; schema-tagged telemetry.
Constraints: PII handling, retention windows.
Business outcome: Reliable telemetry and updates across harsh RF and intermittent links. Why Cohesix: offline-first logs, replayable state, strict capability scoping. Integration: durable disk spooling; batch upload sidecar. Constraints: physical security, RF noise.
Business outcome: Multi-tenant accelerator governance at cell sites with clear SLAs. Why Cohesix: ticketed leases, sharded namespaces, deterministic resource budgets. Integration: SR-IOV/NIC telemetry sidecars; per-tenant quota policies. Constraints: carrier-grade ops, slice isolation.
Business outcome: Minimal PHI footprint with traceable access and transfers. Why Cohesix: append-only audit, policy gates, deterministic telemetry. Integration: DICOM proxy; de-identification pipeline; export gating. Constraints: HIPAA, ISO 27001, locality requirements.
Business outcome: Safe update windows and fleet learning without constant connectivity. Why Cohesix: content-addressed updates for deterministic version pinning and bounded telemetry envelopes. Integration: delta packs; multicast to many vehicles; PEFT-ready telemetry export. Constraints: safety certification, predictable maintenance windows.
Business outcome: Trusted control under low bandwidth with tamper-evident logs. Why Cohesix: seL4 assurance, minimal TCB, file-scoped authority. Integration: LoRa or SATCOM schedulers; rapid key-rotation workflows. Constraints: export controls, contested networks.
Business outcome: Scalable governance of large sensor fleets with low operational cost. Why Cohesix: small footprint gateway with append-only telemetry. Integration: sensor-bus sidecars (I2C/SPI); coarse local summarization. Constraints: public data governance, OTA safety.
Business outcome: Signed content delivery with proof-of-display and SLA reporting. Why Cohesix: content-addressed assets, policy gating, immutable audit trails. Integration: schedule provider; receipts pipeline; bandwidth-aware staging. Constraints: bandwidth caps, SLA reporting.
Business outcome: Auditable control over high-value signing operations. Why Cohesix: policy-as-files, role-scoped tickets, append-only logs. Integration: sign/verify provider; rate and role caps. Constraints: FIPS modes, key custody.
Business outcome: Replace VPN sprawl with time-boxed, least-privilege access. Why Cohesix: tiny boundary device, tickets/leases, deterministic audit logs. Integration: dual-NIC profile; AccessPolicy compiler; telemetry rings. Constraints: audits, change control.
Business outcome: Store-and-forward data collection under power and link limits. Why Cohesix: deterministic envelopes, append-only queues, replayable state. Integration: delay-tolerant queues; trickle updates; clock beacons. Constraints: power budget, severe weather.
Business outcome: Predictable control-plane operations under long RTT. Why Cohesix: low-memory deterministic control processes. Integration: CCSDS/TCP bridge; high-latency backpressure tuning. Constraints: link budgets, long RTT.
Business outcome: Demonstrable, auditable update lifecycle and rollback readiness for stakeholders. Why Cohesix: content-addressed updates, policy gating, audit logs. Integration: golden-image verifier; host updater for rollbacks; CLI scripts; dashboards. Constraints: demo reproducibility, change control.
Business outcome: Teachable microkernel and secure control-plane workflows. Why Cohesix: small, readable userland; file-shaped APIs for labs. Integration: mock transports; fuzz harnesses; trace viewer. Constraints: safe sandboxing, repeatable fixtures.
Business outcome: Signed, reviewable policy changes with diffable drift. Why Cohesix: policy namespaces, audit trails, deterministic control. Integration: policy bundle pipeline; diff views; approval workflow. Constraints: segregation of duties, audit trails.
Business outcome: Time-boxed vendor access with complete traceability.
Why Cohesix: scoped tickets, lease files, append-only session logs.
Integration: maintenance window leases; per-path AccessPolicy; /log session recording.
Constraints: compliance audits, least-privilege, offline fallback.
Business outcome: Provenance-preserving updates without WAN connectivity.
Why Cohesix: content-addressed bundles under /updates, deterministic verification, audit trails.
Integration: host-side cas-tool ingestion from removable media; resumable chunk validation.
Constraints: strict provenance, operational simplicity.
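To show what content-addressed, resumable chunk validation can look like, here is a minimal sketch. It assumes SHA-256 digests and a manifest that is simply an ordered list of hex digests; neither detail is confirmed by the text, and `chunk_digest` and `verify_chunks` are hypothetical helpers rather than part of cas-tool.

```python
import hashlib
from typing import Dict, List, Tuple


def chunk_digest(data: bytes) -> str:
    """Content address of a chunk (assumption: SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()


def verify_chunks(
    manifest: List[str], chunks: Dict[str, bytes]
) -> Tuple[List[str], List[str]]:
    """Return (verified, pending) digests in manifest order.

    A chunk is verified only if it is present and its bytes hash to the
    digest it is stored under. Missing or corrupt chunks stay pending, so
    a later ingest pass can resume with just those.
    """
    verified: List[str] = []
    pending: List[str] = []
    for digest in manifest:
        data = chunks.get(digest)
        if data is not None and chunk_digest(data) == digest:
            verified.append(digest)
        else:
            pending.append(digest)
    return verified, pending
```

Because verification depends only on the bytes and the manifest, the same bundle gives the same result on every site, which is the property the air-gapped flow relies on.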
Business outcome: Fair, auditable sharing of accelerators across tenants.
Why Cohesix: file-modeled leases, ticketed requests, host-enforced policy.
Integration: quota accounting; eviction/renew flows; gpu-bridge-host governance rules.
Constraints: noisy-neighbor control, operator clarity.
Business outcome: Controlled model rollout with auditable provenance.
Why Cohesix: content-addressed models under /models and /gpu/models, policy gating, /proc/boot provenance.
Integration: model registry sidecar; signature verification; LoRA lineage tracking.
Constraints: regulated AI, privacy boundaries.
Business outcome: Maintain telemetry and remote control even if host OS degrades. Why Cohesix: minimal boundary, read-only recovery namespace, immutable logs. Integration: rescue worker profile; out-of-band operator attach flow. Constraints: incident response procedures, tamper evidence.
Business outcome: Blame-free postmortems with deterministic replay. Why Cohesix: append-only rings, bounded scheduling, file-based replay. Integration: export pipeline sidecar; compression outside the TCB. Constraints: safety certification, retention limits.
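The append-only ring and file-based replay mentioned above can be sketched as a bounded buffer with monotonically increasing sequence numbers. This is an illustrative model only: `AppendOnlyRing` is a hypothetical class, and the eviction rule (oldest entries drop when capacity is reached) is an assumption about how a bounded ring would behave.

```python
from collections import deque
from typing import Deque, List, Tuple


class AppendOnlyRing:
    """Bounded log: entries can be appended and replayed, never edited."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) silently evicts the oldest entry when full.
        self._entries: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 0

    def append(self, line: str) -> int:
        """Append one line and return its sequence number."""
        seq = self._next_seq
        self._entries.append((seq, line))
        self._next_seq += 1
        return seq

    def replay(self, from_seq: int = 0) -> List[Tuple[int, str]]:
        """Return retained entries with sequence >= from_seq, oldest first."""
        return [(seq, line) for seq, line in self._entries if seq >= from_seq]
```

Sequence numbers keep increasing after eviction, so a reader can tell from a gap that older entries have aged out rather than being rewritten.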
Business outcome: Outbound-only telemetry posture with minimal inbound surface. Why Cohesix: policy-enforced file verbs, append-only exports. Integration: export-only providers; batching/backpressure tuning. Constraints: regulated environments, packet loss tolerance.
Business outcome: Governance layer for lifecycle, telemetry, and GPU leasing without replacing Kubernetes.
Why Cohesix: file APIs for control-plane actions; host-side bridge maps K8s to /queen and /shard.
Integration: identity mapping; RBAC-to-ticket translation.
Constraints: clear separation of responsibilities.
Business outcome: Safe performance feedback for centralized training. Why Cohesix: schema-tagged, bounded telemetry; model lifecycle pointers. Integration: export namespaces for training farms; privacy filters. Constraints: no gradients or raw data in the VM; deterministic bandwidth/storage envelopes.
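One way to picture schema-tagged, bounded telemetry is an encoder that refuses oversized records and fields that must not leave the VM. The sketch below is an assumption-heavy illustration: the JSON line format, the 512-byte limit, and the `FORBIDDEN_KEYS` names are all hypothetical and are not taken from Cohesix.

```python
import json
from typing import Any, Dict

MAX_RECORD_BYTES = 512  # assumed per-record bound, chosen for illustration
FORBIDDEN_KEYS = frozenset({"gradients", "raw_data"})  # assumed field names


def encode_record(schema: str, fields: Dict[str, Any]) -> bytes:
    """Encode one telemetry record as a schema-tagged JSON line."""
    leaked = FORBIDDEN_KEYS.intersection(fields)
    if leaked:
        raise ValueError(f"fields not allowed in telemetry: {sorted(leaked)}")
    # The schema tag is applied last so a caller cannot override it.
    record = {**fields, "schema": schema}
    line = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    line += b"\n"
    if len(line) > MAX_RECORD_BYTES:
        raise ValueError(f"record is {len(line)} bytes, limit is {MAX_RECORD_BYTES}")
    return line
```

Sorting keys and fixing the separators makes the encoding of a given record byte-for-byte stable, which helps when bandwidth and storage envelopes have to be predictable.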
Figure 1: Edge hive deployment (Smart factory / Retail CV hub / MEC node)
flowchart LR
subgraph SITE["Edge Site (Factory / Store / MEC)"]
subgraph HIVE["Cohesix Hive (one Queen, many Workers)"]
Q["Queen<br/>(root-task + NineDoor)<br/>/queen /proc /log"]:::queen
W1["Worker: sensors/PLC"]:::worker
W2["Worker: CV camera ingest"]:::worker
W3["Worker: app control loop"]:::worker
WG["Worker: gpu stub<br/>(in-VM, no CUDA)"]:::worker
end
subgraph HOST["Host ecosystems (sidecars)"]
OT["OT protocol bridge<br/>(MODBUS/CAN/DNP3/IEC-104)"]:::sidecar
GPU["gpu-bridge-host<br/>CUDA/NVML stays here"]:::sidecar
STORE["Local storage / spool<br/>(ring buffers, batch upload)"]:::sidecar
end
CAM["Cameras / Sensors"]:::ext
PLC["PLCs / Robots"]:::ext
JET["Jetson / Edge GPU nodes"]:::ext
end
CLOUD["Cloud / HQ<br/>(Ops + Registry + Analytics)"]:::cloud
OPS["Operator / NOC<br/>cohsh or GUI client"]:::ext
%% flows
OPS -->|"cohsh attach<br/>(console or Secure9P)"| Q
CAM -->|"telemetry/video"| W2
PLC -->|"fieldbus"| OT
OT -->|"mirrored files<br/>into namespace"| Q
W1 -->|"append telemetry<br/>/shard/<label>/worker/<id>/telemetry"| Q
W2 -->|"append summaries<br/>/shard/<label>/worker/<id>/telemetry"| Q
W3 -->|"control + status"| Q
Q -->|"ticketed orchestration<br/>/queen/ctl"| W1
Q -->|"ticketed orchestration<br/>/queen/ctl"| W2
Q -->|"ticketed orchestration<br/>/queen/ctl"| W3
Q -->|"lease + job via /gpu/*"| WG
WG -->|"append job descriptors<br/>/gpu/<id>/job"| Q
GPU -->|"publishes provider nodes<br/>/gpu/<id>/*"| Q
JET -->|"CUDA workloads<br/>host-side"| GPU
Q -->|"append-only logs<br/>/log/*"| Q
Q -->|"batch export / uplink<br/>(protocol outside TCB)"| STORE
STORE -->|"durable batch upload"| CLOUD
classDef queen fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
classDef worker fill:#f0fdf4,stroke:#15803d,stroke-width:1px;
classDef sidecar fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
classDef cloud fill:#eef2ff,stroke:#4338ca,stroke-width:1px;
classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;
Figure 2: Vendor remote maintenance without VPN sprawl (tickets + leases + append logs)
sequenceDiagram
autonumber
participant Vendor as Vendor Engineer
participant Cohsh as cohsh
participant ND as NineDoor
participant POL as AccessPolicy
participant RT as root-task
participant MW as maintenance window
participant DEV as worker ctl
participant SLOG as session log
Note over ND: File ops only. Policy runs before provider logic. Logs are append-only.
Vendor->>Cohsh: obtain scoped ticket
Vendor->>Cohsh: attach vendor role with ticket
Cohsh->>ND: TATTACH ticket
ND->>POL: evaluate ticket scope TTL and rate limits
POL-->>ND: allow or deny
alt maintenance window active
Cohsh->>ND: TOPEN MW read
ND-->>Cohsh: ROPEN
Cohsh->>ND: TREAD MW confirm active
ND-->>Cohsh: RREAD active
Cohsh->>ND: TOPEN DEV append
ND-->>Cohsh: ROPEN
Cohsh->>ND: TWRITE cmd diagnose level basic
ND->>POL: check path and verb allowed
POL-->>ND: allow
ND->>RT: perform validated internal action
RT-->>ND: ok
ND-->>Cohsh: RWRITE
Cohsh->>ND: TOPEN SLOG append
ND-->>Cohsh: ROPEN
Cohsh->>ND: TWRITE audit vendor action diagnose target worker
ND-->>Cohsh: RWRITE
else window inactive or expired
Cohsh->>ND: TOPEN MW read
ND-->>Cohsh: ROPEN
Cohsh->>ND: TREAD MW
ND-->>Cohsh: RREAD inactive
Cohsh->>ND: TWRITE cmd diagnose
ND-->>Cohsh: Rerror Permission
end
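The decision logic in Figure 2 can be approximated in a few lines. This is a simplified sketch: `Ticket`, `MaintenanceWindow`, and `authorize_write` are hypothetical names, time is modelled as plain Unix seconds, and rate limits are left out. The two outcomes, RWRITE and Rerror Permission, are the ones shown in the diagram.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Ticket:
    role: str
    allowed_paths: FrozenSet[str]  # paths this ticket may append to
    expires_at: float  # Unix seconds; models the ticket TTL


@dataclass(frozen=True)
class MaintenanceWindow:
    start: float
    end: float

    def active(self, now: float) -> bool:
        # Half-open interval: the window closes exactly at `end`.
        return self.start <= now < self.end


def authorize_write(
    ticket: Ticket, window: MaintenanceWindow, path: str, now: float
) -> str:
    """Evaluate policy before any provider logic runs, as in the diagram."""
    if now >= ticket.expires_at:
        return "Rerror Permission"  # ticket TTL elapsed
    if not window.active(now):
        return "Rerror Permission"  # window inactive or expired
    if path not in ticket.allowed_paths:
        return "Rerror Permission"  # path outside ticket scope
    return "RWRITE"
```

Every denial maps to the same error here, so a caller learns nothing about which check failed; whether Cohesix distinguishes the cases is not stated in the text.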
Figure 3: Air-gapped update ferry (removable media + /updates + audit)
flowchart LR
USB["Portable media<br/>(update bundles)"]:::ext
subgraph HIVE["Air-gapped site: Cohesix Hive"]
Q["Queen<br/>(root-task + NineDoor)"]:::queen
UPD["/updates/<epoch>/*<br/>(manifest + chunks)"]:::path
LOG["/log/*<br/>append-only audit"]:::path
end
OPS["Operator<br/>cohsh"]:::ext
HOST["Host cas-tool"]:::sidecar
USB -->|"ingest bundle"| HOST
HOST -->|"write manifest + chunks"| UPD
OPS -->|"inspect status"| UPD
Q -->|"audit writes"| LOG
classDef queen fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
classDef path fill:#f8fafc,stroke:#334155,stroke-dasharray: 4 3;
classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;
classDef sidecar fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
Figure 4: GPU lease broker for multi-tenant edge (CUDA stays on host)
sequenceDiagram
autonumber
participant Tenant as Tenant App
participant ND as NineDoor
participant RT as root-task
participant GPU as gpu files
participant GPUB as gpu-bridge-host
Note over GPUB: CUDA and NVML stay on host. Enforcement happens here.
Tenant->>ND: TATTACH tenant ticket
Tenant->>ND: TWALK queen ctl
ND-->>Tenant: RWALK
Tenant->>ND: TOPEN queen ctl append
ND-->>Tenant: ROPEN
Tenant->>ND: TWRITE spawn gpu lease request
ND->>RT: validate ticket scope and quotas
alt capacity available
RT-->>ND: ok queued
ND-->>Tenant: RWRITE
RT->>GPU: append ctl LEASE issued
GPUB->>GPU: append status QUEUED
GPUB->>GPU: append status RUNNING
else no capacity
RT-->>ND: Err Busy
ND-->>Tenant: Rerror Busy
end
Tenant->>ND: TOPEN gpu job append
ND-->>Tenant: ROPEN
Tenant->>ND: TWRITE append job descriptor
ND-->>Tenant: RWRITE
GPUB->>GPU: append status OK or ERR
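The capacity branch in Figure 4 can be modelled with a small in-memory broker. This is a sketch under stated assumptions: `LeaseBroker` is a hypothetical class, the per-tenant quota is an illustrative policy, quota exhaustion is mapped to the same Busy error as site-wide exhaustion, and the control-log line format is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeaseBroker:
    capacity: int  # total leases the site can grant at once
    per_tenant_quota: int  # assumed per-tenant cap
    active: Dict[str, int] = field(default_factory=dict)
    ctl_log: List[str] = field(default_factory=list)  # append-only in spirit

    def request(self, tenant: str) -> str:
        """Grant a lease if capacity and quota allow, else report Busy."""
        total = sum(self.active.values())
        held = self.active.get(tenant, 0)
        if total >= self.capacity or held >= self.per_tenant_quota:
            return "Rerror Busy"
        self.active[tenant] = held + 1
        # Hypothetical log line; the real ctl format is not given in the text.
        self.ctl_log.append(f"LEASE issued tenant={tenant}")
        return "RWRITE"

    def release(self, tenant: str) -> None:
        """Return one lease held by the tenant."""
        held = self.active.get(tenant, 0)
        if held <= 0:
            raise KeyError(f"tenant holds no lease: {tenant}")
        if held == 1:
            del self.active[tenant]
        else:
            self.active[tenant] = held - 1
```

In the flow described above, enforcement happens in gpu-bridge-host on the host side; the model here only captures the accounting, not where it runs.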
Figure 5: Model governance and provenance at the edge (attested models)
flowchart LR
REG["Model registry bridge (host sidecar)<br/>CAS + signatures"]:::sidecar
subgraph HIVE["Cohesix Hive"]
Q["Queen<br/>(root-task + NineDoor)"]:::queen
POL["/policy/*<br/>(only signed by X)<br/>allowlist/denylist"]:::path
MODELS["/models/*<br/>(content addressed)"]:::path
DEP["/gpu/models/active<br/>(pointer to model id)"]:::path
BOOT["/proc/boot<br/>(provenance, measurements)"]:::path
LOG["/log/*<br/>append-only audit"]:::path
W["Workers consume model ref<br/>(no unsigned blobs)"]:::worker
end
OPS["Operator / CI<br/>cohsh"]:::ext
REG -->|"publish signed model"| MODELS
OPS -->|"update policy bundle"| POL
OPS -->|"set active model"| DEP
DEP -->|"validated by policy"| Q
Q -->|"audit writes"| LOG
Q -->|"expose boot + model provenance"| BOOT
W -->|"fetch by id<br/>verify via policy"| MODELS
classDef queen fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
classDef worker fill:#f0fdf4,stroke:#15803d,stroke-width:1px;
classDef sidecar fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
classDef path fill:#f8fafc,stroke:#334155,stroke-dasharray: 4 3;
classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;
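The policy gate in Figure 5 can be illustrated with a check that runs before the active-model pointer changes. This sketch assumes model ids are SHA-256 content addresses with a `sha256:` prefix and that signature verification has already happened in the host-side registry bridge; `ModelPolicy`, `model_id`, and `set_active` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ModelPolicy:
    allowlist: FrozenSet[str]
    denylist: FrozenSet[str] = frozenset()


def model_id(blob: bytes) -> str:
    """Content address for a model blob (assumption: SHA-256 with a prefix)."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def set_active(policy: ModelPolicy, store: Dict[str, bytes], candidate: str) -> str:
    """Return the id to write to the active pointer, or raise if not permitted."""
    blob = store.get(candidate)
    if blob is None:
        raise LookupError(f"model not published: {candidate}")
    if model_id(blob) != candidate:
        raise ValueError("stored content does not match its id")
    if candidate in policy.denylist or candidate not in policy.allowlist:
        raise PermissionError(f"policy rejects model: {candidate}")
    return candidate
```

Checking the content address before the policy lookup means a blob that was altered after publication is rejected even if its id is on the allowlist.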
Problem: Operators need one auditable mechanism to coordinate GPU lease/model actions and host remediation without introducing sideband RPC channels.
Cohesix flow:
- /host/tickets/spec (host-ticket/v1).
- host-ticket-agent executes allowlisted adapters:
  - gpu.lease.* via existing /queen/ctl and /queen/lease/ctl semantics.
  - peft.* via existing host registry + /gpu/models/*.
  - systemd.*, docker.*, k8s.* as host-side coexistence actions.
- (claimed, running, succeeded, failed, expired) to /host/tickets/status or /host/tickets/deadletter.
- id + idempotency_key
- id + idempotency_key + source_hive + target_hive across request/outcome/audit/lease artifacts.

Why this is distinctive:
Problem: A single hive has practical reliability limits around high worker counts; operators need to orchestrate many hives without introducing active/active split-brain writes.
Cohesix flow:
- (host-ticket-agent --relay) to forward allowlisted intents between hives via existing REST mutation paths.
- coh fleet status|lease-summary|pressure for read-only fan-in visibility across hives.
- (--relay-pause-cmd, --relay-resume-cmd) to freeze relay during cutover and resume after health checks.

Why this is distinctive:
As-built primitives (current releases):
- /shard/<label>/worker/<id>/telemetry.
- /updates/* and model registry exposure under /models/* and /gpu/models/* (when enabled).
- (/policy, /actions, /audit, /replay) with append-only logs.
- /host/* and /gpu/* for ecosystem coexistence.

Typical integrations (environment-specific):
- cohsh or the shared client library.