cohesix

Cohesix is an open-source high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".


USE_CASES.md

Author: Lukas Bower, October 15, 2025
Revision: February 14, 2026

Purpose

This document enumerates concrete, high-value use cases for Cohesix across sectors. It preserves technical specifics while adding business context so stakeholders can quickly assess fit, risk, and required integrations.

Executive Summary

Cohesix is a control-plane operating system for secure orchestration and telemetry of edge GPU nodes. It exposes a Secure9P file namespace as the only control surface and keeps heavy ecosystems (Kubernetes, CUDA/NVML, OT protocols, model registries) on the host and outside the VM’s trusted computing base. Milestone 25c adds a Python orchestration surface with typed control APIs, host-provider adapters, and ready-to-run playbooks for Mac, Jetson, and mixed 1k-worker fleets. For business stakeholders, this means a smaller audit scope, safer multi-tenant GPU sharing, and faster integration into existing Python tooling.

Positioning (Business + Technical)

Cohesix is:

Cohesix is not:

Business Outcomes

Operating Model (As-Built)

A Cohesix hive runs inside an seL4 VM on aarch64. The Queen (root-task + NineDoor) exposes /queen, /proc, /log, and sharded worker telemetry under /shard/<label>/worker/<id>/telemetry, with optional /gpu, /host, /policy, /audit, /replay, /updates, and /models namespaces when enabled by the manifest. Workers run as separate roles with bounded budgets. External ecosystems live on the host; host-side bridges publish /host/* and /gpu/* views into the namespace. QEMU aarch64/virt is the reference dev/CI environment, and UEFI ARM64 hardware is the target deployment profile.
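
The sharded telemetry layout described above can be illustrated with a small path builder. This is a minimal sketch, not part of the Cohesix SDK: the function name `telemetry_path` and the character rule for labels and ids are assumptions made for illustration; only the path shape comes from the text.

```python
import re

# Assumption: labels and worker ids are restricted to a conservative
# character set so a caller cannot inject extra path segments.
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def telemetry_path(label: str, worker_id: str) -> str:
    """Build /shard/<label>/worker/<id>/telemetry, rejecting unsafe segments."""
    for name, value in (("label", label), ("worker_id", worker_id)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return f"/shard/{label}/worker/{worker_id}/telemetry"


print(telemetry_path("line-a", "7"))  # /shard/line-a/worker/7/telemetry
```

Validating each segment before formatting keeps the helper from producing paths outside the shard subtree, which matters when labels come from operator input.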

Python Orchestration Surface (As-Built, Milestone 25c)

The Python SDK (tools/cohesix-py) is now a first-class operator path for high-scale automation while preserving Cohesix protocol boundaries.

Control APIs (typed, bounded, non-authoritative):

Integration adapters (host-side only):

Playbook UX (frictionless integration):

Built-in world-class playbooks (1k-worker oriented):

Strategic Fit Patterns


Use Case Catalog

25c Playbook Mapping (1k+ Worker Readiness)

| Fleet type | Business program | Python playbook id |
| --- | --- | --- |
| Mac | Global app release factory | mac-release-factory |
| Mac | Private PEFT/LoRA grid | mac-private-peft-grid |
| Mac | Endpoint compliance orchestration | mac-endpoint-compliance |
| Jetson | Traffic safety mesh | jetson-traffic-safety |
| Jetson | Manufacturing safety + QA | jetson-manufacturing-safety |
| Jetson | Critical infrastructure sensing | jetson-critical-infra |
| Mixed | Closed-loop edge AI factory | mixed-closed-loop-ai-factory |
| Mixed | Medical edge AI governance | mixed-medical-edge-ai |
| Mixed | Logistics digital twin operations | mixed-logistics-digital-twin |
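
For host-side tooling, the mapping above can be held as a plain lookup table. The sketch below is illustrative only: the playbook ids are taken from the table, while the `PLAYBOOKS` structure and the `playbooks_for` helper are hypothetical and do not describe how the Python SDK exposes playbooks.

```python
from __future__ import annotations

# (fleet type, business program) -> playbook id, copied from the table above.
PLAYBOOKS: dict[tuple[str, str], str] = {
    ("Mac", "Global app release factory"): "mac-release-factory",
    ("Mac", "Private PEFT/LoRA grid"): "mac-private-peft-grid",
    ("Mac", "Endpoint compliance orchestration"): "mac-endpoint-compliance",
    ("Jetson", "Traffic safety mesh"): "jetson-traffic-safety",
    ("Jetson", "Manufacturing safety + QA"): "jetson-manufacturing-safety",
    ("Jetson", "Critical infrastructure sensing"): "jetson-critical-infra",
    ("Mixed", "Closed-loop edge AI factory"): "mixed-closed-loop-ai-factory",
    ("Mixed", "Medical edge AI governance"): "mixed-medical-edge-ai",
    ("Mixed", "Logistics digital twin operations"): "mixed-logistics-digital-twin",
}


def playbooks_for(fleet_type: str) -> list[str]:
    """Return the playbook ids for one fleet type, in table order."""
    return [pid for (fleet, _), pid in PLAYBOOKS.items() if fleet == fleet_type]


print(playbooks_for("Jetson"))
```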

Edge and Industrial

1) Smart-factory / Industrial IoT gateway

Business outcome: Segmented OT control with auditable change authority and minimal downtime. Why Cohesix: seL4 isolation, file-shaped control plane, deterministic telemetry and logs. Integration: MODBUS/CAN sidecars; host-side uplink for telemetry export. Constraints: deterministic timing, safety certification paths.

2) Energy substation / Micro-grid controller

Business outcome: Hardened OT/IT boundary with predictable behavior during incidents. Why Cohesix: deterministic scheduling, minimal attack surface, append-only audit logs. Integration: DNP3/IEC-104 adapters; signed config bundles; GPS/PTP time sources. Constraints: NERC CIP, IEC 61850, local change control.

3) Retail / Computer-vision hub (store analytics)

Business outcome: Privacy-respecting analytics and faster model rollouts with lower WAN cost. Why Cohesix: host-side GPU stack, model pointers via /gpu/models/active, bounded telemetry. Integration: content-addressed model updates; local summarization; schema-tagged telemetry. Constraints: PII handling, retention windows.

4) Logistics and ports (ALPR, container ID, crane safety)

Business outcome: Reliable telemetry and updates across harsh RF and intermittent links. Why Cohesix: offline-first logs, replayable state, strict capability scoping. Integration: durable disk spooling; batch upload sidecar. Constraints: physical security, RF noise.

5) Telco MEC micro-orchestrator

Business outcome: Multi-tenant accelerator governance at cell sites with clear SLAs. Why Cohesix: ticketed leases, sharded namespaces, deterministic resource budgets. Integration: SR-IOV/NIC telemetry sidecars; per-tenant quota policies. Constraints: carrier-grade ops, slice isolation.

6) Healthcare imaging edge to cloud PACS

Business outcome: Minimal PHI footprint with traceable access and transfers. Why Cohesix: append-only audit, policy gates, deterministic telemetry. Integration: DICOM proxy; de-identification pipeline; export gating. Constraints: HIPAA, ISO 27001, locality requirements.

7) Autonomous depots (AV/AGV fleets)

Business outcome: Safe update windows and fleet learning without constant connectivity. Why Cohesix: content-addressed updates for deterministic version pinning and bounded telemetry envelopes. Integration: delta packs; multicast to many vehicles; PEFT-ready telemetry export. Constraints: safety certification, predictable maintenance windows.

8) Defense ISR kits / forward ops

Business outcome: Trusted control under low bandwidth with tamper-evident logs. Why Cohesix: seL4 assurance, minimal TCB, file-scoped authority. Integration: LoRa or SATCOM schedulers; rapid key-rotation workflows. Constraints: export controls, contested networks.

9) Smart-city sensing (air/noise/traffic)

Business outcome: Scalable governance of large sensor fleets with low operational cost. Why Cohesix: small footprint gateway with append-only telemetry. Integration: sensor-bus sidecars (I2C/SPI); coarse local summarization. Constraints: public data governance, OTA safety.

10) Broadcast/DOOH signage controller

Business outcome: Signed content delivery with proof-of-display and SLA reporting. Why Cohesix: content-addressed assets, policy gating, immutable audit trails. Integration: schedule provider; receipts pipeline; bandwidth-aware staging. Constraints: bandwidth caps, SLA reporting.


Security and Fintech

11) HSM-adjacent signing gateway

Business outcome: Auditable control over high-value signing operations. Why Cohesix: policy-as-files, role-scoped tickets, append-only logs. Integration: sign/verify provider; rate and role caps. Constraints: FIPS modes, key custody.

12) OT/IT segmentation appliance

Business outcome: Replace VPN sprawl with time-boxed, least-privilege access. Why Cohesix: tiny boundary device, tickets/leases, deterministic audit logs. Integration: dual-NIC profile; AccessPolicy compiler; telemetry rings. Constraints: audits, change control.


Science and Remote Ops

13) Environmental science stations (polar/offshore)

Business outcome: Store-and-forward data collection under power and link limits. Why Cohesix: deterministic envelopes, append-only queues, replayable state. Integration: delay-tolerant queues; trickle updates; clock beacons. Constraints: power budget, severe weather.

14) HAPS/satellite ground gateway

Business outcome: Predictable control-plane operations under long RTT. Why Cohesix: low-memory deterministic control processes. Integration: CCSDS/TCP bridge; high-latency backpressure tuning. Constraints: link budgets, long RTT.


Developer and Platform Tooling

15) Secure OTA lab appliance

Business outcome: Demonstrable, auditable update lifecycle and rollback readiness for stakeholders. Why Cohesix: content-addressed updates, policy gating, audit logs. Integration: golden-image verifier; host updater for rollbacks; CLI scripts; dashboards. Constraints: demo reproducibility, change control.

16) Classroom OS/security labs

Business outcome: Teachable microkernel and secure control-plane workflows. Why Cohesix: small, readable userland; file-shaped APIs for labs. Integration: mock transports; fuzz harnesses; trace viewer. Constraints: safe sandboxing, repeatable fixtures.


Control-plane and Operations

17) Fleet policy GitOps boundary (policy-as-files)

Business outcome: Signed, reviewable policy changes with diffable drift. Why Cohesix: policy namespaces, audit trails, deterministic control. Integration: policy bundle pipeline; diff views; approval workflow. Constraints: segregation of duties, audit trails.

18) Vendor remote maintenance without VPN sprawl

Business outcome: Time-boxed vendor access with complete traceability. Why Cohesix: scoped tickets, lease files, append-only session logs. Integration: maintenance window leases; per-path AccessPolicy; /log session recording. Constraints: compliance audits, least-privilege, offline fallback.
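
The access decision behind a time-boxed vendor session can be sketched as a pure function. This is a simplified model under stated assumptions, not the AccessPolicy implementation: the `Ticket` type, the `is_allowed` function, and the example paths are hypothetical, and the sketch covers only control writes (it does not model reading the maintenance-window file itself).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical scoped ticket: a role, an expiry, and exact (path, verb) grants."""

    role: str
    expires_at: float  # seconds on a clock supplied by the caller
    grants: frozenset[tuple[str, str]]


def is_allowed(ticket: Ticket, path: str, verb: str, now: float, window_active: bool) -> bool:
    """Deny unless the ticket is unexpired, the window is active, and the grant is exact."""
    if now >= ticket.expires_at:
        return False
    if not window_active:
        return False
    return (path, verb) in ticket.grants


# Example paths are placeholders, not documented Cohesix paths.
vendor = Ticket(
    role="vendor",
    expires_at=1_000.0,
    grants=frozenset({("/worker/7/ctl", "append"), ("/log/session", "append")}),
)
print(is_allowed(vendor, "/worker/7/ctl", "append", now=10.0, window_active=True))
```

Passing the clock and window state in as arguments keeps the decision deterministic and easy to replay from an audit log.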

19) Air-gapped update ferry (removable media + /updates)

Business outcome: Provenance-preserving updates without WAN connectivity. Why Cohesix: content-addressed bundles under /updates, deterministic verification, audit trails. Integration: host-side cas-tool ingestion from removable media; resumable chunk validation. Constraints: strict provenance, operational simplicity.
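
Content-addressed verification of a ferried bundle can be illustrated as follows. The sketch assumes SHA-256 hex digests as chunk ids and a manifest that is simply an ordered list of those ids; neither detail is stated in this document, and `chunk_id` and `verify_bundle` are hypothetical names rather than cas-tool functions.

```python
from __future__ import annotations

import hashlib


def chunk_id(data: bytes) -> str:
    """Content address of a chunk (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def verify_bundle(manifest: list[str], chunks: dict[str, bytes]) -> list[str]:
    """Return manifest ids that are missing or whose bytes do not hash to the id.

    An empty result means the bundle is complete and intact; a non-empty
    result is the list of chunks still to be (re)ingested, which is what
    makes validation resumable.
    """
    pending = []
    for cid in manifest:
        data = chunks.get(cid)
        if data is None or chunk_id(data) != cid:
            pending.append(cid)
    return pending


part_a, part_b = b"chunk-a", b"chunk-b"
manifest = [chunk_id(part_a), chunk_id(part_b)]
received = {manifest[0]: part_a}          # second chunk not yet copied
print(verify_bundle(manifest, received))  # lists the id of the missing chunk
```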

20) GPU lease broker for multi-tenant edge (host CUDA intact)

Business outcome: Fair, auditable sharing of accelerators across tenants. Why Cohesix: file-modeled leases, ticketed requests, host-enforced policy. Integration: quota accounting; eviction/renew flows; gpu-bridge-host governance rules. Constraints: noisy-neighbor control, operator clarity.
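
Quota accounting for leases can be sketched with a small in-memory broker. This is an illustrative model only: the `LeaseBroker` class, its fields, and the log strings are assumptions, and real enforcement is described above as living in the host-side bridge rather than in code like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LeaseBroker:
    """Hypothetical broker: global capacity, per-tenant quotas, append-only log."""

    capacity: int                      # total concurrent leases on the host
    quotas: dict[str, int]             # per-tenant maximum concurrent leases
    active: dict[str, int] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def request(self, tenant: str) -> bool:
        used = self.active.get(tenant, 0)
        if used >= self.quotas.get(tenant, 0):
            self.log.append(f"DENY {tenant} quota")
            return False
        if sum(self.active.values()) >= self.capacity:
            self.log.append(f"DENY {tenant} busy")
            return False
        self.active[tenant] = used + 1
        self.log.append(f"LEASE {tenant}")
        return True

    def release(self, tenant: str) -> None:
        if self.active.get(tenant, 0) == 0:
            raise ValueError(f"{tenant} holds no lease")
        self.active[tenant] -= 1
        self.log.append(f"RELEASE {tenant}")


broker = LeaseBroker(capacity=2, quotas={"tenant-a": 1, "tenant-b": 2})
print(broker.request("tenant-a"), broker.request("tenant-a"))  # True False
```

Checking the tenant quota before global capacity means a tenant over its quota gets a quota denial even when the host is also full, which keeps the audit trail unambiguous.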

21) Model governance and provenance at the edge (attested models)

Business outcome: Controlled model rollout with auditable provenance. Why Cohesix: content-addressed models under /models and /gpu/models, policy gating, /proc/boot provenance. Integration: model registry sidecar; signature verification; LoRA lineage tracking. Constraints: regulated AI, privacy boundaries.
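
The interplay between content-addressed models, a policy allowlist, and the active-model pointer can be sketched as below. The `ModelGate` class is hypothetical, SHA-256 is an assumed addressing scheme, and signature verification is deliberately left out; the allowlist of digests stands in for it.

```python
from __future__ import annotations

import hashlib


class ModelGate:
    """Hypothetical gate: content-addressed blobs plus allowlist/denylist policy."""

    def __init__(self, allowlist: set[str], denylist: set[str] | None = None) -> None:
        self.allowlist = set(allowlist)
        self.denylist = set(denylist or ())
        self.blobs: dict[str, bytes] = {}
        self.active: str | None = None
        self.audit: list[tuple[str, str, str]] = []  # append-only decisions

    def publish(self, blob: bytes) -> str:
        """Store a model under its content address and return that id."""
        model_id = hashlib.sha256(blob).hexdigest()
        self.blobs[model_id] = blob
        return model_id

    def set_active(self, model_id: str) -> bool:
        """Move the active pointer only if policy allows the model id."""
        ok = (
            model_id in self.blobs
            and model_id in self.allowlist
            and model_id not in self.denylist
        )
        self.audit.append(("set_active", model_id, "allow" if ok else "deny"))
        if ok:
            self.active = model_id
        return ok


approved = hashlib.sha256(b"weights-v1").hexdigest()
gate = ModelGate(allowlist={approved})
gate.publish(b"weights-v1")
print(gate.set_active(approved))  # True
```

Every decision, including denials, is appended to the audit list, so a rejected rollout leaves the previous pointer in place and still leaves a trace.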

22) Ransomware-resistant control-plane safe mode

Business outcome: Maintain telemetry and remote control even if host OS degrades. Why Cohesix: minimal boundary, read-only recovery namespace, immutable logs. Integration: rescue worker profile; out-of-band operator attach flow. Constraints: incident response procedures, tamper evidence.

23) High-integrity event recorder for robotics

Business outcome: Blame-free postmortems with deterministic replay. Why Cohesix: append-only rings, bounded scheduling, file-based replay. Integration: export pipeline sidecar; compression outside the TCB. Constraints: safety certification, retention limits.
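
A bounded, append-only event ring with replay can be sketched using the standard library. The `EventRing` class and its method names are illustrative assumptions, not the Cohesix recorder; the sketch shows only the bounded-memory and replay-by-sequence ideas.

```python
from __future__ import annotations

from collections import deque


class EventRing:
    """Hypothetical fixed-capacity ring: appends only, oldest entries age out."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._events: deque[tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 0

    def append(self, payload: str) -> int:
        """Record an event and return its monotonically increasing sequence number."""
        seq = self._next_seq
        self._events.append((seq, payload))
        self._next_seq += 1
        return seq

    def replay(self, from_seq: int = 0) -> list[tuple[int, str]]:
        """Return retained events with sequence >= from_seq, in original order."""
        return [(s, p) for s, p in self._events if s >= from_seq]

    @property
    def dropped(self) -> int:
        """How many of the oldest events have aged out of the ring."""
        return self._next_seq - len(self._events)


ring = EventRing(capacity=3)
for i in range(5):
    ring.append(f"e{i}")
print(ring.replay())  # [(2, 'e2'), (3, 'e3'), (4, 'e4')]
```

Because sequence numbers never repeat, a host-side export pipeline can detect gaps when the ring has wrapped before it caught up.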

24) Edge data-diode style telemetry gateway (one-way-ish)

Business outcome: Outbound-only telemetry posture with minimal inbound surface. Why Cohesix: policy-enforced file verbs, append-only exports. Integration: export-only providers; batching/backpressure tuning. Constraints: regulated environments, packet loss tolerance.

25) Kubernetes coexistence: secure out-of-band orchestrator

Business outcome: Governance layer for lifecycle, telemetry, and GPU leasing without replacing Kubernetes. Why Cohesix: file APIs for control-plane actions; host-side bridge maps K8s to /queen and /shard. Integration: identity mapping; RBAC-to-ticket translation. Constraints: clear separation of responsibilities.

26) Edge learning feedback loop (LoRA/PEFT, control-plane only)

Business outcome: Safe performance feedback for centralized training. Why Cohesix: schema-tagged, bounded telemetry; model lifecycle pointers. Integration: export namespaces for training farms; privacy filters. Constraints: no gradients or raw data in the VM; deterministic bandwidth/storage envelopes.
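
A schema-tagged telemetry record with a hard size bound can be sketched as follows. The envelope layout, the `make_envelope` name, the example schema tag, and the 256-byte limit are all assumptions chosen for illustration; the document does not specify the wire format.

```python
from __future__ import annotations

import json

MAX_ENVELOPE_BYTES = 256  # hypothetical bound, not a documented Cohesix limit


def make_envelope(schema: str, fields: dict, max_bytes: int = MAX_ENVELOPE_BYTES) -> bytes:
    """Encode a schema-tagged record deterministically and enforce a size bound."""
    record = {"schema": schema, "fields": fields}
    # sort_keys and compact separators make the encoding byte-for-byte repeatable.
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    if len(encoded) > max_bytes:
        raise ValueError(f"envelope is {len(encoded)} bytes, limit is {max_bytes}")
    return encoded


print(make_envelope("perf/v1", {"latency_ms": 12}))
```

Rejecting oversized records at the producer, instead of truncating them, keeps the bandwidth and storage envelope predictable and avoids shipping partial data to a training farm.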

Diagrams

Figure 1: Edge hive deployment (Smart factory / Retail CV hub / MEC node)

flowchart LR
  subgraph SITE["Edge Site (Factory / Store / MEC)"]
    subgraph HIVE["Cohesix Hive (one Queen, many Workers)"]
      Q["Queen<br/>(root-task + NineDoor)<br/>/queen /proc /log"]:::queen
      W1["Worker: sensors/PLC"]:::worker
      W2["Worker: CV camera ingest"]:::worker
      W3["Worker: app control loop"]:::worker
      WG["Worker: gpu stub<br/>(in-VM, no CUDA)"]:::worker
    end

    subgraph HOST["Host ecosystems (sidecars)"]
      OT["OT protocol bridge<br/>(MODBUS/CAN/DNP3/IEC-104)"]:::sidecar
      GPU["gpu-bridge-host<br/>CUDA/NVML stays here"]:::sidecar
      STORE["Local storage / spool<br/>(ring buffers, batch upload)"]:::sidecar
    end

    CAM["Cameras / Sensors"]:::ext
    PLC["PLCs / Robots"]:::ext
    JET["Jetson / Edge GPU nodes"]:::ext
  end

  CLOUD["Cloud / HQ<br/>(Ops + Registry + Analytics)"]:::cloud
  OPS["Operator / NOC<br/>cohsh or GUI client"]:::ext

  %% flows
  OPS -->|"cohsh attach<br/>(console or Secure9P)"| Q
  CAM -->|"telemetry/video"| W2
  PLC -->|"fieldbus"| OT
  OT -->|"mirrored files<br/>into namespace"| Q
  W1 -->|"append telemetry<br/>/shard/<label>/worker/<id>/telemetry"| Q
  W2 -->|"append summaries<br/>/shard/<label>/worker/<id>/telemetry"| Q
  W3 -->|"control + status"| Q

  Q -->|"ticketed orchestration<br/>/queen/ctl"| W1
  Q -->|"ticketed orchestration<br/>/queen/ctl"| W2
  Q -->|"ticketed orchestration<br/>/queen/ctl"| W3
  Q -->|"lease + job via /gpu/*"| WG
  WG -->|"append job descriptors<br/>/gpu/<id>/job"| Q

  GPU -->|"publishes provider nodes<br/>/gpu/<id>/*"| Q
  JET -->|"CUDA workloads<br/>host-side"| GPU

  Q -->|"append-only logs<br/>/log/*"| Q
  Q -->|"batch export / uplink<br/>(protocol outside TCB)"| STORE
  STORE -->|"durable batch upload"| CLOUD

  classDef queen fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
  classDef worker fill:#f0fdf4,stroke:#15803d,stroke-width:1px;
  classDef sidecar fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
  classDef cloud fill:#eef2ff,stroke:#4338ca,stroke-width:1px;
  classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;

Figure 2: Vendor remote maintenance without VPN sprawl (tickets + leases + append logs)

sequenceDiagram
  autonumber

  participant Vendor as Vendor Engineer
  participant Cohsh as cohsh
  participant ND as NineDoor
  participant POL as AccessPolicy
  participant RT as root-task
  participant MW as maintenance window
  participant DEV as worker ctl
  participant SLOG as session log

  Note over ND: File ops only. Policy runs before provider logic. Logs are append-only.

  Vendor->>Cohsh: obtain scoped ticket
  Vendor->>Cohsh: attach vendor role with ticket
  Cohsh->>ND: TATTACH ticket
  ND->>POL: evaluate ticket scope TTL and rate limits
  POL-->>ND: allow or deny

  alt maintenance window active
    Cohsh->>ND: TOPEN MW read
    ND-->>Cohsh: ROPEN
    Cohsh->>ND: TREAD MW confirm active
    ND-->>Cohsh: RREAD active

    Cohsh->>ND: TOPEN DEV append
    ND-->>Cohsh: ROPEN
    Cohsh->>ND: TWRITE cmd diagnose level basic
    ND->>POL: check path and verb allowed
    POL-->>ND: allow
    ND->>RT: perform validated internal action
    RT-->>ND: ok
    ND-->>Cohsh: RWRITE

    Cohsh->>ND: TOPEN SLOG append
    ND-->>Cohsh: ROPEN
    Cohsh->>ND: TWRITE audit vendor action diagnose target worker
    ND-->>Cohsh: RWRITE
  else window inactive or expired
    Cohsh->>ND: TOPEN MW read
    ND-->>Cohsh: ROPEN
    Cohsh->>ND: TREAD MW
    ND-->>Cohsh: RREAD inactive
    Cohsh->>ND: TWRITE cmd diagnose
    ND-->>Cohsh: Rerror Permission
  end

Figure 3: Air-gapped update ferry (removable media + /updates + audit)

flowchart LR
  USB["Portable media<br/>(update bundles)"]:::ext
  subgraph HIVE["Air-gapped site: Cohesix Hive"]
    Q["Queen<br/>(root-task + NineDoor)"]:::queen
    UPD["/updates/<epoch>/*<br/>(manifest + chunks)"]:::path
    LOG["/log/*<br/>append-only audit"]:::path
  end
  OPS["Operator<br/>cohsh"]:::ext
  HOST["Host cas-tool"]:::sidecar

  USB -->|"ingest bundle"| HOST
  HOST -->|"write manifest + chunks"| UPD
  OPS -->|"inspect status"| UPD
  Q -->|"audit writes"| LOG

  classDef queen fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
  classDef path fill:#f8fafc,stroke:#334155,stroke-dasharray: 4 3;
  classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;
  classDef sidecar fill:#fff7ed,stroke:#c2410c,stroke-width:1px;

Figure 4: GPU lease broker for multi-tenant edge (CUDA stays on host)

sequenceDiagram
  autonumber

  participant Tenant as Tenant App
  participant ND as NineDoor
  participant RT as root-task
  participant GPU as gpu files
  participant GPUB as gpu-bridge-host

  Note over GPUB: CUDA and NVML stay on host. Enforcement happens here.

  Tenant->>ND: TATTACH tenant ticket
  Tenant->>ND: TWALK queen ctl
  ND-->>Tenant: RWALK
  Tenant->>ND: TOPEN queen ctl append
  ND-->>Tenant: ROPEN
  Tenant->>ND: TWRITE spawn gpu lease request
  ND->>RT: validate ticket scope and quotas

  alt capacity available
    RT-->>ND: ok queued
    ND-->>Tenant: RWRITE
    RT->>GPU: append ctl LEASE issued
    GPUB->>GPU: append status QUEUED
    GPUB->>GPU: append status RUNNING
  else no capacity
    RT-->>ND: Err Busy
    ND-->>Tenant: Rerror Busy
  end

  Tenant->>ND: TOPEN gpu job append
  ND-->>Tenant: ROPEN
  Tenant->>ND: TWRITE append job descriptor
  ND-->>Tenant: RWRITE
  GPUB->>GPU: append status OK or ERR

Figure 5: Model governance and provenance at the edge (attested models)

flowchart LR
  REG["Model registry bridge (host sidecar)<br/>CAS + signatures"]:::sidecar
  subgraph HIVE["Cohesix Hive"]
    Q["Queen<br/>(root-task + NineDoor)"]:::queen
    POL["/policy/*<br/>(only signed by X)<br/>allowlist/denylist"]:::path
    MODELS["/models/*<br/>(content addressed)"]:::path
    DEP["/gpu/models/active<br/>(pointer to model id)"]:::path
    BOOT["/proc/boot<br/>(provenance, measurements)"]:::path
    LOG["/log/*<br/>append-only audit"]:::path
    W["Workers consume model ref<br/>(no unsigned blobs)"]:::worker
  end

  OPS["Operator / CI<br/>cohsh"]:::ext

  REG -->|"publish signed model"| MODELS
  OPS -->|"update policy bundle"| POL
  OPS -->|"set active model"| DEP
  DEP -->|"validated by policy"| Q
  Q -->|"audit writes"| LOG
  Q -->|"expose boot + model provenance"| BOOT
  W -->|"fetch by id<br/>verify via policy"| MODELS

  classDef queen fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
  classDef worker fill:#f0fdf4,stroke:#15803d,stroke-width:1px;
  classDef sidecar fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
  classDef path fill:#f8fafc,stroke:#334155,stroke-dasharray: 4 3;
  classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;

27) Unified host control tickets across CUDA, PEFT, systemd, docker, and K8s

Problem: Operators need one auditable mechanism to coordinate GPU lease/model actions and host remediation without introducing sideband RPC channels.

Cohesix flow:

Why this is distinctive:

28) Multi-hive federation (10x1k pattern, single-writer preserved)

Problem: A single hive has practical reliability limits around high worker counts; operators need to orchestrate many hives without introducing active/active split-brain writes.

Cohesix flow:

Why this is distinctive:


Platform Primitives and Typical Integrations

As-built primitives (current releases):

Typical integrations (environment-specific):