cohesix

Cohesix is an open-source high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".

View the Project on GitHub lukeb-aidev/cohesix

Cohesix

Cohesix 0.9.0-beta is the project’s first beta release, with significant improvements in stability and operator usability.

Open-source releases are available in releases/.

Tested platforms:

Models Tested:

Why Cohesix?

Cohesix explores a specific and deliberately narrow problem space: how to build a small, auditable, and secure control plane for orchestrating distributed edge GPU systems, without inheriting the complexity, opacity, and attack surface of general-purpose operating systems.

Cohesix is a research operating system with practical goals: to test whether a formally grounded microkernel, a file-shaped control plane, and a strictly bounded userspace can handle real edge-orchestration workloads in hostile and unreliable environments. The project is informed by earlier work in film and broadcast systems where reliability, timing, and control mattered more than convenience. In practical terms, the project would not be feasible without extensive use of AI agents for architecture review, design iteration, code synthesis, debugging assistance, and documentation refinement.

Cohesix is intentionally opinionated. It treats determinism, auditability, and security as design inputs rather than constraints, and is willing to exclude large classes of features to preserve those properties.

Cohesix has a strong MLOps fit: model lifecycle pointers, deterministic CAS updates, and bounded telemetry streams make training, rollout, and audit pipelines reproducible without introducing in-VM ML stacks.


seL4 foundation

Cohesix runs on seL4, a high-assurance capability-based microkernel with a machine-checked proof of correctness. seL4 provides strong isolation, explicit authority, and deterministic scheduling while keeping the kernel extremely small. This lets Cohesix place all policy and orchestration logic in a pure Rust userspace, minimize the trusted computing base, and enforce capability-scoped control planes without relying on POSIX semantics, in-kernel drivers, or ambient authority.


What is Cohesix?

Cohesix is a minimal orchestration operating system for secure edge management, targeting a defined set of use cases around AI hives and distributed GPU workloads.

Technically, Cohesix is a pure Rust userspace running on upstream seL4 on aarch64/virt (GICv3). Userspace is shipped as a static CPIO boot payload containing the root task, the NineDoor Secure9P server, and worker roles; host tools live outside the VM.

Cohesix does not include a traditional filesystem; instead it exposes a synthetic Secure9P namespace where paths represent capability-scoped control and telemetry interfaces rather than persistent storage.

Cohesix does not provide HTTPS or TLS. Instead, it relies on an authenticated, encrypted private network (e.g. VPN or overlay) for transport security, keeping the Cohesix TCB small and focused. See this example.

All control and telemetry are file-shaped and exposed via Secure9P; the console mirrors those semantics. There are no ad-hoc RPC channels, no background daemons, and no general in-VM networking services.

For audits and incident review, the host-side coh evidence pack tool exports deterministic, redacted evidence bundles sourced only from existing /proc, /log, /audit, /replay, and telemetry surfaces.

Operators interact with Cohesix through two consoles:

The intended deployment target is physical ARM64 hardware, with Raspberry Pi 4 bring-up aligned to the upstream seL4 U-Boot + binary image flow. Today, QEMU aarch64/virt is used for bring-up, CI, and testing, with the expectation that QEMU behaviour closely mirrors the eventual hardware profiles where applicable. Cohesix is not a general-purpose operating system and deliberately avoids POSIX semantics, libc, dynamic linking, and in-VM hardware stacks to keep the system small and analyzable.

In short, Cohesix treats orchestration itself as an operating-system problem, with authority, lifecycle, and failure handling as first-class concerns.


Plan 9 heritage and departures

Cohesix is deliberately influenced by Plan 9 from Bell Labs, but it is not a revival, clone, or generalisation of Plan 9. The influence is philosophical rather than literal, and the departures are explicit.

What Cohesix inherits

File-shaped control surfaces
Cohesix exposes control and observation as file operations. Paths such as /queen/ctl, /worker/<id>/telemetry, /log/*, /gpu/<id>/*, and /gpu/models/* are interfaces, not storage. This yields diffable state, append-only audit logs, and a uniform operator surface.

Namespaces as authority boundaries
Like Plan 9’s per-process namespaces, Cohesix uses per-session, role-scoped namespaces. A namespace is not global truth; it is a capability-filtered view of the system. Authority is defined by which paths are visible and writable.

Late binding of services
Services are not assumed to exist. Workers, GPU providers, and auxiliary capabilities are bound into the namespace only when required, supporting air-gapped operation, fault isolation, and minimal steady-state complexity.

Where Cohesix departs

Hostile networks by default
Cohesix assumes unreliable, adversarial, and partitioned networks. Every operation is bounded, authenticated, auditable, and revocable.

No single-system illusion
Partial visibility and degraded operation are normal. Cohesix explicitly rejects the idea of a seamless single-system image.

Control plane only
Secure9P is a control-plane protocol, not a universal IPC or data plane. Cohesix does not host applications, GUIs, or general user environments, and keeps heavy ecosystems outside the trusted computing base.

Explicit authority and revocation
Cohesix enforces capability tickets, time- and operation-bounded leases, and revocation-first semantics. Failure is handled by withdrawing authority, not by retries or self-healing loops.

Determinism over flexibility
Bounded memory, bounded work, and deterministic behaviour are prioritised over convenience and dynamism.


Architecture (high level)

A single Cohesix deployment is a hive: one Queen role orchestrating multiple workers over a shared Secure9P namespace. The root task owns initial authority and scheduling, NineDoor presents the synthetic namespace, and all lifecycle actions are file-driven under /queen, /worker/<id>, /log, and /gpu/<id>.

CUDA, NVML, and other heavy stacks remain host-side. The VM never touches GPU hardware directly.

Figure 1: Cohesix concept architecture (Queen/Worker hive over Secure9P, host-only GPU bridge, dual consoles)

flowchart LR

  subgraph HOST["Host (outside Cohesix VM/TCB)"]
    OP["Operator or Automation"]:::ext
    COHSH["cohsh (host-only)<br/>Canonical shell<br/>transport tcp<br/>role and ticket attach"]:::hosttool
    GUI["SwarmUI (host-only)<br/>Speaks cohsh protocol"]:::hosttool
    WIRE["secure9p-codec/core/transport (host)<br/>bounded framing<br/>TCP transport adapter"]:::hostlib
    GPUB["gpu-bridge-host (host)<br/>CUDA and NVML here<br/>lease enforcement<br/>publishes gpu + models"]:::hosttool
  end

  subgraph TARGET["Target (QEMU aarch64 virt today; Pi4 U-Boot bare metal)"]
    subgraph K["Upstream seL4 kernel"]
      SEL4["seL4<br/>caps, IPC, scheduling<br/>formal foundation"]:::kernel
    end

    subgraph USER["Pure Rust userspace (static CPIO rootfs)"]
      RT["root-task<br/>bootstrap caps and timers<br/>cooperative event pump<br/>spawns NineDoor and roles<br/>owns side effects"]:::vm
      ND["NineDoor (Secure9P server)<br/>exports synthetic namespace<br/>role-aware mounts and policy"]:::vm

      Q["Queen role<br/>orchestrates via queen ctl"]:::role
      WH["worker-heart<br/>heartbeat telemetry"]:::role
      WG["worker-gpu (VM stub)<br/>no CUDA or NVML<br/>uses gpu files"]:::role
    end
  end

  UART["PL011 UART console<br/>bring-up and recovery"]:::console
  TCP["TCP console<br/>remote operator surface"]:::console

  subgraph NS["Hive namespace (Secure9P)"]
    PROC["Path: /proc<br/>boot and status views"]:::path
    QUEENCTL["Path: /queen/ctl<br/>append-only control<br/>spawn kill bind mount<br/>spawn gpu lease requests"]:::path
    WORKTEL["Path: /worker/ID/telemetry<br/>append-only telemetry"]:::path
    LOGS["Path: /log/*<br/>append-only streams"]:::path
    GPUFS["Path: /gpu/ID<br/>info ctl job status<br/>host-mirrored providers"]:::path
    GPUMODELS["Path: /gpu/models<br/>available + active pointers<br/>host-published registry"]:::path
  end

  SEL4 --> RT
  RT --> ND
  RT --- UART
  RT --- TCP

  OP --> COHSH
  OP --> GUI
  COHSH -->|tcp attach| TCP
  GUI -->|same protocol| TCP

  ND --> PROC
  ND --> QUEENCTL
  ND --> WORKTEL
  ND --> LOGS
  ND --> GPUFS
  ND --> GPUMODELS

  Q -->|Secure9P ops| ND
  WH -->|Secure9P ops| ND
  WG -->|Secure9P ops| ND

  QUEENCTL -->|validated then internal action| RT

  GPUB --> WIRE
  WIRE -->|Secure9P transport host-only| ND
  GPUB --> GPUFS
  GPUB --> GPUMODELS

  classDef kernel fill:#eeeeee,stroke:#555555,stroke-width:1px;
  classDef vm fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
  classDef role fill:#f0fdf4,stroke:#15803d,stroke-width:1px;
  classDef console fill:#faf5ff,stroke:#7c3aed,stroke-width:1px;
  classDef path fill:#f8fafc,stroke:#334155,stroke-dasharray: 4 3;
  classDef hosttool fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
  classDef hostlib fill:#fffbeb,stroke:#b45309,stroke-width:1px;
  classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;

Components


SwarmUI is the host-side desktop UI for Cohesix. It renders Live Hive telemetry and replays (including bounded per-worker text overlays and a detail panel), and it reuses the same console/Secure9P transports and core verbs as cohsh via an embedded console panel.

Figure 2 SwarmUI replay (Live Hive telemetry visualization) SwarmUI replay screenshot

Getting Started

Option A: Run a pre-built release (fastest)

Pre-built bundles are available in releases/. Each bundle includes its own QUICKSTART.md.

  1. Extract the bundle for your OS (*-MacOS or *-linux).
  2. Install runtime dependencies (QEMU + SwarmUI libs):
    ./scripts/setup_environment.sh
    
  3. Terminal 1: boot the VM:
    ./qemu/run.sh
    
  4. Terminal 2: connect with cohsh:
    ./bin/cohsh --transport tcp --tcp-host 127.0.0.1 --tcp-port 31337 --role queen
    

    For non-local use, tunnel this TCP console over a VPN/overlay (no TLS inside the VM).

  5. If you plan to run non-mock PEFT flows, publish the live GPU registry so /gpu/models is visible:
    ./bin/gpu-bridge-host --publish --tcp-host 127.0.0.1 --tcp-port 31337 --auth-token changeme
    
  6. Optional UI (Mac or Linux desktop):
    ./bin/swarmui
    

    Headless Linux: xvfb-run -a ./bin/swarmui


Option B: Build from source (macOS or Linux)

You need QEMU, Rust, Python 3, and an external seL4 build that produces elfloader and kernel.elf.

macOS 26 (Apple Silicon)

./toolchain/setup_macos_arm64.sh
source "$HOME/.cargo/env"

Linux (Ubuntu 24 recommended)

sudo apt-get update
sudo apt-get install -y git cmake ninja-build clang llvm lld python3 python3-pip qemu-system-aarch64
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain 1.93.1
source "$HOME/.cargo/env"
rustup target add aarch64-unknown-none --toolchain 1.93.1

If you’re on another Linux distro, install the same dependencies with your package manager (QEMU + build essentials + Rust).

Build and run (QEMU + TCP console)

  1. Build seL4 externally (upstream) for aarch64 + qemu_arm_virt. Place the build at $HOME/seL4/build or pass --sel4-build below.
  2. Terminal 1: build and boot:
    SEL4_BUILD_DIR=$HOME/seL4/build ./scripts/cohesix-build-run.sh \
      --sel4-build "$HOME/seL4/build" \
      --out-dir out/cohesix \
      --profile release \
      --root-task-features cohesix-dev \
      --cargo-target aarch64-unknown-none \
      --transport tcp
    
  3. Terminal 2: connect with cohsh:
    cd out/cohesix/host-tools
    ./cohsh --transport tcp --tcp-port 31337 --role queen
    

References

See below for detailed design, interfaces, and milestone tracking:


Status