cohesix

Cohesix is an open-source high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".

Cohesix Failure Modes

This document lists deterministic failure behavior and the required operator responses. All behavior here is as-built: observed via /proc nodes, control files, and /log/queen.log audit lines.

Operating principles

Quick triage checklist

Lifecycle failures

1) Invalid lifecycle transition

Signal

Impact

Recovery

2) Outstanding leases block drain, quiesce, or reset

Signal

Impact

Recovery

  1. Inspect active workers (for example, via /worker or /shard/.../worker).
  2. Explicitly revoke or kill workers using /queen/ctl.
  3. Re-issue the lifecycle command once leases are zero.
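
The three steps above amount to a bounded inspect, revoke, retry loop. The following is a minimal sketch of that loop, not Cohesix's own tooling. The callables `list_workers`, `revoke_worker`, and `issue_lifecycle` are hypothetical stand-ins for whatever reads and control writes the operator actually performs (for example, listing /worker and writing to /queen/ctl). Their names, the worker id format, and the `max_rounds` limit are assumptions made for illustration.

```python
from typing import Callable, Iterable, List


def clear_leases_and_retry(
    list_workers: Callable[[], Iterable[str]],
    revoke_worker: Callable[[str], None],
    issue_lifecycle: Callable[[], bool],
    max_rounds: int = 3,
) -> bool:
    """Revoke outstanding workers, then re-issue the lifecycle command.

    All three callables are hypothetical wrappers supplied by the caller;
    this function only encodes the ordering of the recovery steps.
    Returns True if the lifecycle command was issued and accepted.
    """
    for _ in range(max_rounds):
        # Step 1: inspect active workers.
        active: List[str] = list(list_workers())
        if not active:
            # Step 3: leases are zero, so the command can be re-issued.
            return issue_lifecycle()
        # Step 2: explicitly revoke (or kill) each remaining worker.
        for worker_id in active:
            revoke_worker(worker_id)
    # Workers were still present on every inspection pass; give up rather
    # than issue a command that is expected to be refused.
    return False
```

The loop is bounded on purpose: if workers keep reappearing, the function returns False so the operator can investigate instead of retrying indefinitely.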

3) Lifecycle gate denial

Signal

Impact

Recovery

Policy gate failures

1) Missing approval for gated control write

Signal

Impact

Recovery

  1. Read /policy/rules to confirm the target is gated.
  2. Queue an approval in /actions/queue with id, target, and decision.
  3. Retry the control write.
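
Step 2 names three fields for an approval: id, target, and decision. The sketch below assembles such a record and validates it before it is queued. The JSON-line encoding, the `approve`/`deny` vocabulary, and the function name `build_approval_line` are all assumptions made for illustration. This document only states which fields are required, so check the actual /actions/queue format before relying on this.

```python
import json

# Assumed decision vocabulary; the real accepted values may differ.
ALLOWED_DECISIONS = {"approve", "deny"}


def build_approval_line(approval_id: str, target: str, decision: str) -> str:
    """Return one newline-terminated JSON record with id, target, decision.

    The wire format is an assumption; only the three field names come
    from the recovery steps.
    """
    if not approval_id or not target:
        raise ValueError("id and target must be non-empty")
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unsupported decision: {decision!r}")
    record = {"id": approval_id, "target": target, "decision": decision}
    return json.dumps(record, separators=(",", ":")) + "\n"
```

Because a consumed approval cannot be replayed (see the next failure mode), generating a fresh id for each new approval is a reasonable default.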

2) Replay attempt for a consumed approval

Signal

Impact

Recovery

Console and transport failures

1) Console already in use

Signal

Impact

Recovery

  1. Quit the active console client (cohsh, swarmui, hive-gateway, coh, gpu-bridge-host, host-sidecar-bridge).
  2. Retry the connection.

2) Connection refused or wrong port

Signal

Impact

Recovery

Telemetry ingest pressure

Telemetry ingest refusal is deterministic and policy-driven.

Signals

Recovery

Host publish denial

Host providers are gated by lifecycle state and policy.

Signals

Recovery

Worker attach denial

Worker roles cannot attach when lifecycle gates are closed.

Signals

Recovery

Host bridge visibility failures

1) /gpu or /gpu/models is empty

Signal

Impact

Recovery

2) /host is empty

Signal

Impact

Recovery

Bounds and path violations

1) Path or read size exceeds bounds

Signal

Impact

Recovery