Cohesix is an open-source high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps. Cohesix is "infrastructure for AGI".
cohsh is the canonical operator shell for the entire hive: one Queen orchestrating many workers via a shared Secure9P namespace.
- cohsh is the authoritative CLI for control-plane actions and /proc observability.
- The root console (cohesix>) is for boot-time diagnostics only.
- Run only one console client (cohsh, coh, swarmui, or hive-gateway) at a time.
- Policy-gated deployments require approvals in /actions/queue before writes to /queen/ctl.
- /gpu and /host appear only after the host bridges publish.

Cohesix userland exposes two operator entry points:
- Root console on the PL011 serial line (QEMU -serial mon:stdio), showing the cohesix> prompt for on-box bring-up and bootinfo sanity checks. A local seat on uefi-aarch64 (USB keyboard + HDMI text mirror) feeds the same root-console parser and command semantics as PL011.
- cohsh host CLI (coh> prompt) running on the host, speaking to the Cohesix instance over TCP (QEMU for development, UEFI hardware in deployment) or the mock/QEMU transports for development. cohsh never executes inside the VM and follows the same pattern on physical hardware.

Use the root console for low-level validation (bootinfo, capability layout, untyped counts) and quick liveness checks. Use cohsh for day-to-day operator workflows and NineDoor interactions.
Related docs
- docs/HOST_TOOLS.md: host tool semantics, policy gates, and mounts.
- docs/INTERFACES.md: canonical control-path schemas and /proc nodes.
- docs/ROLES_AND_SCHEDULING.md: role-to-namespace rules.
- docs/API_GUIDELINES.md: REST gateway mapping and constraints.
- docs/OPERATOR_WALKTHROUGH.md: lifecycle flow and recovery.

- Only one console client may be attached at a time: do not run cohsh, swarmui, hive-gateway, coh, gpu-bridge-host, or host-sidecar-bridge concurrently.
- Policy gating (when /policy/rules exists) requires approvals in /actions/queue for writes to /queen/ctl.
- --mock uses an in-process backend and does not talk to the VM; do not mix mock and live flags in one session.
- /gpu/* appears only after gpu-bridge-host --publish runs; /host/* appears only after host-sidecar-bridge runs.

- On QEMU, the root console is attached through -serial mon:stdio.
- On uefi-aarch64, local-seat keyboard input is routed into the same parser and HDMI mirrors the same output lines with bounded truncation.
- The prompt is cohesix>, served from the in-kernel console loop.【F:apps/root-task/src/console/mod.rs†L216-L305】
- help – list available commands.【F:apps/root-task/src/console/mod.rs†L224-L233】
- bi – bootinfo summary (node bits, empty window, IPC buffer if present).【F:apps/root-task/src/console/mod.rs†L234-L250】
- caps – key capability slots (root CNode, endpoint, UART).【F:apps/root-task/src/console/mod.rs†L252-L263】
- smp – dump SMP scheduler/CPU info in debug builds (prints ERR reason=unsupported otherwise).【F:apps/root-task/src/console/mod.rs†L265-L300】
- mem – untyped cap counts with RAM vs device breakdown.【F:apps/root-task/src/console/mod.rs†L265-L283】
- ping – replies pong as a liveness check.【F:apps/root-task/src/console/mod.rs†L285-L293】
- quit – currently prints quit not supported on root console; the loop continues (no session exit).【F:apps/root-task/src/console/mod.rs†L285-L299】

[cohesix:root-task] [uart] init OK
[console] PL011 console online
cohesix> help
Commands:
help - Show this help
bi - Show bootinfo summary
caps - Show capability slots
smp - Show SMP scheduler/CPU info (debug builds only)
mem - Show untyped summary
ping - Respond with pong
quit - Exit the console session
cohesix> bi
[bi] node_bits=12 empty=[0x0010..0x0100) ipc=0x7f000000
cohesix> caps
[caps] root=0x0001 ep=0x0002 uart=0x0003
cohesix> mem
[mem] untyped caps=16 ram_ut=14 device_ut=2
cohesix> ping
pong
Use this surface to confirm boot-time state before bringing up TCP or NineDoor; it is not the operator-facing control plane.
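For automated bring-up checks, the tagged summary lines in the transcript above can be parsed on the host. The following is a minimal sketch, assuming the `[tag] key=value ...` layout shown in the sample output is stable; `parse_kv_line` and `mem_counts_consistent` are hypothetical helper names, not part of any Cohesix tool.

```python
import re

# Matches lines such as "[mem] untyped caps=16 ram_ut=14 device_ut=2".
TAGGED_LINE = re.compile(r"\[(\w+)\]\s*(.*)")


def parse_kv_line(line):
    """Split a tagged console line into (tag, {key: value}) using key=value tokens."""
    match = TAGGED_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a tagged console line: {line!r}")
    tag, rest = match.groups()
    # Tokens without '=' (for example the word "untyped") are ignored.
    fields = dict(tok.split("=", 1) for tok in rest.split() if "=" in tok)
    return tag, fields


def mem_counts_consistent(line):
    """Check that RAM and device untyped counts add up to the reported total."""
    tag, fields = parse_kv_line(line)
    if tag != "mem":
        raise ValueError("expected a [mem] line")
    return int(fields["ram_ut"]) + int(fields["device_ut"]) == int(fields["caps"])


print(parse_kv_line("[caps] root=0x0001 ep=0x0002 uart=0x0003"))
print(mem_counts_consistent("[mem] untyped caps=16 ram_ut=14 device_ut=2"))
```

The sum check reflects the sample transcript only (14 RAM + 2 device = 16 caps); treat it as a sanity heuristic rather than a documented invariant.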
cohsh Shell (Host CLI)
- Source lives in apps/cohsh, installed to out/cohesix/host-tools/cohsh by the build script.【F:scripts/cohesix-build-run.sh†L402-L442】
- Transports: tcp (primary), rest (hive-gateway multiplexer), mock (in-process NineDoor stub), qemu (dev convenience to spawn QEMU). Default is tcp when built with the TCP feature.【F:apps/cohsh/src/main.rs†L44-L132】
- cohsh/SwarmUI take an exclusive host-side lock by default to prevent concurrent attachments.

| Transport | Requires | Best for | Notes |
| --- | --- | --- | --- |
| tcp | QEMU or hardware console | Live ops | Single-client console. |
| rest | hive-gateway | Multiplexed ops | Queen-only; uses gateway REST projection. |
| mock | none | CI, demos | Deterministic in-process backend. |
| qemu | QEMU binary + artifacts | Dev convenience | Spawns QEMU and attaches. |
Key options from --help:
- --role <role> and --ticket <ticket> to auto-attach on startup.
- --mint-ticket to emit a host-side ticket and exit; requires --role, accepts --ticket-subject (required for worker roles), --ticket-config (or COHSH_TICKET_CONFIG) and --ticket-secret (or COHSH_TICKET_SECRET) to override config secrets.
- --script <file> to execute commands non-interactively.
- --record-trace <file> to record Secure9P frames + ACKs to a trace file (requires --transport mock).
- --replay-trace <file> to replay a trace file deterministically (requires --transport mock; rejects tampered traces).
- --transport <mock|qemu|tcp|rest> to choose backend; TCP exposes --tcp-host / --tcp-port (defaults 127.0.0.1:31337). REST uses --rest-url (or COHSH_REST_URL / COH_REST_URL / HIVE_GATEWAY_URL) and supports --rest-auth-token (or COHSH_REST_AUTH_TOKEN / COH_REST_AUTH_TOKEN / HIVE_GATEWAY_REQUEST_AUTH_TOKEN).【F:apps/cohsh/src/main.rs†L44-L132】
- --mock-seed-gpu to seed mock transport sessions with GPU namespaces (useful for mock GPU demos/scripts).【F:apps/cohsh/src/main.rs†L44-L170】
- --qemu-bin, --qemu-out-dir, --qemu-gic-version, --qemu-arg (dev/CI convenience).【F:apps/cohsh/src/main.rs†L52-L131】
- --auth-token forwards the TCP console authentication secret; defaults to changeme.【F:apps/cohsh/src/main.rs†L78-L115】
- --policy <file> (or COHSH_POLICY) selects the manifest-derived client policy TOML; cohsh fails fast if the policy hash mismatches compiled defaults. Defaults to out/cohsh_policy.toml.
- --pool-control-sessions, --pool-telemetry-sessions (env COHSH_POOL_CONTROL_SESSIONS, COHSH_POOL_TELEMETRY_SESSIONS).
- --retry-max-attempts, --retry-backoff-ms, --retry-ceiling-ms, --retry-timeout-ms, --heartbeat-interval-ms (env COHSH_RETRY_MAX_ATTEMPTS, COHSH_RETRY_BACKOFF_MS, COHSH_RETRY_CEILING_MS, COHSH_RETRY_TIMEOUT_MS, COHSH_HEARTBEAT_INTERVAL_MS).
- COHSH_CONSOLE_LOCK=0 disables the exclusive TCP console lock (debug-only; concurrent clients will churn).

Manifest-derived policy defaults are emitted by coh-rtc into out/cohsh_policy.toml and embedded into the CLI at build time. The CLI refuses to start if the policy or manifest hash drifts.
Auth token vs ticket
- [...] attach.
- [...] ERR.

- manifest.sha256: 3a20adc55c8f975e20e8ef031422f8a09b4a7b8e524dd052bf69296ddf7ff1af
- policy.sha256: 96262c617e5a15321d58f069f17664dfbe02ffa9e6e4df7a38169c21b4e37ee8
- cohsh.pool.control_sessions: 2
- cohsh.pool.telemetry_sessions: 4
- cohsh.tail.poll_ms_default: 1000
- cohsh.tail.poll_ms_min: 250
- cohsh.tail.poll_ms_max: 10000
- cohsh.host_telemetry.nvidia_poll_ms: 1000
- cohsh.host_telemetry.systemd_poll_ms: 2000
- cohsh.host_telemetry.docker_poll_ms: 2000
- cohsh.host_telemetry.k8s_poll_ms: 5000
- retry.max_attempts: 3
- retry.backoff_ms: 200
- retry.ceiling_ms: 2000
- retry.timeout_ms: 5000
- heartbeat.interval_ms: 15000
- trace.max_bytes: 1048576

Generated from configs/root_task.toml (sha256: 3a20adc55c8f975e20e8ef031422f8a09b4a7b8e524dd052bf69296ddf7ff1af).
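To get a feel for how the retry defaults interact, the sketch below computes one possible delay schedule from retry.max_attempts, retry.backoff_ms, and retry.ceiling_ms. The doubling rule is an assumption made for illustration: this document only states that reconnects use incremental back-off, so the actual cohsh policy may differ. `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(max_attempts=3, backoff_ms=200, ceiling_ms=2000):
    """Return the delay before each retry, in milliseconds.

    Assumed policy (not confirmed by the docs): start at backoff_ms,
    double after every attempt, and never exceed ceiling_ms.
    """
    return [min(backoff_ms * (2 ** attempt), ceiling_ms)
            for attempt in range(max_attempts)]


# With the defaults listed above:
print(backoff_schedule())                # [200, 400, 800]
# A longer run shows the ceiling taking effect:
print(backoff_schedule(max_attempts=6))  # [200, 400, 800, 1600, 2000, 2000]
```

Under this assumption the three default attempts wait 1400 ms in total, which stays inside retry.timeout_ms (5000).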
Manifest-derived CohClient defaults (paths and Secure9P bounds) are emitted by coh-rtc.
- manifest.sha256: 3a20adc55c8f975e20e8ef031422f8a09b4a7b8e524dd052bf69296ddf7ff1af
- secure9p.msize: 8192
- secure9p.walk_depth: 8
- trace.max_bytes: 1048576
- client_paths.queen_ctl: /queen/ctl
- client_paths.queen_lifecycle_ctl: /queen/lifecycle/ctl
- client_paths.log: /log/queen.log
- telemetry_ingest.max_segments_per_device: 4
- telemetry_ingest.max_bytes_per_segment: 32768
- telemetry_ingest.max_total_bytes_per_device: 131072
- telemetry_ingest.eviction_policy: evict-oldest

Generated from configs/root_task.toml (sha256: 3a20adc55c8f975e20e8ef031422f8a09b4a7b8e524dd052bf69296ddf7ff1af).
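The telemetry ingest bounds above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that parses `key: value` snapshot lines and multiplies the segment limits; `parse_snapshot` and `full_segment_budget` are hypothetical helpers, and the snapshot text is copied from the defaults listed above.

```python
SNAPSHOT = """\
telemetry_ingest.max_segments_per_device: 4
telemetry_ingest.max_bytes_per_segment: 32768
telemetry_ingest.max_total_bytes_per_device: 131072
telemetry_ingest.eviction_policy: evict-oldest
"""


def parse_snapshot(text):
    """Turn 'key: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            values[key.strip()] = value.strip()
    return values


def full_segment_budget(cfg):
    """Bytes needed if every segment slot for one device is completely full."""
    segments = int(cfg["telemetry_ingest.max_segments_per_device"])
    per_segment = int(cfg["telemetry_ingest.max_bytes_per_segment"])
    return segments * per_segment


cfg = parse_snapshot(SNAPSHOT)
print(full_segment_budget(cfg))  # 131072
print(full_segment_budget(cfg) == int(cfg["telemetry_ingest.max_total_bytes_per_device"]))  # True
```

With these defaults, 4 segments of 32768 bytes exactly fill the 131072-byte per-device total.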
Shared console grammar and ticket policy are emitted by coh-rtc from cohsh-core so CLI and console stay aligned.
- help
- bi
- caps
- mem
- ping
- test
- nettest
- netstats
- log
- cachelog [n]
- quit
- tail <path>
- cat <path>
- ls <path>
- echo <path> <payload>
- attach <role> [ticket]
- spawn <payload>
- kill <worker>

Generated from cohsh-core verb specs (18 verbs).
- lifecycle <cordon|drain|resume|quiesce|reset>: writes to /queen/lifecycle/ctl.
- [...] /proc/lifecycle/state and rejects invalid transitions locally with deterministic ERR.
- [...] /log/queen.log audit lines.

- ticket.max_len: 224
- queen tickets are optional; TCP validates claims when present, NineDoor passes through.
- worker-* tickets are required; role must match and subject identity is mandatory.

Generated from cohsh-core ticket policy.
- ticket_limits.max_scopes: 8
- ticket_limits.max_scope_path_len: 128
- ticket_limits.max_scope_rate_per_s: 64 (0 = unlimited)
- ticket_limits.bandwidth_bytes: 131072 (0 = unlimited)
- ticket_limits.cursor_resumes: 16 (0 = unlimited)
- ticket_limits.cursor_advances: 256 (0 = unlimited)

Generated by coh-rtc (sha256: 1b869521f68c26d43c1ad278fbc557f2442e438ab12d443a142e53a33e4466fb).
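The ticket policy above (queen tickets optional, worker tickets required, ticket.max_len 224) lends itself to a host-side pre-flight check before calling attach. The sketch below is illustrative only: `check_attach` is a hypothetical helper, it does not validate ticket claims, and apart from `unknown role '<x>'` the messages are invented for this example rather than taken from cohsh.

```python
WORKER_ROLES = {"worker-heartbeat", "worker-gpu", "worker-bus", "worker-lora"}
TICKET_MAX_LEN = 224  # ticket.max_len from the generated policy


def check_attach(role, ticket=None):
    """Return None when the arguments look acceptable, else a reason string."""
    # The CLI accepts "worker" as an alias for worker-heartbeat.
    if role == "worker":
        role = "worker-heartbeat"
    if role != "queen" and role not in WORKER_ROLES:
        return f"unknown role '{role}'"
    if ticket is not None and len(ticket) > TICKET_MAX_LEN:
        return "ticket exceeds ticket.max_len"  # hypothetical wording
    if role in WORKER_ROLES and not ticket:
        return "worker roles require a ticket"  # hypothetical wording
    return None


print(check_attach("queen"))                 # None
print(check_attach("worker-gpu"))            # worker roles require a ticket
print(check_attach("drone", "abc"))          # unknown role 'drone'
```

A check like this only catches malformed input early; the TCP console and NineDoor remain the authority on whether a ticket is valid.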
- coh lives in apps/coh, with subcommands mount, gpu, run, evidence, telemetry pull, peft, and doctor.
- coh mount provides a FUSE view over Secure9P namespaces; --mock uses an in-process NineDoor backend, while live mounts require FUSE (enabled by default on Linux builds; macOS defaults to FUSE disabled and requires --features fuse plus MacFUSE).
- coh gpu exposes list/status/lease UX over /gpu/* and /queen/ctl; --mock provides deterministic CI output, --nvml (Linux-only, feature-gated) mirrors the host NVML inventory.
- /gpu/models and /gpu/telemetry/schema.json require a host GPU bridge publish (gpu-bridge-host --publish or coh peft import --publish).
- coh run validates /gpu/<id>/lease, executes a host command, and appends gpu-breadcrumb/v1 lifecycle entries to /gpu/<id>/status (bounded by manifest policy). Use --receipt-out <path> to emit a JSON receipt (ACK lines + bounded /proc/lease/* snapshot) suitable for audit/chargeback pipelines.
- coh gpu lease can emit a JSON receipt via --receipt-out <path> (request parameters + ACK line + bounded /proc/lease/* snapshot). Receipts never include console auth tokens or raw capability tickets.
- coh evidence pack exports a deterministic evidence directory (manifest/policy fingerprint, bounds.json, and bounded snapshots under proc/, log/, audit/, replay/, optional telemetry/). Exported audit JSONL hashes ticket fields (ticket → sha256:<hex>) so packs do not leak raw capability tickets.
- coh evidence timeline generates timeline.ndjson and timeline.md offline from an evidence pack directory (no network access).
- coh telemetry pull pulls /queen/telemetry/* segments into host storage; resumable and idempotent (per-segment files).
- coh peft export pulls /queen/export/lora_jobs/<job_id> into host storage with bounded reads.
- coh peft import stages adapters into the host registry (coh.peft.import.registry_root); use --publish to refresh /gpu/models in the live VM via /gpu/bridge/ctl.
- coh peft activate swaps /gpu/models/active and records rollback state under the registry root; coh peft rollback reverts to the previous pointer.
- host-ticket-agent is a host-only worker for /host/tickets/*: it tails spec, executes allowlisted adapters, and appends bounded lifecycle receipts to status/deadletter.
- [...] id + idempotency_key; terminal states (succeeded, failed, expired) are deduped for replay-safe reprocessing.
- coh doctor validates tickets, mount capability, NVML (unless --mock), and runtime prerequisites.
- --rest-url (or COH_REST_URL / HIVE_GATEWAY_URL) routes live operations through hive-gateway without attaching to the TCP console (queen role only). REST request-auth uses --rest-auth-token (or COH_REST_AUTH_TOKEN / COHSH_REST_AUTH_TOKEN / HIVE_GATEWAY_REQUEST_AUTH_TOKEN). coh mount --rest-url is exclusive: one REST mount per gateway URL.
- --auth-token (env COH_AUTH_TOKEN, fallback COHSH_AUTH_TOKEN, default changeme).
- COH_POLICY (or default out/coh_policy.toml) must hash-match compiled defaults.

Common live prerequisites
- A live TCP console reachable at 127.0.0.1:31337.
- gpu-bridge-host --publish for /gpu/* visibility.
- host-sidecar-bridge --watch for /host/* visibility.
- host-ticket-agent for ticket-driven host orchestration via /host/tickets/spec.
- For REST, run hive-gateway, set COH_REST_URL, and set request-auth (HIVE_GATEWAY_REQUEST_AUTH_TOKEN or tool-specific --rest-auth-token) so mutating writes succeed through the gateway.
- Approvals in /actions/queue if /policy/rules gates /queen/ctl.

PEFT registry layout (host-side, file-native):
- <registry_root>/available/<model_id>/manifest.toml plus staged adapter files.
- <registry_root>/active holds the current active model pointer.
- <registry_root>/active_state.toml stores current/previous for rollback.

Manifest-derived coh policy defaults are emitted by coh-rtc.
- manifest.sha256: 2e64f09fb17eafce52fe3e7a29fa7eb11f2299022ca7d13eabf9b31c809b4234
- policy.sha256: 9465e51f6b247269539107387e570bcdc873aa03e88076fad6c710659b7728db
- coh.mount.root: /
- coh.mount.allowlist: /proc, /queen, /worker, /log, /gpu, /host
- coh.telemetry.root: /queen/telemetry
- coh.telemetry.max_devices: 32
- coh.telemetry.max_segments_per_device: 4
- coh.telemetry.max_bytes_per_segment: 32768
- coh.telemetry.max_total_bytes_per_device: 131072
- coh.run.lease.schema: gpu-lease/v1
- coh.run.lease.active_state: ACTIVE
- coh.run.lease.max_bytes: 1024
- coh.run.breadcrumb.schema: gpu-breadcrumb/v1
- coh.run.breadcrumb.max_line_bytes: 512
- coh.run.breadcrumb.max_command_bytes: 256
- coh.peft.export.root: /queen/export/lora_jobs
- coh.peft.export.max_telemetry_bytes: 131072
- coh.peft.export.max_policy_bytes: 8192
- coh.peft.export.max_base_model_bytes: 1024
- coh.peft.import.registry_root: out/model_registry
- coh.peft.import.max_adapter_bytes: 67108864
- coh.peft.import.max_lora_bytes: 65536
- coh.peft.import.max_metrics_bytes: 65536
- coh.peft.import.max_manifest_bytes: 8192
- coh.peft.activate.max_model_id_bytes: 128
- coh.peft.activate.max_state_bytes: 4096
- retry.max_attempts: 3
- retry.backoff_ms: 200
- retry.ceiling_ms: 2000
- retry.timeout_ms: 5000
- coh doctor runs deterministic host checks for tickets, mount capability, NVML (unless --mock), and runtime prerequisites.
- [...] status=degraded backend=cuda.
- Use --mock on fresh hosts to skip NVML and QEMU checks when running mock demos.
- check=policy validates coh_policy.toml against manifest + policy hashes.
- check=ticket uses ticket.max_len=224 and TCP policy (queen tickets optional, worker tickets required).
- check=mount validates allowlist under coh.mount.root and requires FUSE when not --mock.
- check=nvml prefers NVML when not --mock; Jetson-class NVML falls back to CUDA discovery.
- check=runtime checks python3 and qemu-system-aarch64 (QEMU skipped with --mock).

- secure9p.msize: 8192
- secure9p.walk_depth: 8
- coh.mount.allowlist: /proc, /queen, /worker, /log, /gpu, /host

Generated by coh-rtc (sha256: 66febf7b6dae0625c6a004490655dfcea1dd5777fe6792ecf027164df8f2ab4f).
- Python client in tools/cohesix-py/ with filesystem (coh mount) and TCP console backends.
- Examples live in tools/cohesix-py/examples/ and emit artifacts under out/examples/.
- [...] CohesixOrchestrator (typed schedule/lease/export/approval APIs), host integration adapters, and cohesix-playbook for high-impact fleet playbooks.
- [...] HostTicketRequest) and RBAC->ticket K8s coexistence translation (K8sRbacIntent, enqueue_k8s_rbac_tickets).

Manifest-derived cohesix defaults are emitted by coh-rtc.
- manifest.sha256: 2e64f09fb17eafce52fe3e7a29fa7eb11f2299022ca7d13eabf9b31c809b4234
- cohesix.defaults.sha256: ced521e47dd16ec53301d6d0c16c0681525be03a653ef2357434a7976e338cb6
- secure9p.msize: 8192
- secure9p.walk_depth: 8
- console.max_line_len: 256
- console.max_path_len: 96
- console.max_json_len: 192
- console.max_echo_len: 224
- telemetry_ingest.max_bytes_per_segment: 32768
- telemetry_ingest.max_total_bytes_per_device: 131072
- coh.mount.root: /
- coh.mount.allowlist: /proc, /queen, /worker, /log, /gpu, /host
- coh.telemetry.root: /queen/telemetry
- coh.run.breadcrumb.max_line_bytes: 512
- coh.peft.import.registry_root: out/model_registry

Generated by coh-rtc (sha256: ef522e0341d65fb59287879364b7c0f066eaddb4121dc18463c8d07abd7ba07d).
- [...] apps/swarmui.
- [...] cohsh transport); set SWARMUI_TRANSPORT=9p to use Secure9P (9P/TCP) or SWARMUI_TRANSPORT=rest for hive-gateway.
- SWARMUI_9P_HOST / SWARMUI_9P_PORT (defaults 127.0.0.1:31337) for console/9p, or SWARMUI_REST_URL (fallback COH_REST_URL) for REST. REST request-auth uses SWARMUI_REST_AUTH_TOKEN (fallback HIVE_GATEWAY_REQUEST_AUTH_TOKEN, COHSH_REST_AUTH_TOKEN, COH_REST_AUTH_TOKEN).
- [...] --no-default-features to strip it and rebuild with --features rest when needed.
- SWARMUI_AUTH_TOKEN (fallback COHSH_AUTH_TOKEN, default changeme).
- [...] script-src 'unsafe-eval' to support the PixiJS Live Hive renderer.
- [...] cohsh for CLI-only features.
- [...] $DATA_DIR/snapshots/ and never touches the network.
- --replay-trace <file> (relative paths resolved under $DATA_DIR/traces/).
- --mint-ticket emits a host-side ticket and exits; accepts --role, --ticket-subject, --ticket-config, --ticket-secret (env SWARMUI_TICKET_CONFIG / SWARMUI_TICKET_SECRET, fallback to COHSH_*).

Manifest-derived SwarmUI defaults are emitted by coh-rtc.
- manifest.sha256: 1aee4b854f804a6f250d113062597502ba30a1198315c33fe0156d5d2d1b7cb3
- swarmui.defaults.sha256: 82a9d590f2cf358ea50907f491df48f8fcd005dee0d3a4498cec7c8a5597e2c1
- swarmui.ticket_scope: per-ticket
- swarmui.cache.enabled: false
- swarmui.cache.max_bytes: 262144
- swarmui.cache.ttl_s: 3600
- swarmui.hive.frame_cap_fps: 30
- swarmui.hive.step_ms: 16
- swarmui.hive.lod_zoom_out: 0.7
- swarmui.hive.lod_zoom_in: 1.25
- swarmui.hive.lod_event_budget: 512
- swarmui.hive.snapshot_max_events: 4096
- swarmui.hive.overlay_lines: 3
- swarmui.hive.detail_lines: 50
- swarmui.hive.line_cap_bytes: 160
- swarmui.hive.per_worker_bytes: 2048
- swarmui.hive.pending_lines_per_worker: 64
- swarmui.hive.pending_event_cap: 4096
- swarmui.hive.poll_workers_per_tick: 32
- swarmui.hive.status_poll_ms: 500
- swarmui.hive.degrade_pressure: 1.0
- swarmui.paths.telemetry_root: /worker
- swarmui.paths.proc_ingest_root: /proc/ingest
- swarmui.paths.worker_root: /worker
- swarmui.paths.namespace_roots: /proc, /queen, /worker, /log, /gpu
- trace.max_bytes: 1048576

Generated from configs/root_task.toml (sha256: 1aee4b854f804a6f250d113062597502ba30a1198315c33fe0156d5d2d1b7cb3).
Startup banner and prompt:
Welcome to Cohesix. Type 'help' for commands.
detached shell: run 'attach <role>' to connect
coh>
Commands and status:
- help – show the command list.【F:apps/cohsh/src/lib.rs†L1125-L1162】
- attach <role> [ticket] / login – attach to a NineDoor session. Valid roles: queen, worker-heartbeat, worker-gpu, worker-bus, worker-lora (CLI accepts worker as an alias for worker-heartbeat); missing roles, unknown roles, too many args, or re-attaching emit errors via the parser and shell.【F:apps/cohsh/src/lib.rs†L711-L729】【F:apps/cohsh/src/lib.rs†L1299-L1317】
- detach – close the current session without exiting the shell (required for multi-role scripts).【F:apps/cohsh/src/lib.rs†L1244-L1255】
- tail <path> – stream a file; log tails /log/queen.log. Requires attachment.【F:apps/cohsh/src/lib.rs†L1170-L1179】
- ping – reports attachment status; errors when detached or when given arguments.【F:apps/cohsh/src/lib.rs†L1181-L1194】
- test [--mode <quick|full|smp>] [--json] [--timeout <s>] [--no-mutate] – run the in-session self-tests sourced from /proc/tests/ (default mode quick, default timeout 30s, hard cap 120s). --no-mutate skips spawn/kill steps. When --json is supplied, emit the stable schema described below.【F:apps/cohsh/src/lib.rs†L1512-L1763】
- [...] quit; interactive cohsh reattaches to the last session when possible, while --script runs remain detached and require a fresh attach.
- pool bench <k=v...> – run the pooled throughput benchmark and retry/exhaustion checks; options include path, ops, batch, payload, payload_bytes, delay_ms, inject_failures, inject_bytes, exhaust, kind.
- [...] payload_bytes.
- echo <text> > <path> – append a newline-terminated payload to an absolute path via NineDoor.【F:apps/cohsh/src/lib.rs†L1211-L1222】【F:apps/cohsh/src/lib.rs†L1319-L1332】
- [...] echo (strict JSON; unknown fields rejected). Examples:
  - echo {"id":"sched-1","role":"worker-gpu","priority":2,"ticks":3,"budget_ms":120} > /queen/schedule/ctl
  - echo {"op":"grant","id":"lease-1","subject":"queen","resource":"gpu0","ttl_s":300,"priority":5} > /queen/lease/ctl
  - echo {"op":"open","id":"export-1","ttl_s":900} > /queen/export/ctl
  - echo {"op":"apply","id":"rev-2026-02-03","sha256":"<64-hex>"} > /policy/ctl
- ls <path> – list directory entries; entries are newline-delimited and returned in lexicographic order.
- cat <path> – bounded read of file contents.
- [...] /proc/root/* (reachability/cut), /proc/9p/session/active (session summary), /proc/pressure/* (refusal counters), /proc/ingest/* (ingest stats), /proc/schedule/* and /proc/lease/* (queue/lease snapshots).
- spawn <role> [opts] – queue a worker spawn via /queen/ctl (e.g. spawn heartbeat ticks=100, spawn gpu gpu_id=GPU-0 mem_mb=4096 streams=2 ttl_s=120).
- kill <worker_id> – queue a worker termination via /queen/ctl.
- bind <src> <dst> – bind a canonical namespace path to a session-scoped mount point via /queen/ctl.
- mount <service> <path> – mount a named service namespace via /queen/ctl.
- telemetry push <src_file> --device <id> – request an OS-named segment under /queen/telemetry/<device_id>/seg/ and append bounded telemetry records using cohsh-telemetry-push/v1 envelopes (UTF-8, allowlisted extensions only; chunked to max_record_bytes=4096 and telemetry_ingest.max_bytes_per_segment).
- quit – prints closing session and exits the shell loop.【F:apps/cohsh/src/lib.rs†L1250-L1252】

Attachment semantics:
- Missing role: attach requires a role.
- Unknown role: unknown role '<x>'.
- Too many arguments: attach takes at most two arguments: role and optional ticket.
- Re-attaching while attached: already attached; run 'quit' to close the current session.【F:apps/cohsh/src/lib.rs†L711-L717】

Connection handling (TCP transport):
- Successful connects log [cohsh][tcp] connected to <host>:<port> (connects=N) before presenting the prompt.【F:apps/cohsh/src/transport/tcp.rs†L54-L60】
- Disconnects log [cohsh][tcp] connection lost: … and trigger reconnect attempts with incremental back-off, emitting [cohsh][tcp] reconnect attempt #<n> …. The shell remains usable in interactive mode; in --script mode errors propagate and stop the run.【F:apps/cohsh/src/transport/tcp.rs†L63-L73】

- The root task emits OK <VERB> [detail] or ERR <VERB> reason=<busy|quota|cut|policy> [detail=<...>] for every console command, sharing one dispatcher across serial and TCP so both transports see the same lines before any payload (for example, OK TAIL path=… precedes streamed data).【F:apps/root-task/src/event/mod.rs†L1000-L1018】
- PING always yields PONG without affecting state, keeping automation healthy when idle, while TCP adds a 15-second heartbeat cadence on top of the shared grammar so the client can detect stalls without blocking serial progress.【F:apps/root-task/src/event/mod.rs†L1170-L1183】【F:apps/cohsh/src/transport/tcp.rs†L21-L24】
- cohsh sessions send periodic silent PING keepalives while idle to avoid TCP console inactivity timeouts; acknowledgements are drained and not echoed at the prompt.【F:apps/cohsh/src/lib.rs†L1046-L1955】
- cohsh parses acknowledgement lines using a shared helper, surfaces details inline with shell output, and preserves the order produced by the root-task dispatcher so scripted attach/tail/log flows match serial transcripts byte-for-byte.【F:apps/cohsh/src/proto.rs†L5-L44】【F:apps/cohsh/src/lib.rs†L1031-L1044】

- --script <file> feeds newline-delimited commands; blank lines and lines starting with # are ignored. Errors abort the script and bubble up as a non-zero exit.【F:apps/cohsh/src/lib.rs†L732-L763】
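The acknowledgement grammar described above (OK <VERB> [detail] or ERR <VERB> reason=... [detail=...]) is simple enough to parse in host-side automation. The following is a minimal sketch and not the shared helper in apps/cohsh/src/proto.rs; `Ack` and `parse_ack` are hypothetical names, and the optional `[console] ` prefix handling is an assumption based on the session transcript in this document.

```python
import re
from dataclasses import dataclass, field

ACK_RE = re.compile(r"^(OK|ERR)\s+(\S+)(?:\s+(.*))?$")
CONSOLE_PREFIX = "[console] "


@dataclass
class Ack:
    ok: bool
    verb: str
    fields: dict = field(default_factory=dict)


def parse_ack(line):
    """Parse one acknowledgement line; return None for any other kind of line."""
    line = line.strip()
    if line.startswith(CONSOLE_PREFIX):
        line = line[len(CONSOLE_PREFIX):]
    match = ACK_RE.match(line)
    if match is None:
        return None
    status, verb, rest = match.groups()
    fields = {}
    # Details are treated as whitespace-separated key=value tokens;
    # values containing spaces are not handled by this sketch.
    for token in (rest or "").split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return Ack(ok=(status == "OK"), verb=verb, fields=fields)


print(parse_ack("[console] OK ATTACH role=Queen session=1"))
print(parse_ack("ERR TAIL reason=busy"))
print(parse_ack("pong"))  # None: not an acknowledgement line
```

Because the acknowledgement always precedes any streamed payload, a caller can parse the first line of a response and then treat the remaining lines as data.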
- .coh is a deterministic, line-oriented scripting format for running cohsh command sequences non-interactively (including coh> test regression suites) using the exact same command handlers as the interactive coh> prompt.
- Lifecycle commands (lifecycle cordon|drain|resume|quiesce|reset) are valid script lines and apply the same local validation as interactive use.
- [...] cohsh commands plus assertions.
- [...] cohsh session (already connected); the session is expected to be AUTH’d and ATTACH’d. Scripts (and coh> test) may validate session state and fail fast if invalid.
- [...] coh> prompt (identical parsing and handlers, no special RPC path).
- [...] EXPECT, stop immediately and return FAIL.
- # starts a comment to end of line.

Two statement families:
- [...] EXPECT is interpreted as a cohsh command exactly as typed at coh>.
- [...] cohsh for that command).
- EXPECT OK: last command response line must begin with OK.
- EXPECT ERR: last command response line must begin with ERR.
- EXPECT SUBSTR <text>: last command response line must contain <text> as a substring (case-sensitive).
- EXPECT NOT <text>: last command response line must not contain <text>.

An optional control statement is provided for bounded waits: WAIT <ms> pauses locally (does not issue a server command) for the requested duration.
For streaming commands, the “response line” is the initial acknowledgement line (OK … or ERR … that starts the stream), not any subsequent streamed payload lines.
- [...] test --timeout; scripts must not block indefinitely.
- WAIT <ms> (line statement), capped at 2000 ms; longer waits are rejected.

coh> test reads .coh scripts from /proc/tests/:
- /proc/tests/selftest_quick.coh
- /proc/tests/selftest_full.coh
- /proc/tests/selftest_negative.coh
- /proc/tests/selftest_smp.coh

coh> test JSON schema
When invoked with --json, coh> test emits:
{
"ok": true,
"mode": "quick",
"elapsed_ms": 123,
"checks": [
{"name": "preflight/ping", "ok": true, "detail": "OK ping"},
{"name": "line 4: cat /proc/boot", "ok": true, "detail": "OK"}
],
"version": "1"
}
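A CI job can consume this report directly. The sketch below loads the JSON, checks the schema version, and lists failing checks; `failed_checks` is a hypothetical helper, and treating any version other than "1" as an error is an assumption of this example.

```python
import json

SAMPLE_REPORT = """
{
  "ok": true,
  "mode": "quick",
  "elapsed_ms": 123,
  "checks": [
    {"name": "preflight/ping", "ok": true, "detail": "OK ping"},
    {"name": "line 4: cat /proc/boot", "ok": true, "detail": "OK"}
  ],
  "version": "1"
}
"""


def failed_checks(report_text):
    """Return the names of checks whose 'ok' flag is false."""
    report = json.loads(report_text)
    if report.get("version") != "1":
        # Assumption: refuse schema versions this sketch does not know.
        raise ValueError(f"unsupported report version: {report.get('version')!r}")
    return [check["name"] for check in report["checks"] if not check["ok"]]


print(failed_checks(SAMPLE_REPORT))  # []
```

An empty list together with a top-level "ok": true indicates a clean run; a non-empty list gives the check names to surface in CI output.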
Quick check (ping, proc read, and an expected error):
# connectivity and auth sanity
ping
EXPECT OK
cat /proc/queen/state
EXPECT OK
echo forbidden > /queen/ctl
EXPECT ERR
Disposable worker lifecycle with ID assertion:
spawn gpu gpu_id=GPU-0 mem_mb=4096 streams=1 ttl_s=60
EXPECT OK
ls /shard
EXPECT OK
EXPECT SUBSTR path=/shard
tail /shard/<label>/worker/worker-123/telemetry
EXPECT OK
WAIT 500
kill worker-123
EXPECT OK
EXPECT NOT ERR
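The statement rules above (command lines, the four EXPECT forms, and WAIT capped at 2000 ms) can be checked offline before a script is run. The following is a minimal sketch of such a linter, not the implementation behind cohsh --check; `lint_coh` is a hypothetical helper, it only recognises full-line # comments, and rejecting an EXPECT that appears before any command is an assumption of this example.

```python
WAIT_CAP_MS = 2000
# EXPECT form -> whether it takes a <text> argument.
EXPECT_NEEDS_TEXT = {"OK": False, "ERR": False, "SUBSTR": True, "NOT": True}


def lint_coh(text):
    """Return a list of (line_number, message) problems found in a .coh script."""
    problems = []
    seen_command = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines carry no statement
        head, _, rest = line.partition(" ")
        rest = rest.strip()
        if head == "EXPECT":
            kind, _, arg = rest.partition(" ")
            if not seen_command:
                problems.append((lineno, "EXPECT before any command"))
            if kind not in EXPECT_NEEDS_TEXT:
                problems.append((lineno, f"unknown EXPECT form: {kind!r}"))
            elif EXPECT_NEEDS_TEXT[kind] != bool(arg.strip()):
                problems.append((lineno, f"EXPECT {kind} has the wrong number of arguments"))
        elif head == "WAIT":
            if not rest.isdigit():
                problems.append((lineno, "WAIT needs a non-negative integer of milliseconds"))
            elif int(rest) > WAIT_CAP_MS:
                problems.append((lineno, f"WAIT exceeds the {WAIT_CAP_MS} ms cap"))
        else:
            seen_command = True  # anything else is passed to cohsh as a command
    return problems


QUICK = """\
# connectivity and auth sanity
ping
EXPECT OK
cat /proc/queen/state
EXPECT OK
echo forbidden > /queen/ctl
EXPECT ERR
"""

print(lint_coh(QUICK))                 # []
print(lint_coh("ping\nWAIT 5000\n"))   # one problem, reported on line 2
```

The linter does not validate the commands themselves; those are still parsed by the same handlers as the interactive prompt when the script runs.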
cohsh over TCP
This section covers the development harness for running Cohesix on QEMU; production deployments target physical ARM64 hardware booted via UEFI with equivalent console and cohsh semantics.
Run the build wrapper to compile components, stage host tools, and launch QEMU with PL011 serial plus a user-mode TCP forward to 127.0.0.1:<port>:
SEL4_BUILD_DIR="$PWD/seL4/SMP_build" ./scripts/cohesix-build-run.sh \
--sel4-build "$PWD/seL4/SMP_build" \
--out-dir out/cohesix \
--profile release \
--root-task-features cohesix-dev \
--cargo-target aarch64-unknown-none \
--transport tcp
Use --sel4-build "$PWD/seL4/build" to target the single-core baseline (keeps SMP artifacts separate).
The script builds root-task with the serial and TCP console features, compiles NineDoor and workers, copies host tools (cohsh, gpu-bridge-host, host-sidecar-bridge) into out/cohesix/host-tools/, and assembles the CPIO payload.【F:scripts/cohesix-build-run.sh†L369-L454】【F:scripts/cohesix-build-run.sh†L402-L442】
QEMU runs with -serial mon:stdio and a user-net device that forwards TCP/UDP ports 31337–31339 into the guest so the TCP console and self-tests are reachable from the host.【F:scripts/cohesix-build-run.sh†L518-L553】 The wrapper selects the NIC backend from the root-task features: dev-virt (via cohesix-dev) uses virtio-net by default, which adds -global virtio-mmio.force-legacy=false for the modern header; removing net-backend-virtio switches the wrapper to RTL8139 instead.【F:scripts/cohesix-build-run.sh†L518-L553】 The script prints the ready command for cohsh once QEMU is live.【F:scripts/cohesix-build-run.sh†L546-L553】 In deployment, the same console and cohsh flows apply to UEFI-booted ARM64 hardware without the VM wrapper.
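Before launching cohsh it can help to confirm that the forwarded TCP ports are actually listening on the host. This is a minimal sketch using only the Python standard library; `tcp_port_open` is a hypothetical helper, and it does not probe the UDP echo port (31338), which needs a different kind of check.

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # TCP console (31337) and TCP smoke test (31339) from the forwards above.
    for port in (31337, 31339):
        state = "open" if tcp_port_open("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} {state}")
```

Note that connecting to 31337 touches the single-client console briefly, so run a probe like this before attaching rather than alongside an active session.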
[...] net-backend-virtio is enabled)
- [...] virtio-mmio.force-legacy=false so QEMU exposes the modern header and the driver accepts it by default.【F:scripts/cohesix-build-run.sh†L518-L544】【F:apps/root-task/src/drivers/virtio/net.rs†L118-L157】 Use the host forwards above to reach the TCP console (31337), UDP echo self-test (31338), and TCP smoke test (31339).
- Set VIRTIO_MMIO_FORCE_LEGACY=1 before invoking the script and rebuild with --features virtio-mmio-legacy. The wrapper will switch QEMU to -global virtio-mmio.force-legacy=true; the driver will reject v1 unless the feature gate is enabled.【F:scripts/cohesix-build-run.sh†L518-L544】【F:apps/root-task/src/drivers/virtio/net.rs†L1379-L1411】 When debugging legacy, prefer bumping QEMU back to modern instead of carrying the feature in normal builds.

- [...] --transport tcp flow above (virtio-net backend).
- [...] ./cohsh --transport tcp --tcp-port 31337.
- [...] lo0): sudo tcpdump -i lo0 -n tcp port 31337 or udp port 31338 or tcp port 31339.

cohsh session over TCP
From out/cohesix/host-tools/:
./cohsh --transport tcp --tcp-port 31337
Welcome to Cohesix. Type 'help' for commands.
detached shell: run 'attach <role>' to connect
coh> attach queen
[console] OK ATTACH role=Queen session=1
attached session SessionId(1) as Queen
coh>
Use log to stream /log/queen.log, ping for health, and tail <path> for ad-hoc inspection. If the TCP session resets, cohsh reports the error and continues in a detached state; reconnects are attempted automatically with back-off in interactive mode.【F:apps/cohsh/src/transport/tcp.rs†L54-L73】
[...] --script
Example script (queen.coh):
# Attach and tail the queen log
attach queen
log
quit
Run via ./cohsh --transport tcp --tcp-port 31337 --script queen.coh. The runner stops on the first error (including connection failures) and propagates the error code to the host shell.【F:apps/cohsh/src/lib.rs†L732-L763】
Use ./cohsh --check <script.coh> to validate .coh syntax without executing commands.【F:apps/cohsh/src/main.rs†L28-L138】
- [...] cohsh (no new verbs, no new in-VM endpoints) and focuses on presentation and workflow rather than new privileges.

- [...] --transport tcp and the hostfwd rule; the build script prints the expected port.【F:scripts/cohesix-build-run.sh†L521-L553】
- [...] cohsh logs the reset and reconnect attempts. Re-run attach <role> once the console listener is reachable.【F:apps/cohsh/src/transport/tcp.rs†L63-L73】
- [...] --auth-token (or COHSH_AUTH_TOKEN) matches the listener requirement; the TCP transport defaults to changeme.【F:apps/cohsh/src/main.rs†L78-L115】
- [...] ping on the serial console (cohesix>) to isolate network issues.【F:apps/root-task/src/console/mod.rs†L214-L320】

Not implemented yet, but likely additions for debugging:
- net – report virtio-net status and console listener port.
- tcp – list active TCP console sessions and counters.
- 9p – basic NineDoor state (session counts, outstanding requests).
- trace – toggle trace categories for boot/net/9p.

Any future commands must remain deterministic and no_std-friendly, and will be documented here when they land.

- [...] docs/ARCHITECTURE.md.
- [...] docs/INTERFACES.md (once stabilised).
- [...] cohsh CLI.