PLAYFAIR

A 50-epoch, 3-region, 6-agent tripartite-game test that exercises the entire ECCA stack — chains, services, contracts, and cross-region latency — on a single laptop or in CI.


What is Playfair?
Why does it exist?
When does it run?
What it produces
What it represents
What the result means
How conditions vary
Real test runs (verified)
Anatomy of a report
Run it locally
Run it in CI
Tuning & debugging

What is Playfair?

Playfair is ECCA's system-level integration test. It spins up a real k3d Kubernetes cluster with one server node and three agent nodes — one per simulated region. Each region is given a different cost profile, and a different per-agent budget for compute / storage / bandwidth tokens:

Region	Cheap	Expensive	Budget (C / S / B)
region-storage	storage	compute	`200 / 1000 / 400`
region-compute	compute	storage	`1000 / 200 / 400`
region-bandwidth	bandwidth	compute & storage	`200 / 200 / 1000`

Six agents (two per region) each have a personality — perceive rate, store rate, route rate, sleeve kind, and a Coherence Profile Vector — chosen to either match or fight their region's cost profile. A scripted storyline of spot preemptions, drift spikes, and residue injections forces cross-region migrations ("needlecasts") at fixed epochs.

After every epoch the orchestrator audits the TripartiteGame's allocation accounting and asserts that every agent stayed within its per-region budget. The test passes only if all 50 epochs verify fair.
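The per-epoch check described above can be sketched as a pure function. This is a minimal illustration, not the TripartiteGame contract's logic: the budget figures come from the region table, while `within_budget`, `epoch_is_fair`, and the shape of the burn dictionaries are hypothetical names chosen for the example. The sketch checks each agent on its own; how the real audit aggregates burn per region is not specified here.

```python
# Budgets (compute, storage, bandwidth) copied from the region table above.
BUDGETS = {
    "region-storage":   {"compute": 200,  "storage": 1000, "bandwidth": 400},
    "region-compute":   {"compute": 1000, "storage": 200,  "bandwidth": 400},
    "region-bandwidth": {"compute": 200,  "storage": 200,  "bandwidth": 1000},
}

def within_budget(region: str, burn: dict) -> bool:
    """Return True if cumulative burn is at or under every budget line."""
    budget = BUDGETS[region]
    return all(burn.get(token, 0) <= limit for token, limit in budget.items())

def epoch_is_fair(agents: list) -> bool:
    """An epoch counts as fair only if every agent is within budget."""
    return all(within_budget(a["region"], a["burn"]) for a in agents)

# Hypothetical agent records: one comfortably inside budget, one over on compute.
ok_agent = {"region": "region-compute",
            "burn": {"compute": 328, "storage": 18, "bandwidth": 12.2}}
greedy_agent = {"region": "region-storage",
                "burn": {"compute": 250, "storage": 40, "bandwidth": 5}}

print(epoch_is_fair([ok_agent]))
print(epoch_is_fair([ok_agent, greedy_agent]))
```

Keeping the check free of I/O makes it easy to reuse when deliberately lowering a budget to confirm that the verdict can flip.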

Why does it exist?

Unit tests catch regressions in individual contracts and packages. The compose-based E2E suite catches regressions in the happy-path API surface. Playfair is different. It forces the system to operate under realistic adversarial pressure:

Cross-region latency is real. We use tc netem to inject 33–75 ms of one-way latency between agent nodes, a range chosen to resemble the latency between real cloud regions.
Workload imbalance is real. Agents in the wrong region pay the penalty cost. They learn to needlecast.
Failure events happen mid-test. Spot preemptions, drift spikes, and residue injections all fire on schedule.
The full token economy is exercised. Every action burns the right token and is audited against the on-chain allocation.

If a refactor accidentally breaks how the bus replicates across namespaces, how the EpochAnchor contract verifies continuity, or how the Prisma client resolves its native engine inside Alpine — Playfair fails before it merges.

When does it run?

Locally, on demand: bash tests/playfair/run.sh. Takes ~15 minutes from cold (image builds + apply) or ~5 minutes warm.
In CI, on every push to main and on a nightly schedule (03:00 UTC) via the Playfair workflow. The CI run uploads the rendered HTML report as a build artifact and (on main) commits it back to docs/playfair-report.html so it's published to GitHub Pages.
Manually, from the Actions tab via workflow_dispatch, with override inputs for epochs and the latency profile.

What it produces

Each run produces a deterministic set of artifacts under tests/playfair/results/:

playfair-results.json — the canonical machine-readable record of the run (agents, epochs, audits, needlecasts, residues, summary, env metadata).
playfair-report.html — a self-contained, dependency-free HTML report with inline SVG charts. This is the file published to /playfair-report.html.
orchestrator.log — the full orchestrator stdout (one line per epoch tick).
region-{storage,compute,bandwidth}-{siyana,thalamus}.log — per-service tail logs, one file per region per service.
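As a rough illustration of consuming the JSON record, the sketch below builds a one-line headline from a parsed results document. The `summary.allEpochsFair` and `summary.unfairEpochs` fields and the per-audit `epoch` / `fair` fields appear elsewhere in this document; the top-level `audits` key and the helper names are assumptions made for the example.

```python
import json
from pathlib import Path

def load_results(path: str = "tests/playfair/results/playfair-results.json") -> dict:
    """Read the machine-readable record from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def summarize(results: dict) -> str:
    """Build a one-line headline from a parsed results document."""
    summary = results["summary"]
    audits = results.get("audits", [])  # key name assumed, see lead-in
    fair = sum(1 for a in audits if a["fair"])
    verdict = "FAIR" if summary["allEpochsFair"] else "UNFAIR"
    return f"{verdict}: {fair}/{len(audits)} epochs fair, unfair={summary['unfairEpochs']}"

# Inline sample so the sketch runs without a results file on disk.
sample = {
    "audits": [{"epoch": 1, "fair": True}, {"epoch": 2, "fair": True}],
    "summary": {"allEpochsFair": True, "unfairEpochs": []},
}
print(summarize(sample))  # FAIR: 2/2 epochs fair, unfair=[]
```

After a real run, `summarize(load_results())` would produce the same headline from the file on disk, provided the key names match.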

What it represents

The output is an honest record of the system running for ~52 seconds of in-cluster orchestrator time (after a ~3-minute warm-up of cluster + chains + contracts), exercising:

3 chain stacks (medulla-pow, hippocampus-dag, cortex-evm) replicated across 3 regions.
7 contracts deployed once into the compute region's cortex-evm and shared across regions.
2 services per region (siyana-api + thalamus-router), each writing to its own per-region NATS JetStream consumer.
152 perceptions, 49 stores, 33 routes, 24 syncs, and 33 cross-region needlecasts in the verified Run B (Run A was 152 / 43 / 31 / — / 31 — see Real test runs).
A continuous fairness audit on the TripartiteGame allocation contract — 50 audit calls, 50 fair, 0 unfair.

What the result means

Verdict	What it means	Action
FAIR	Every epoch's per-region allocation respected its budget. The protocol's accounting matches reality.	Ship it.
UNFAIR	One or more epochs over-spent. Either an agent leaked tokens, a contract under-counted, or a service double-billed.	The report's "unfair epochs" list pinpoints the failures. Read the per-region service logs in `results/`.
0 needlecasts	The scripted scenario events failed to fire (likely the orchestrator never connected to siyana-api).	Check that all `siyana-api` pods are `Ready` in `kubectl get pods -A`.
Crashed pods	If `thalamus-router` or `siyana-api` is in `CrashLoopBackOff`, the run will time out instead of producing UNFAIR.	Check Prisma binary mismatches (Alpine openssl version) and per-region NATS consumer naming.

The verdict only tests one property. "All epochs verified fair" means the allocation accounting holds — it does not prove that latency was actually applied, that all 6 agents stayed alive, or that the chain produced blocks. Read the timeline chart, the agent sparklines, and the needlecast log to verify the run actually exercised the system.
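Because a FAIR verdict says nothing about liveness, a few extra checks can be scripted alongside it. The following is a sketch under assumptions: the field names (`name`, `perceive`, `store`, `route`, `sync`) and the function `sanity_warnings` are hypothetical and would need adapting to the actual layout of the results file.

```python
def sanity_warnings(agents: list, needlecasts: list, expected_agents: int = 6) -> list:
    """Collect reasons a FAIR verdict might not reflect a real workload."""
    warnings = []
    if len(agents) != expected_agents:
        warnings.append(f"expected {expected_agents} agents, found {len(agents)}")
    for agent in agents:
        activity = sum(agent.get(k, 0) for k in ("perceive", "store", "route", "sync"))
        if activity == 0:
            warnings.append(f"{agent['name']} recorded no activity")
    if not needlecasts:
        warnings.append("0 needlecasts: scenario events may not have fired")
    return warnings

# Hypothetical two-agent run in which one agent never acted and nothing migrated.
agents = [
    {"name": "Router-Nexus", "perceive": 13, "store": 3, "route": 12, "sync": 1},
    {"name": "Router-Sentinel", "perceive": 0, "store": 0, "route": 0, "sync": 0},
]
for warning in sanity_warnings(agents, needlecasts=[], expected_agents=2):
    print(warning)
```

An empty list means only that these particular checks passed; the charts in the report remain the primary evidence.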

How conditions vary

The structural shape of the test (6 agents, 3 regions, 50 epochs, fixed scenario events) is deterministic. What varies between runs:

Per-epoch random activity. Each agent's perceive/store/route decisions are sampled against a probability per epoch. Two runs will produce different per-epoch counts but should converge to the same totals (±10%).
Latency profile. Override with terraform apply -var latency_storage_compute_ms=80 to simulate transcontinental routing. Higher latency shifts the timing of needlecasts but should not change the verdict.
Epoch count. --epochs 200 stresses the long-run fairness; the per-epoch test should be insensitive to count.
Agent budgets. Edit orchestrator.js to make a region's compute budget unrealistically low and the verdict should flip to UNFAIR — this is a useful sanity check that the audit isn't always returning true.
Hardware. Apple Silicon, x86 Linux, and GitHub's ubuntu-latest runners all execute the same code. The Alpine k3s image is the same; the Prisma engine binaries are different (musl/arm64 vs musl/x64) and both are baked into the ecca-ts-builder image.
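The ±10% convergence expectation can be checked mechanically. The text does not say what the tolerance is measured against, so this sketch assumes deviation from the mean of the two runs; `within_tolerance` is a hypothetical helper, and the totals are the Run A and Run B figures reported under Real test runs.

```python
def within_tolerance(a: float, b: float, tolerance: float = 0.10) -> bool:
    """True if both values lie within `tolerance` of their mean."""
    mean = (a + b) / 2
    if mean == 0:
        return a == b
    return abs(a - mean) / mean <= tolerance

RUN_A = {"perceptions": 152, "stores": 43, "routes": 31, "migrations": 31}
RUN_B = {"perceptions": 152, "stores": 49, "routes": 33, "migrations": 33}

# Metrics whose totals drifted further apart than the tolerance allows.
drifted = [k for k in RUN_A if not within_tolerance(RUN_A[k], RUN_B[k])]
print(drifted)  # []
```

The choice of baseline matters: measured against Run A alone, the stores gap (43 vs 49) is about 14%, so a stricter definition would flag it.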

Real test runs (verified)

Both runs below were executed locally on a MacBook Pro (Apple Silicon, Docker Desktop 4.x, k3d v5.8.3, Terraform 1.5.7, Node 20.x) against commit 618eac8. The complete artifacts are linked from each card. Both runs verified FAIR on all 50 epochs with zero unfair epochs.

RUN A · FROM-SCRATCH

2026-05-09 · 13:36:25 → 13:37:17 UTC · orchestrator 56 s · apply ≈ 8 min

Epochs	50 / 50 fair
Perceptions	152
Stores	43
Routes	31
Migrations	31
Residues	1 (resolved)
Scenarios	9 / 9 fired
Verdict	FAIR

First end-to-end run after terraform apply -auto-approve -var skip_images=true against a freshly created k3d cluster (1 server + 3 agents).

RUN B · REPRODUCED

2026-05-09 · 13:47:13 → 13:48:05 UTC · orchestrator 51.7 s

Epochs	50 / 50 fair
Perceptions	152
Stores	49
Routes	33
Syncs	24
Migrations	33
Residues	1 (resolved)
Verdict	FAIR

Re-run with rebuilt ecca-playfair-orchestrator:local to validate the new env metadata block. Same cluster, fresh orchestrator job. → View Run B report

Per-agent activity (Run B)

Six agents, two per region, each with a sleeve kind that interacts differently with its region's cost profile. Token columns are cumulative burn over 50 epochs:

Agent	Home region	Sleeve	Perceive	Store	Route	Sync	Compute tok	Storage tok	Bandwidth tok	Final drift
Archivist-Alpha	storage	memory	13	12	3	1	52	36	15.7	8.0
Archivist-Beta	storage	human	36	20	1	7	75	60	5.2	10.0
Inference-Prime	compute	ai	41	6	2	8	328	18	12.2	6.0
Inference-Echo	compute	ai	36	4	5	6	288	12	26.7	6.0
Router-Nexus	bandwidth	mining	13	3	12	1	52	9	61.2	8.0
Router-Sentinel	bandwidth	memory	13	4	10	1	52	12	51.0	8.0

What this means: each region's expected specialty shows up in the token columns. AI sleeves in compute burn the most compute tokens (328 and 288); the mining and memory sleeves in bandwidth burn the most bandwidth tokens (61.2 and 51.0). The Archivists in storage burn 36 + 60 = 96 storage tokens against the region budget of 1000. No agent exceeded its per-region budget on any epoch, which is what the FAIR verdict tracks.
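The per-agent rows can be cross-checked against the run totals with a few lines of Python. The figures are copied from the table above; the tuple layout and `column_total` are only a convenience for this sketch.

```python
# (agent, perceive, store, route, sync, compute_tok, storage_tok, bandwidth_tok)
ROWS = [
    ("Archivist-Alpha", 13, 12, 3, 1, 52, 36, 15.7),
    ("Archivist-Beta", 36, 20, 1, 7, 75, 60, 5.2),
    ("Inference-Prime", 41, 6, 2, 8, 328, 18, 12.2),
    ("Inference-Echo", 36, 4, 5, 6, 288, 12, 26.7),
    ("Router-Nexus", 13, 3, 12, 1, 52, 9, 61.2),
    ("Router-Sentinel", 13, 4, 10, 1, 52, 12, 51.0),
]

def column_total(index: int):
    """Sum one column of the per-agent table."""
    return sum(row[index] for row in ROWS)

totals = {
    "perceive": column_total(1),
    "store": column_total(2),
    "route": column_total(3),
    "sync": column_total(4),
}
print(totals)  # {'perceive': 152, 'store': 49, 'route': 33, 'sync': 24}

# Storage burn of the two agents homed in the storage region.
archivist_storage = sum(r[6] for r in ROWS if r[0].startswith("Archivist"))
print(archivist_storage)  # 96
```

The sums match the Run B headline figures (152 perceptions, 49 stores, 33 routes, 24 syncs).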

Scenario events (deterministic, fired at fixed epochs)

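Nine scripted events drive cross-region migrations and residue handling so the test exercises the migration paths even on a quiet random seed. The table has ten rows because the epoch-30 residue-resolved row records the measured outcome of the residue injection rather than a separately scripted trigger: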

Epoch	Type	Agent	What happened
5	spot-preemption	Inference-Prime	Spot instance pulled from `compute`; agent must needlecast to `storage`.
8	respawn	Inference-Prime	Re-sleeves in `storage` region (expensive compute, cheap storage).
15	needlecast	Inference-Prime	Spot instance back; needlecasts `storage → compute` at cost 7.1 RoutingToken.
20	drift-spike	Archivist-Beta	Human agent goes idle 5 epochs; drift accumulates to 16.
25	sync-recovery	Archivist-Beta	Returns and burns 4.5 SyncToken to reset drift.
30	residue-inject	(bandwidth)	Simulated `shard-loss`; first responder earns the bounty.
30	residue-resolved	Router-Sentinel	Detected and resolved within 1.14 s; payout = 15 ResidueToken.
35	needlecast	Inference-Echo	Migrates `compute → bandwidth` for cheaper routing during high-needlecast phase.
40	epoch-surge	(all)	All agents perceive at max rate for 5 epochs — stress test.
45	needlecast	Inference-Echo	Returns to `compute` as surge subsides.

Cross-region migrations (33 in Run B)

Each row is a needlecast: an agent's identity + sleeve state migrates from one region's stack to another, paying RoutingToken proportional to shard count and inter-region latency. The first 8 of 33 are shown — see the full Needlecast log in the report.

epoch  agent                from        to          shards  cost
─────  ───────────────────  ──────────  ──────────  ──────  ─────
   11  Router-Nexus         bandwidth   storage          1   5.1
   11  Router-Sentinel      bandwidth   storage          1   5.1
   12  Router-Sentinel      storage     compute          1   5.1
   13  Router-Nexus         storage     bandwidth        1   5.1
   14  Inference-Prime      storage     bandwidth        1   5.1
   15  Inference-Prime      bandwidth   compute          1   7.1   ← scenario-driven (epoch-15)
   16  Router-Nexus         bandwidth   storage          1   5.1
   18  Router-Nexus         storage     compute          1   5.1
   …    (25 more)

Why the cost varies: the storage↔bandwidth path has the highest injected latency (75 ± 12 ms one-way), so its cost coefficient is higher (7.1 vs 5.1 for the cheaper paths). The cost is verified on-chain by the NeedlecastRouter contract and burned from the agent's BandwidthToken balance.

Allocation audit (every epoch)

After every epoch tick the orchestrator calls the TripartiteGame contract's auditEpoch(epochNumber) view function, which returns whether per-region token sums for the closing epoch respected the per-region budgets. A single failure flips the run verdict to UNFAIR.

{ "epoch":  1, "fair": true, "ts": "2026-05-09T13:47:15.546Z" }
{ "epoch":  2, "fair": true, "ts": "2026-05-09T13:47:16.543Z" }
{ "epoch":  3, "fair": true, "ts": "2026-05-09T13:47:17.514Z" }
…
{ "epoch": 48, "fair": true, "ts": "2026-05-09T13:48:02.573Z" }
{ "epoch": 49, "fair": true, "ts": "2026-05-09T13:48:03.565Z" }
{ "epoch": 50, "fair": true, "ts": "2026-05-09T13:48:04.709Z" }

summary.allEpochsFair = true
summary.unfairEpochs  = []
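The summary fields can be derived from the audit records themselves, and the timestamps also reveal the epoch spacing. This sketch parses the first three records shown above; it assumes one JSON object per line, which matches the excerpt but may not match the layout inside playfair-results.json.

```python
import json
from datetime import datetime

AUDIT_LINES = """
{ "epoch":  1, "fair": true, "ts": "2026-05-09T13:47:15.546Z" }
{ "epoch":  2, "fair": true, "ts": "2026-05-09T13:47:16.543Z" }
{ "epoch":  3, "fair": true, "ts": "2026-05-09T13:47:17.514Z" }
"""

def parse_ts(ts: str) -> datetime:
    # Swap the trailing Z for an explicit offset so older Pythons accept it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

audits = [json.loads(line) for line in AUDIT_LINES.splitlines() if line.strip()]

# A single unfair record is enough to flip the run verdict.
unfair_epochs = [a["epoch"] for a in audits if not a["fair"]]
all_epochs_fair = not unfair_epochs

# Seconds between consecutive audit calls.
stamps = [parse_ts(a["ts"]) for a in audits]
gaps = [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]

print(all_epochs_fair, unfair_epochs, gaps)
```

For these three records the gaps come out just under one second each.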

Residue handling

At epoch 30 the scenario script injects a shard-loss residue in the bandwidth region. The residue carries a bountyEstimate; the first sleeve to detect-and-resolve it claims the payout.

{
  "epoch":          30,
  "kind":           "shard-loss",
  "region":         "bandwidth",
  "ts":             "2026-05-09T13:47:44.445Z",
  "resolved":       true,
  "bountyEstimate": 15,
  "resolvedAt":     "2026-05-09T13:47:45.587Z",
  "resolver":       "Router-Sentinel",
  "payout":         15
}

What this proves: the residue economy works end-to-end, from injection on a real chain, through detection by a sleeve in the affected region and on-chain proof submission, to ResidueToken minted to the resolver. Latency from injection to resolution was 1.14 s (resolvedAt minus ts), roughly one epoch at the ~1 s spacing of the audit timestamps for this run.
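The resolution latency follows directly from the two timestamps in the record. A small sketch, using only the fields shown above (`resolution_latency` is a hypothetical helper name):

```python
from datetime import datetime

residue = {
    "ts": "2026-05-09T13:47:44.445Z",
    "resolvedAt": "2026-05-09T13:47:45.587Z",
    "bountyEstimate": 15,
    "payout": 15,
}

def resolution_latency(record: dict) -> float:
    """Seconds between residue injection and resolution."""
    start = datetime.fromisoformat(record["ts"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(record["resolvedAt"].replace("Z", "+00:00"))
    return (end - start).total_seconds()

latency = resolution_latency(residue)
paid_in_full = residue["payout"] == residue["bountyEstimate"]
print(f"{latency:.3f} s, paid in full: {paid_in_full}")  # 1.142 s, paid in full: True
```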

Anatomy of a report

Open /playfair-report.html in another tab and follow along. Each section answers a different operational question:

Section	What you see	What it tells you
Header verdict banner	Big FAIR/UNFAIR pill, agent count, region count, epoch count.	The headline result. If UNFAIR, stop here and read the unfair-epochs list.
Runtime configuration panel	Cluster name, latency profile, commit hash, runner (local / github-actions), branch.	Lets a future reader (or CI viewer) tell exactly which version of the code produced this report and under what conditions. Without this panel a report is unfalsifiable.
Stat strip	5 large stats: epochs, perceptions, stores, routes, residues.	One-glance throughput. Compare across runs to detect throughput regressions.
Activity timeline	Stacked-area SVG chart: perceives / stores / routes / syncs over 50 epochs. Red dots mark unfair epochs.	Shape diagnostics. Flat curves = the test never warmed up. Cliff drops = a service crashed mid-run. Red dots = audit violations.
Region cards	Three side-by-side cards: cheap/expensive specialty, per-region budgets (C/S/B), agent count.	Confirms the test's adversarial setup matches the spec.
Per-agent sparklines	4 cumulative lines per agent: perceive (cyan), store (magenta), route (purple), sync (green).	Per-agent behaviour. An agent whose lines are flat after epoch N is silently dead even if no pod restarted.
Region token-usage bars	Three horizontal bars per region: compute / storage / bandwidth burn vs budget.	Visual sanity check on the FAIR verdict. Bars longer than the budget line would mean an over-spend the audit somehow missed.
Scenario timeline	Vertical list of 9 scripted events with epoch + description.	Confirms the scripted storyline actually fired. Missing events = orchestrator crashed before reaching that epoch.
Needlecast log	Table of every cross-region migration with cost.	The cross-chain workload. Should be ≥ 9 in any run that completed (the scripted minimum); typically 30+.
Residue table	Every detected residue, who resolved it, payout, latency.	Proves the residue market clears. Detection-to-resolution latency > 1 epoch suggests a stuck worker.
Footer verdict	Repeated FAIR/UNFAIR + summary sentence + how-to-read explainer.	For readers who scrolled to the bottom first.

Reading order tip. If the verdict is FAIR, skim: header → runtime config → activity timeline → region bars → done. If the verdict is UNFAIR, read: header → unfair-epoch list in the verdict pill → activity timeline (find the red dot) → region bars (which region is over budget) → per-agent sparklines (which agent caused it). The orchestrator log in tests/playfair/results/orchestrator.log contains a per-epoch print line you can grep with the offending epoch number.

Run it locally

Prerequisites: Docker Desktop, Node 20, pnpm 9, and the k3d, terraform, kubectl, and jq CLIs (on macOS: brew install k3d terraform kubectl jq).

git clone https://github.com/quellcrist-falconer/ECCA.git
cd ECCA
pnpm install --no-frozen-lockfile
bash tests/playfair/run.sh                        # full from-scratch run
bash tests/playfair/run.sh --skip-images          # reuse cached images
bash tests/playfair/run.sh --epochs 200           # longer game
bash tests/playfair/run.sh --skip-latency         # disable tc netem
bash tests/playfair/run.sh --destroy              # tear down cluster
open tests/playfair/results/playfair-report.html  # view the report

Under the hood run.sh is a thin wrapper around tests/playfair/terraform/. Every state transition is declared in Terraform — image builds, image imports, latency injection, manifest applies, and the orchestrator job. Idempotent re-runs only re-execute the resources whose source hashes changed.
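The idea of re-running only what changed can be illustrated with a content hash over a source tree. This is a conceptual sketch, not how the Terraform configuration computes its triggers; `tree_hash` and the temporary directory are hypothetical stand-ins for a service's source folder.

```python
import hashlib
import tempfile
from pathlib import Path

def tree_hash(root: str) -> str:
    """Hash every file's relative path and contents in sorted order."""
    digest = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "server.ts"
    source.write_text("export const version = 1;\n", encoding="utf-8")
    before = tree_hash(tmp)
    unchanged = tree_hash(tmp) == before   # nothing edited: skip the rebuild
    source.write_text("export const version = 2;\n", encoding="utf-8")
    changed = tree_hash(tmp) != before     # contents edited: rebuild

print(unchanged, changed)
```

Including the relative path in the digest means a rename also counts as a change, which is usually what an image build wants.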

Run it in CI

The .github/workflows/playfair.yml workflow:

Installs k3d, terraform, and kubectl on the runner.
Builds all images via docker build (no registry — k3d imports them straight from the runner's docker daemon).
Runs terraform apply -auto-approve against the local k3d cluster.
Uploads playfair-report.html + playfair-results.json + per-region logs as a build artifact.
On main, commits the rendered HTML to docs/playfair-report.html with [skip ci] so the Pages workflow picks it up on the next run.

Manual triggers via workflow_dispatch accept overrides:

gh workflow run playfair.yml \
   -f epochs=100 \
   -f latency_storage_compute_ms=80 \
   -f skip_latency=false

Tuning & debugging

Watch the run progress

kubectl get pods -A -w
kubectl -n ecca-shared logs -f job/playfair-orchestrator
kubectl -n region-storage logs -f deploy/siyana-api

Iterate on a single service

Edit services/siyana-api/src/server.ts, then:

terraform apply -auto-approve \
  -var skip_latency=true \
  -var force_image_rebuild=$(date +%s)

Source-tree hashing means only the siyana-api image is rebuilt and re-imported; chains and infra stay up.

Common failures

Symptom	Cause	Fix
`ErrImageNeverPull: ecca-ts-builder:local`	Image not imported into k3d	It's listed in `main.tf`'s `all_image_refs`; if missing, add it and re-apply.
`PrismaClientInitializationError: ... openssl-1.1.x`	Alpine openssl version detection fails	`PRISMA_QUERY_ENGINE_LIBRARY` is hard-pinned in `03-services.yaml` to the musl/arm64 3.0.x engine.
`Error: duplicate subscription` in thalamus-router	Multiple regions sharing one NATS consumer name	Consumer names are region-scoped (`thalamus-mem-${region}`) — verify `ECCA_REGION` env is set per pod.
`k3d image import` hangs	Concurrent imports deadlock the k3d tools node	Imports are serialised in a single `null_resource.k3d_image_import` shell loop.
Empty `playfair-results.json`	Orchestrator pod completed before `kubectl cp` ran	Fallback in `run-orchestrator.sh` extracts JSON from the log marker `═══ RESULTS JSON ═══`.
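The log-marker fallback in the last row can be pictured as follows. This is a sketch in Python rather than the shell logic in `run-orchestrator.sh`, and it assumes the JSON document starts immediately after the marker line; `extract_results` is a hypothetical name.

```python
import json

MARKER = "═══ RESULTS JSON ═══"

def extract_results(log_text: str) -> dict:
    """Parse the first JSON value that follows the results marker."""
    _, found, tail = log_text.partition(MARKER)
    if not found:
        raise ValueError("results marker not found in log")
    # raw_decode stops at the end of the first JSON value and ignores the rest.
    value, _ = json.JSONDecoder().raw_decode(tail.lstrip())
    return value

log = (
    "epoch 50 done\n"
    "═══ RESULTS JSON ═══\n"
    '{"summary": {"allEpochsFair": true, "unfairEpochs": []}}\n'
    "trailing log line\n"
)
print(extract_results(log)["summary"]["allEpochsFair"])
```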
