How three independent blockchains, epoch-gated memory, and portable sleeves let you test different AI agents fairly — without requiring synchronous execution, identical hardware, or a centralized coordinator — using spot compute, as and when it's available, across different regions.
Most agent testing frameworks make a quiet assumption: every agent experiences the world at the same speed. You spin up a simulation, all agents tick at the same rate, they all see the same state at the same moment, and the test is "fair" because everything is synchronous.
This works on a single machine. It falls apart the moment you want to do any of the following:
us-east-1 versus a Raspberry Pi on someone's desk in Berlin
The synchronous assumption also creates a centralization problem: someone has to run the coordinator. Whoever runs the coordinator controls what "the world" looks like. Whoever controls the world controls the test results.
ECCA takes a different approach. The "world" isn't a single simulation — it's three independent blockchains running at their own speeds, with consistency proven cryptographically at epoch boundaries. Agents don't need to see the same state at the same moment. They need to provably interact with the same state within the same epoch.
In the human brain, different regions process information at radically different speeds. Your visual cortex processes frames at ~60Hz. Your prefrontal cortex deliberates at ~4Hz. Your memory consolidation happens at ~0.1Hz during sleep. Yet you experience a coherent world despite these different "bitrates."
ECCA models this with four sleeve kinds, each operating at a different tick rate:
| Sleeve Kind | Tick Rate | Token Preference | Analog |
|---|---|---|---|
| human | 8s | Memory ≫ Compute | Slow, narrative cognition |
| ai | 2s | Compute ≫ Memory | Fast inference (LLM) |
| mining | event-driven | Sync ≫ Routing | PoW participation |
| memory | every epoch | Memory + Routing | DAG pin maintenance |
The critical insight: these agents don't need to tick at the same rate to be tested fairly. An AI sleeve ticking 4x faster than a human sleeve doesn't get an unfair advantage — it just consumes its ComputeToken budget faster. Both are bounded by the same per-epoch token allocation. Both reference the same epoch counter. Both produce events that get folded into the same coherence root.
This is what "variable bitrate" means in practice. The world doesn't run at a single frame rate. Each agent perceives it at its own speed, gated by its own token budget, with its own hardware constraints. The three chains provide the substrate — the ground truth — and the epoch system provides the synchronization boundary where everyone reconciles.
ECCA's world isn't a single database. It's three independent ledgers, each responsible for a different dimension of reality:
Each chain can run on completely different hardware, in different regions, at different speeds. Hippocampus doesn't wait for Medulla to mine a block before accepting writes. Cortex doesn't wait for Hippocampus to replicate before processing transactions. The chains are causally independent within an epoch.
Synchronization happens only at epoch boundaries, when the Thalamus router collects Merkle roots from each shard and submits a coherence tuple to Medulla for PoW finality. One proof-of-work commitment finalizes three independent substrates simultaneously.
The epoch is ECCA's fundamental unit of time. Default: 4 seconds. But here's the important part — it's a logical clock, not a wall clock.
// The tick loop in thalamus-router/src/server.ts
setInterval(async () => {
  // 1. Collect event hashes buffered since last tick
  const evmRootHex = evmHashes.length ? merkleRoot(evmHashes) : '00'.repeat(32);
  const ipfsRootHex = ipfsHashes.length ? merkleRoot(ipfsHashes) : '00'.repeat(32);
  const sleevesRoot = sleeveHashes.length ? merkleRoot(sleeveHashes) : '00'.repeat(32);
  // 2. Compute cross-chain coherence root
  const cross = coherenceRoot({ evm: evmRootHex, btc: '00'.repeat(32), ipfs: ipfsRootHex, sleeves: sleevesRoot });
  // 3. Submit to Medulla for PoW finality
  await medulla.submitCoherenceRoot({ crossRoot: cross, evmRoot: evmRootHex, ipfsRoot: ipfsRootHex, sleevesRoot });
  // 4. Bridge to Cortex via EpochAnchor.commitAnchor()
  // ...
}, EPOCH_INTERVAL_MS);
The epoch doesn't advance because 4 seconds passed. It advances because Medulla mined a PoW block containing the coherence tuple. If Medulla is slow (weak hardware, high difficulty), epochs take longer. If it's fast, they're shorter. The wall clock is advisory, not authoritative.
This means:
Every sleeve maintains a drift counter — it increments on perceive and decrements on sync. If an agent falls behind (its hardware is slow, its network is laggy, its spot instance got preempted), drift grows:
drift = 0 → in sync
drift ≤ DRIFT_MAX → warning, sleeve.drift event published
drift > 2×DRIFT_MAX → sleeve.desync → coordination residue created
But drift isn't failure — it's information. A desync creates a coordination residue: a bounty that any other agent can claim by providing a proof of the correct state. The system doesn't halt. It economically incentivizes repair.
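The thresholds above can be sketched as a small classifier. This is a minimal illustration, not the sleeve-runtime's code: the value of `DRIFT_MAX` is made up, `classifyDrift` and `DriftCounter` are hypothetical names, and the band between `DRIFT_MAX` and `2×DRIFT_MAX` (which the thresholds leave unspecified) is assumed to stay in the warning state.

```typescript
// Hypothetical threshold: the article names DRIFT_MAX but does not give its value.
const DRIFT_MAX = 8;

type DriftState = "in-sync" | "warning" | "desync";

// Classify a drift counter against the thresholds listed above.
// Assumption: anything above zero and up to 2×DRIFT_MAX is a warning.
function classifyDrift(drift: number, driftMax: number = DRIFT_MAX): DriftState {
  if (drift <= 0) return "in-sync";
  if (drift > 2 * driftMax) return "desync";
  return "warning";
}

// Increments on perceive, decrements on sync, never drops below zero.
class DriftCounter {
  private drift = 0;

  perceive(): DriftState {
    this.drift += 1;
    return classifyDrift(this.drift);
  }

  sync(): DriftState {
    this.drift = Math.max(0, this.drift - 1);
    return classifyDrift(this.drift);
  }

  get value(): number {
    return this.drift;
  }
}
```

A preempted sleeve that keeps accumulating perceptions without syncing would walk from `warning` to `desync` under this model, which is the point at which a coordination residue would be created.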
This is the mechanism that makes asynchronous testing possible. At the end of each epoch, the Thalamus router computes:
crossRoot = sha256( "ecca-coh-v1" ‖ evmRoot ‖ btcRoot ‖ ipfsRoot ‖ sleevesRoot )
where:
evmRoot = merkleRoot([ txHash for each ECCA contract tx this epoch ])
btcRoot = reserved (32 zero bytes in v3)
ipfsRoot = merkleRoot([ sha256(cid) for each hippocampus write this epoch ])
sleevesRoot = merkleRoot([ sha256(type ‖ id) for each sleeve event this epoch ])
This single 32-byte hash commits to everything that happened across all three chains in that epoch. It gets mined into a Medulla PoW block, appended to the Synaptic Field MMR, and bridged to the Cortex EVM via the EpochAnchor contract.
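A minimal sketch of the formula above, using Node's built-in `crypto` module. The Merkle construction here (pairwise SHA-256 over concatenated children, with an odd node paired with itself) is an assumption; ECCA's actual `merkleRoot()` may pair or pad leaves differently. Only the domain tag `"ecca-coh-v1"`, the field order, and the all-zero placeholder come from the text.

```typescript
import { createHash } from "node:crypto";

const ZERO_ROOT = "00".repeat(32); // placeholder for an empty shard, as in the tick loop

// SHA-256 over the raw bytes of two concatenated hex strings.
const hashHexPair = (leftHex: string, rightHex: string): string =>
  createHash("sha256").update(leftHex + rightHex, "hex").digest("hex");

// Assumed construction: hash adjacent pairs level by level; an odd node is paired with itself.
function merkleRoot(leavesHex: string[]): string {
  if (leavesHex.length === 0) return ZERO_ROOT;
  let level = [...leavesHex];
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = i + 1 < level.length ? level[i + 1] : left;
      next.push(hashHexPair(left, right));
    }
    level = next;
  }
  return level[0];
}

interface ShardRoots {
  evm: string;
  btc: string;
  ipfs: string;
  sleeves: string;
}

// crossRoot = sha256("ecca-coh-v1" ‖ evmRoot ‖ btcRoot ‖ ipfsRoot ‖ sleevesRoot)
function coherenceRoot(roots: ShardRoots): string {
  return createHash("sha256")
    .update("ecca-coh-v1", "utf8")
    .update(roots.evm + roots.btc + roots.ipfs + roots.sleeves, "hex")
    .digest("hex");
}
```

Because every shard root feeds the same preimage, changing a single event hash in any shard changes the resulting cross root.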
A test verifier doesn't need to replay the entire epoch. They need only:
(crossRoot, evmRoot, ipfsRoot, sleevesRoot, synapticFieldRoot, medullaHeight)
The EpochAnchor contract provides on-chain verification:
// Anyone can call this — it's a public, trustless verification primitive
function verifyShardInclusion(
  uint256 epoch,
  uint8 shard, // 0 = evm, 1 = ipfs, 2 = sleeves
  bytes32 leaf,
  bytes32[] calldata siblings,
  uint256 indexBits
) external view returns (bool)
This means a test run in Singapore can be verified by a machine in Frankfurt that was never online at the same time. The proof is self-contained. The verification is deterministic. The on-chain contract is the final arbiter.
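An off-chain counterpart of this check can be sketched in a few lines. The interpretation of `indexBits` below is an assumption: bit `i` is taken to mean "the running node is the right child at level `i`". The contract may order bits or hash pairs differently, so treat `verifyInclusion` as an illustration of Merkle path verification rather than a mirror of `verifyShardInclusion()`.

```typescript
import { createHash } from "node:crypto";

// SHA-256 over the raw bytes of two concatenated hex strings.
const hashPairHex = (leftHex: string, rightHex: string): string =>
  createHash("sha256").update(leftHex + rightHex, "hex").digest("hex");

// Walk from the leaf to the root, combining with one sibling per level.
// Assumption: bit i of indexBits is 1 when the running node is the RIGHT child at level i.
function verifyInclusion(
  rootHex: string,
  leafHex: string,
  siblingsHex: string[],
  indexBits: number,
): boolean {
  let node = leafHex.toLowerCase();
  let bits = indexBits;
  for (const sibling of siblingsHex) {
    node = (bits & 1) === 1 ? hashPairHex(sibling, node) : hashPairHex(node, sibling);
    bits = bits >>> 1;
  }
  return node === rootHex.toLowerCase();
}

// Usage: a four-leaf tree, proving the leaf at index 2 (binary 10).
const leafOf = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");
const [a, b, c, d] = ["a", "b", "c", "d"].map(leafOf);
const ab = hashPairHex(a, b);
const cd = hashPairHex(c, d);
const exampleRoot = hashPairHex(ab, cd);
const included = verifyInclusion(exampleRoot, c, [d, ab], 2);
```

The verifier needs only the leaf, the sibling path, and the committed root, which is why it can run on a machine that was never online during the epoch in question.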
routing-equivocation residue and automatically slashes the offending operator. You cannot run a test with one version of the world for agent A and a different version for agent B. The coherence root is the single source of truth.
A sleeve is a containerized process bound to a Stack (cryptographic identity) by a per-epoch capability key. The sleeve-runtime is deliberately hardware-agnostic:
// sleeve-runtime/src/server.ts — the parametric loop
const SLEEVE_KIND = process.env.SLEEVE_KIND || 'human'; // human | ai | mining | memory
// AI sleeves optionally use Ollama for inference — but fall back to canned prompts
if (SLEEVE_KIND === 'ai' && LLM_PROVIDER === 'ollama') {
  // Call local Ollama API on whatever GPU is available
} else if (SLEEVE_KIND === 'ai') {
  // Canned prompt — runs on CPU, no GPU needed
}
// Human sleeves generate narrative perceptions at 8s intervals
// Mining sleeves join the medulla PoW pool
// Memory sleeves run DAG pin maintenance and reconciliation
Sleeves never own memory. They hold per-epoch capability leases. When a sleeve is decommissioned — because the spot instance was reclaimed, or the hardware died, or the test segment finished — the Stack's identity persists. Its episodic head, its token balances, its CPV coefficients: all survive.
A new sleeve can be spawned on completely different hardware, in a different region, on a different cloud provider, and it picks up exactly where the previous one left off. This is architectural re-sleeving: the embodiment is temporary, the identity is permanent.
Agent A runs on a g5.xlarge spot instance in us-east-1 for 47 minutes. The instance gets preempted. Agent A's sleeve is decommissioned (drift counter preserved, pinned shards intact). Twelve minutes later, a g4dn.xlarge becomes available in eu-west-1. A new sleeve is spawned, bound to Agent A's Stack, and resumes from the last synced epoch. The test continues. No data is lost. No state is corrupted. The agent's identity (its NFT, its token balances, its memory graph) didn't move. Only the sleeve did.
The three-chain architecture maps naturally onto distributed infrastructure:
Each region runs its own stack of services. The chains replicate across regions via their native protocols (Medulla propagates blocks via P2P, Hippocampus replicates via peer sync, Cortex uses geth's devp2p). NATS JetStream provides the inter-service event bus with 7-day retention.
Agent A in Virginia and Agent C in Ireland are both perceiving the same world. Not because they share a database, but because:
If Agent A's hardware is 10x faster than Agent C's, Agent A can perceive more events per epoch — but it burns through its ComputeToken budget faster. When the budget runs out, it waits. The epoch binding curve ensures tokens can't be hoarded across epochs (exponential decay with a 0.25 floor). The CPV coefficients ensure each agent's specialization is reflected in its token allocation.
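The text gives only the shape of the epoch binding curve (exponential decay with a 0.25 floor), so the following is a sketch under stated assumptions rather than ECCA's `effectiveBalance()`. The decay rate `DECAY_PER_EPOCH` is invented for illustration, and `bindingWeight` and `effectiveBalanceSketch` are hypothetical names.

```typescript
// Hypothetical decay rate per epoch: the article does not state the actual constant.
const DECAY_PER_EPOCH = 0.35;
// From the article: the decay bottoms out at a 0.25 floor.
const BINDING_FLOOR = 0.25;

// Fraction of a token's face value that still counts `ageEpochs` after issuance.
function bindingWeight(ageEpochs: number, decay: number = DECAY_PER_EPOCH): number {
  if (ageEpochs <= 0) return 1;
  return Math.max(BINDING_FLOOR, Math.exp(-decay * ageEpochs));
}

// Spendable value of a balance issued at `issuedEpoch`, evaluated at `currentEpoch`.
function effectiveBalanceSketch(
  balance: number,
  issuedEpoch: number,
  currentEpoch: number,
): number {
  return balance * bindingWeight(currentEpoch - issuedEpoch);
}
```

Under these assumed numbers, a balance held for many epochs is worth a quarter of its face value, so saving tokens for a later burst is always worse than spending them in the epoch they were issued.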
The test is fair not because the hardware is identical, but because the resource economy is identical.
The TripartiteGame contract is the on-chain referee for multi-agent resource allocation. It models three resources — Compute, Storage, and Bandwidth — as a cooperative game with per-epoch budgets:
// contracts/src/TripartiteGame.sol
// 1. Open a game (only the referee/owner)
function openGame(bytes32 gameId) external onlyOwner;
// 2. Each agent registers with per-epoch budgets
function registerParty(
  bytes32 gameId,
  uint256 tokenId, // StackIdentity NFT
  string label, // "agent-A", "agent-B"
  uint256 computeBudget, // max compute per epoch
  uint256 storageBudget, // max storage per epoch
  uint256 bandwidthBudget // max bandwidth per epoch
) external;
// 3. All spending goes through consume() — atomic, capped, auditable
function consume(
  bytes32 gameId, uint256 tokenId, uint256 epoch,
  Resource resource, uint256 amount, string reason
) external;
// 4. Anyone can verify fairness — public, trustless, re-derivable
function verifyAllocationFair(bytes32 gameId, uint256 epoch)
  external view returns (bool);
// 5. Emit on-chain audit events
function auditEpoch(bytes32 gameId, uint256 epoch) external;
Every consume() call atomically burns the underlying BandwidthToken and checks the per-epoch cap. There's no way to spend more than your budget. There's no way to hide spending, because it's on-chain. And there's no way to dispute the audit: verifyAllocationFair() is a deterministic view function over on-chain state, so anyone who calls it gets the same answer.
Anyone can call verifyAllocationFair(gameId, epoch) for every epoch and confirm that no agent exceeded its resource budget. The benchmark is fair because the referee is a smart contract, not a process running on someone's laptop.
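The capping behaviour described above can be modeled off-chain in a few lines. This is an in-memory sketch for reasoning about the rules, not the contract: `GameLedgerSketch` is a hypothetical name, token burning is omitted, and "fair" is assumed to mean only that no registered party's recorded spend exceeds its per-epoch cap.

```typescript
type Resource = "compute" | "storage" | "bandwidth";
type Budgets = Record<Resource, number>;

const RESOURCES: Resource[] = ["compute", "storage", "bandwidth"];

class GameLedgerSketch {
  private caps = new Map<number, Budgets>(); // tokenId -> per-epoch caps
  private spent = new Map<string, number>(); // "tokenId:epoch:resource" -> running total

  registerParty(tokenId: number, budgets: Budgets): void {
    if (this.caps.has(tokenId)) throw new Error("party already registered");
    this.caps.set(tokenId, { ...budgets });
  }

  spentBy(tokenId: number, epoch: number, resource: Resource): number {
    return this.spent.get(`${tokenId}:${epoch}:${resource}`) ?? 0;
  }

  // Reject any spend that would push the epoch total past the cap.
  consume(tokenId: number, epoch: number, resource: Resource, amount: number): void {
    const budgets = this.caps.get(tokenId);
    if (!budgets) throw new Error("unknown party");
    const total = this.spentBy(tokenId, epoch, resource) + amount;
    if (total > budgets[resource]) throw new Error("per-epoch budget exceeded");
    this.spent.set(`${tokenId}:${epoch}:${resource}`, total);
  }

  // Independent re-check over recorded state: no party is above any cap in this epoch.
  verifyAllocationFair(epoch: number): boolean {
    let fair = true;
    this.caps.forEach((budgets, tokenId) => {
      for (const resource of RESOURCES) {
        if (this.spentBy(tokenId, epoch, resource) > budgets[resource]) fair = false;
      }
    });
    return fair;
  }
}
```

Budgets are tracked per epoch, so a party that exhausts its compute cap in epoch 7 can spend again in epoch 8 without any reset step.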
When a spot instance is about to be preempted, or when compute is cheaper in another region, or when an agent needs to be closer to a specific hippocampus node for latency — you needlecast.
Needlecasting is the atomic transfer of a sleeve's executive control from one host to another. It's a six-step saga with full rollback:
| Step | Operation | What Happens | Rollback |
|---|---|---|---|
| 1 | freeze(source) | Mark source sleeve as dead | unfreeze |
| 2 | shard(episodicHead, depth=8) | Collect CIDs via DAG walk | (read-only) |
| 3 | pin(shards) | Durability bond in hippocampus | unpin |
| 4 | anchor(saga) | Emit needlecast.route for thalamus | drop fold |
| 5 | reconstruct(target) | Spawn new sleeve, drift=0, sync epoch | restore |
| 6 | settle(source) | Debit RoutingToken (cost ≥ 5) | re-credit |
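The saga pattern in the table (run steps in order, and on failure undo the completed ones in reverse) can be sketched generically. `SagaStep` and `runSaga` are hypothetical names, and the step bodies are left to the caller; this shows the control flow only, not ECCA's needlecast implementation.

```typescript
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  rollback?: () => Promise<void>; // read-only steps need no rollback
}

interface SagaResult {
  ok: boolean;
  failedAt?: string;
}

// Run steps sequentially. If one throws, roll back the completed steps
// in reverse order and report which step failed.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      for (const prev of completed.reverse()) {
        // A production saga would also need a policy for rollbacks that fail.
        if (prev.rollback) await prev.rollback();
      }
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true };
}
```

If `reconstruct` fails in this model, `anchor`, `pin`, and `freeze` are undone in that order, which leaves the source sleeve unfrozen and able to continue.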
The cost model is:
needlecast_cost = 5 + 0.1 × shard_count + 0.5 × |sourceEpoch − targetEpoch|
This creates an economic incentive to needlecast to nearby epochs (low cost) and a penalty for large time jumps (high cost). The target pays nothing — what ECCA calls the "refugee-of-experience principle": re-sleeving is always inbound-free.
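As a worked instance of the cost model, here is the formula as a function. `needlecastCost` is a hypothetical name; the constants are the ones given above.

```typescript
// needlecast_cost = 5 + 0.1 × shard_count + 0.5 × |sourceEpoch − targetEpoch|
function needlecastCost(shardCount: number, sourceEpoch: number, targetEpoch: number): number {
  return 5 + 0.1 * shardCount + 0.5 * Math.abs(sourceEpoch - targetEpoch);
}

// Same-epoch move with 40 shards: 5 + 4 + 0 = 9 RoutingTokens.
const sameEpochCost = needlecastCost(40, 120, 120);
// Same shards, but a 6-epoch gap: 5 + 4 + 3 = 12 RoutingTokens.
const sixEpochGapCost = needlecastCost(40, 120, 126);
```

The base fee of 5 dominates for small memory graphs, while the epoch-gap term is what makes long jumps expensive.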
Consider a 24-hour agent benchmark:
ap-northeast-1 (Tokyo). Run AI sleeves there.
eu-west-1 (Ireland) is cheaper. Needlecast all AI sleeves to Ireland. Cost: 5 + 0.1×shards + ~0 epoch drift. State is preserved. Test continues seamlessly.
us-east-1.
The agent doesn't know or care that it moved. Its Stack identity, token balances, and memory graph are the same. The sleeve is just a container. The coherence root proves that the agent's events were included in the correct epochs regardless of which region hosted them.
Distributed execution raises an obvious concern: how do you know every node is running the same code?
ECCA addresses this at multiple levels:
Every memory fragment in the hippocampus DAG has a CID: ecca://<sha256(canonical_json)>@<epoch>. The CID is a hash of the content. If the content differs, the CID differs. If the CID matches, the content is identical. This is enforced by the DAG node's Put() operation — it computes the CID from the content, not from a user-supplied value.
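A minimal sketch of content addressing in the `ecca://<sha256(canonical_json)>@<epoch>` format. The canonicalization rule used here (object keys sorted lexicographically, no whitespace) is an assumption, since the text does not define `canonical_json`; `canonicalJson` and `computeCid` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed canonical form: sorted object keys, no insignificant whitespace.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  const obj: { [key: string]: Json } = value;
  const body = Object.keys(obj)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + canonicalJson(obj[key]))
    .join(",");
  return "{" + body + "}";
}

// The CID is derived from the content, never supplied by the caller.
function computeCid(content: Json, epoch: number): string {
  const digest = createHash("sha256").update(canonicalJson(content), "utf8").digest("hex");
  return `ecca://${digest}@${epoch}`;
}
```

Two nodes that serialize the same fragment with different key order still derive the same CID under this rule, while any change to the content produces a different one.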
Every event's hash is included in the epoch's shard-specific Merkle root. The coherence root is a hash of all four shard roots. The coherence root is mined into a PoW block. If any node produces a different event for the same input, the hash changes, the Merkle root changes, the coherence root changes, and the PoW block is different. A divergence between nodes is cryptographically detectable.
The EpochAnchor.verifyShardInclusion() function lets anyone prove that a specific event was included in a specific epoch's shard root. The proof is a Merkle path — a sequence of sibling hashes. The verification is deterministic: given the same leaf, siblings, and root, the result is always the same. No code differences can hide behind this verification.
If a node does produce different results — different recall fidelity, different shard contents, different event hashes — the residue system catches it. A historical-non-canonical residue fires when recall fidelity drops below FIDELITY_MIN_DEFAULT. A speculative-divergence residue fires when drift exceeds 2×DRIFT_MAX. A reorg-orphan residue fires when Medulla reorgs invalidate an epoch's anchor.
Each residue carries a bounty. Any participant who provides a proof of the correct state earns a ResidueToken — the only token that doesn't decay. The economic incentive is always toward consistency, never toward hiding divergence.
ECCA v3 is functional. All packages build. All 275 tests pass. The three Go chain forks compile. The contracts deploy. The E2E test runs the full coherence cycle. But there is a concrete gap between "it works on localhost" and "fair multi-region agent benchmarking on spot compute."
| Capability | Status | What Exists | What's Missing |
|---|---|---|---|
| Epoch clock | ✅ Done | Thalamus router ticks every EPOCH_INTERVAL_MS, submits coherence roots to Medulla, bridges to Cortex via EpochAnchor | — |
| Coherence root computation | ✅ Done | coherenceRoot(), merkleRoot(), per-shard Merkle trees, SynapticFieldMMR | — |
| On-chain verification | ✅ Done | EpochAnchor.commitAnchor(), verifyContinuity(), verifyShardInclusion() | — |
| TripartiteGame | ✅ Done | openGame, registerParty, consume, verifyAllocationFair, auditEpoch | — |
| Sleeve portability | ✅ Done | 4 sleeve kinds, parametric runtime, hardware-agnostic containers | — |
| Token economy | ✅ Done | 5 tokens, CPV, EBC, effectiveBalance(), per-epoch decay | — |
| Needlecasting saga | ✅ Done | 6-step saga with rollback, cost model, freeze/reconstruct/settle | — |
| Residue system | ✅ Done | 5 residue kinds, detection, proof submission, bounty payout | — |
| Multi-region chain replication | ⚠ Partial | Docker Compose local, Swarm distributed config, Helm chart stubs | Chain P2P peering across regions, NAT traversal, peer discovery for Medulla and Hippocampus. Cortex uses geth devp2p which handles this natively. |
| Spot instance lifecycle | 🔮 TODO | Sleeve decommission on SIGTERM preserves state | Spot interruption handler that triggers needlecast before termination. AWS/GCP spot signal → freeze → needlecast → settle. Cloud-specific lifecycle hooks. |
| Cross-region needlecasting | ⚠ Partial | Saga logic exists end-to-end, NATS carries events | Hippocampus shard replication across regions (currently in-memory, needs cross-region peer sync for pin transfers). Target region must have the shards before reconstruct(). |
| Cortex EVM precompiles | 🔮 TODO | Standard geth with Clique PoA | isCoherent(epoch, root) and verifyMerkleShard(root, leaf, proof) as native EVM precompiles. Currently done in Solidity (works but costs more gas). |
| Benchmark harness | 🔮 TODO | E2E test, unit tests, TripartiteGame contract | Orchestrator that opens a TripartiteGame, registers N agents across M regions, runs for K epochs, collects per-epoch audit results, produces a benchmark report. The plumbing exists; the harness doesn't. |
| Helm charts | ⚠ Partial | chart-chains has templates, values-shared.yaml exists | Complete charts for chart-data, chart-orchestration, chart-sleeves, chart-workers, chart-observability. Needed for K8s multi-region deployment. |
| Observability | ✅ Done | Prometheus, Loki, Grafana provisioning, Jaeger tracing | Per-agent dashboards, drift tracking, per-epoch resource utilization graphs. Config exists but dashboards are generic. |
The gap between "works on localhost" and "fair multi-region benchmarking on spot compute" is concrete and measurable. Here's the work, in dependency order:
Medulla and Hippocampus currently run as single-instance processes. To run across regions, they need P2P peer discovery and block/node propagation:
chain.go code handles reorgs; it just needs a network transport.
peers map and AddPeer()/RemovePeer() methods are stubbed out. Implement push-based replication: when a node calls Put(), it pushes the node to all peers. Pin leases replicate with the nodes.
When a cloud provider sends a spot interruption signal (AWS gives 2 minutes, GCP gives 30 seconds), the sleeve needs to:
wireShutdown())
The existing wireShutdown() in @ecca/service-base already registers SIGTERM handlers. The needlecasting saga already exists. The missing piece is: detect spot interruption signal → initiate needlecast to a target region → handle the race condition where termination arrives before the saga completes.
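One way to handle that race is to give the saga a deadline derived from the provider's notice window and fall back to a local state flush if it does not finish in time. The sketch below shows that pattern only. `handleSpotInterruption`, `needlecast`, and `persistLocally` are hypothetical names supplied by the caller, and detecting the interruption signal itself is out of scope.

```typescript
type InterruptionOutcome = "needlecast-complete" | "deadline-fallback";

// Race the needlecast saga against the notice window minus a safety margin.
// If the saga loses or fails, persist state locally so a later re-sleeve can resume.
async function handleSpotInterruption(
  needlecast: () => Promise<void>,
  persistLocally: () => Promise<void>,
  noticeMs: number, // e.g. 120000 for a 2-minute notice, 30000 for a 30-second one
  safetyMarginMs: number = 5000,
): Promise<InterruptionOutcome> {
  const budgetMs = Math.max(0, noticeMs - safetyMarginMs);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), budgetMs);
  });
  const saga = needlecast().then(
    () => "done" as const,
    () => "failed" as const,
  );
  const winner = await Promise.race([saga, deadline]);
  if (timer !== undefined) clearTimeout(timer);
  if (winner === "done") return "needlecast-complete";
  // Note: a saga that lost the race is still running here; a real handler
  // would also need to cancel it or let its rollback finish.
  await persistLocally();
  return "deadline-fallback";
}
```

With a 30-second notice the budget is tight enough that the fallback path is likely to be the common case unless shards are already replicated to the target region.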
The needlecasting saga's step 3 (pin(shards)) assumes the target hippocampus node already has the shards. For cross-region needlecasting, the shards need to be replicated to the target region before reconstruct().
Option 1: Eager replication — all nodes replicate everywhere (simple, expensive). Option 2: Lazy replication with on-demand fetch — the target region pulls missing shards during reconstruct() (complex, efficient). Option 3: Hybrid — pin leases trigger replication to a configurable set of regions (practical middle ground).
A CLI tool that:
TripartiteGame with configurable budgets
verifyAllocationFair() results from the contract
Every piece of this exists except the orchestration script itself. The contracts are deployed, the sleeve-runtime is parametric, the metrics are exposed via Prometheus, the audit functions are on-chain.
Complete the Kubernetes deployment charts so the entire stack can be deployed across multiple regions with helm install. The chart-chains templates exist as a reference; the remaining five charts need Deployment/Service/ConfigMap manifests that mirror the Docker Compose configuration.
Update (9 May 2026): The orchestration gap described above has been partially closed. See the next section.
The day after publishing the analysis above, we built Playfair — a complete Kubernetes test harness that implements the variable-bitrate thesis in actual multi-node infrastructure. It's named after the Playfair cipher, because the whole point is that fairness is verifiable after the fact, not enforced during execution.
Playfair provisions a k3d cluster with 3 labeled agent nodes, each simulating a region with a different cost profile:
ECCA_DIFFICULTY=6, making PoW mining slow and expensive. Natural home for archivists and memory-keepers.
ECCA_DIFFICULTY=3, making mining fast. Hippocampus is throttled. Natural home for inference agents.
Each region deploys its own full set of three chains plus siyana-api and thalamus-router instances pointing at the local chains, with shared Postgres, Redis, NATS, and MinIO in a fourth namespace.
Playfair deploys 6 agents — 2 per region — with specialized behavioral profiles:
This is where it gets interesting. At scripted epochs, the orchestrator triggers events that force the system to demonstrate its resilience properties:
Every epoch, the orchestrator runs a fairness audit. Every needlecast is costed (5 + 0.1×shards + 0.5×drift). Every token burn is tracked. At the end, it outputs a comprehensive JSON with per-agent-per-epoch metrics, which the report generator renders into a cyberpunk HTML report.
Playfair isn't a simulation. It runs actual chain nodes, actual K8s resource limits, actual cross-namespace networking. When region-compute's Medulla mines a block at difficulty 3, it actually mines faster than region-storage's Medulla at difficulty 6. When Inference-Prime needlecasts from compute to storage, its shard data actually moves across K8s services.
The thesis from this blog post — that you can test agents fairly across heterogeneous hardware without synchronous execution — is now testable with one command:
pnpm test:playfair --epochs 20
The fundamental bet of ECCA is that you don't need synchronous execution to have fair testing. You need cryptographic proof of consistent state at well-defined boundaries. The epoch is the boundary. The coherence root is the proof. The TripartiteGame is the referee. The sleeve is the portable execution unit. The token economy is the resource constraint.
An agent running on a $0.30/hr spot GPU in Singapore and an agent running on a $2.50/hr reserved instance in Virginia can both participate in the same benchmark. Neither needs to trust the other. Neither needs to trust a central coordinator. The smart contract verifies fairness. The coherence root verifies consistency. The residue system economically incentivizes repair of any divergence.
The world isn't synchronized. It's coherent. That's the difference.
All test results are available in the unit test report (275 tests across 6 suites), the E2E report (full coherence cycle), and the Playfair report (3-region tripartite game). The contracts, including TripartiteGame and EpochAnchor, have 135 Solidity tests covering all verification primitives. Source: github.com/aarong11/dhf.