High‑assurance financial systems fail in the margins, not the centre. The rare, low‑probability, high‑impact scenarios
are the ones that break production systems, especially when those systems must operate deterministically across
blockchain environments. Nekuti’s testing strategy is built around this reality.
Before diving into the methodology, it is worth stating the baseline.
The Nekuti Matching Engine is developed under strict TDD and backed by more than 17,000 unit, acceptance, and
integration tests.
That foundation matters, but it is only the starting point. The rest of this post focuses on the testing philosophy that
goes far beyond traditional coverage.
Why Traditional Testing Is Not Enough
Unit tests validate what is already known. They encode expectations and confirm predictable behaviour. But in a matching engine, the dangerous failures live in the unknown. These include:
- numeric edge cases
- pathological sequencing of events
- precision loss in chained arithmetic
- rare interactions between order types
- volatility‑driven state transitions
- multi‑asset interactions
These are not scenarios that emerge from human imagination or specification documents. They emerge from scale.
This is especially true in blockchain deployments, where determinism must be absolute, illegal states such as negative balances cannot be tolerated, rounding errors can be catastrophic, and state transitions must be identical across all nodes. Correctness must hold indefinitely, and these constraints shape the testing strategy.
Continuous Fuzz Testing at Extreme Scale
To expose the unknown, Nekuti runs a continuous fuzz‑testing pipeline that executes around three billion valid scenarios per day. These are not random strings or malformed payloads. They are synthetically generated, domain‑valid scenarios that push the engine into rare and extreme states.
Over time, this has accumulated to more than 1.6 trillion executed fuzz scenarios. At this scale, the system is forced into combinations of states that would never be conceived manually. This is where the most valuable failures are found. The fuzzing system explores:
- overflow and underflow boundaries
- extreme volatility in price and size distributions
- adversarial combinations of order types
- pathological sequencing of events
- precision‑sensitive arithmetic chains
- multi‑asset interactions
The objective is not to simulate normal market behaviour. The objective is to force bugs to manifest.
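To make the idea concrete, here is a minimal sketch of what domain‑valid scenario generation looks like in principle. Every name here (`Order`, `generate_scenario`) is a hypothetical stand‑in, not Nekuti's actual API; the point is that every generated input is structurally valid while sizes and prices are biased toward extremes and overflow boundaries.

```python
import random
from dataclasses import dataclass

@dataclass
class Order:
    side: str    # "buy" or "sell"
    price: int   # integer ticks avoid float precision issues
    size: int    # integer lots

def generate_scenario(rng: random.Random, n_orders: int = 50) -> list[Order]:
    """Generate a structurally valid order sequence biased toward extremes."""
    orders = []
    for _ in range(n_orders):
        # Over-sample boundary values: tiny, large, and near-overflow sizes.
        size = rng.choice([1, rng.randint(1, 10**6), 2**63 - 1])
        # Fat-tailed price distribution exaggerates real-world volatility.
        price = max(1, int(rng.lognormvariate(mu=10, sigma=3)))
        orders.append(Order(rng.choice(["buy", "sell"]), price, size))
    return orders

rng = random.Random(42)           # seeded, so every run is reproducible
scenario = generate_scenario(rng)
# Every order is domain-valid: it parses, and it passes basic validation.
assert all(o.size >= 1 and o.price >= 1 for o in scenario)
```

Because every input is valid, no effort is wasted on payloads the engine would reject at the door.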
Invariants: What Must Never Break
Fuzzing is only useful if it is measured against hard rules that must hold true in every valid state of the system. These are expressed as invariants and asserted continuously during fuzz runs.
One critical invariant is simple to state but non‑negotiable in practice:
- balances must never go negative
This rule is enforced not just in straightforward scenarios, but under the most complex conditions the engine can encounter, including multi‑asset liquidations, cascading liquidations across correlated products, and auto‑deleveraging events. When any fuzzed scenario violates this invariant, it is treated as a serious failure. The sequence is captured, replayed deterministically, and reduced to a minimal reproduction so the underlying bug can be fixed and permanently guarded against.
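The mechanism can be sketched as a checker run after every state transition: if the invariant fails, the violation is raised immediately and the failing state is captured. The `InvariantViolation` and `check_invariants` names below are illustrative assumptions, not Nekuti's internals.

```python
class InvariantViolation(Exception):
    """Raised when a hard rule of the system is broken."""

def check_invariants(balances: dict[str, int]) -> None:
    """Hard rule: no balance may ever go negative, in any asset."""
    for asset, amount in balances.items():
        if amount < 0:
            raise InvariantViolation(f"negative balance: {asset}={amount}")

balances = {"BTC": 5, "USD": 100}

balances["USD"] -= 40        # a fill debits the quote asset
check_invariants(balances)   # passes: all balances non-negative

balances["USD"] -= 100       # a buggy liquidation over-debits
try:
    check_invariants(balances)
    captured = None
except InvariantViolation as exc:
    captured = str(exc)      # the failing state is captured for replay
assert captured == "negative balance: USD=-40"
```

Asserting invariants on every transition, rather than only at the end of a scenario, pinpoints the exact event that pushed the system into an illegal state.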
Deterministic Replay: Making Failures Actionable
Fuzzing at scale only has value if every discovered failure can be reproduced exactly. To support this, every fuzzed scenario is executed under deterministic conditions, and the full sequence of inputs is captured whenever an invariant is violated.
This allows the engineering team to:
- replay the exact failing scenario byte for byte
- inspect the full state transition history
- reduce the scenario to a minimal reproduction
- add a deterministic test that prevents the issue from ever returning
Without deterministic replay, large‑scale fuzzing would generate noise. With it, every failure becomes a precise, debuggable signal.
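The capture, replay, and reduction loop can be sketched in miniature. Here a toy "engine" fails whenever a running total goes negative; because every scenario is derived from a seed, a failure is reproducible from the seed alone, and a greedy reducer shrinks the input to a minimal reproduction. All function names are hypothetical stand‑ins.

```python
import random

def run_engine(events: list[int]) -> bool:
    """Stand-in engine: 'fails' when the running total goes negative."""
    total = 0
    for e in events:
        total += e
        if total < 0:
            return False
    return True

def scenario_from_seed(seed: int, n: int = 20) -> list[int]:
    """Deterministic: the same seed always yields the same scenario."""
    rng = random.Random(seed)
    return [rng.randint(-10, 10) for _ in range(n)]

def shrink(events: list[int]) -> list[int]:
    """Greedily drop events while the failure still reproduces."""
    i = 0
    while i < len(events):
        candidate = events[:i] + events[i + 1:]
        if not run_engine(candidate):  # still fails without this event
            events = candidate
        else:
            i += 1
    return events

# Find a failing seed, then replay and reduce it deterministically.
seed = next(s for s in range(10_000) if not run_engine(scenario_from_seed(s)))
minimal = shrink(scenario_from_seed(seed))
assert not run_engine(minimal)  # the minimal reproduction still fails
assert len(minimal) <= len(scenario_from_seed(seed))
```

The shrunk scenario can then be committed as a permanent deterministic test, which is how a one‑off fuzz discovery becomes a lasting regression guard.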
Historical Determinism: Byte‑for‑Byte Behaviour Comparison
Beyond fuzzing and invariant checks, every build of the matching engine is validated against the complete historical record of real deployments. For each release, the engine is replayed against all known production, test, and development event streams, and its outputs are compared byte for byte with previously verified results.
This process ensures that:
- behavioural changes are detected immediately
- regressions surface even in long‑horizon or rare historical sequences
- expected behaviour changes are explicitly acknowledged
- unexpected behaviour changes are trapped before they reach production
This gives the engineering team a high‑confidence signal: if a behaviour changes, it is because the change was intentional, never accidental.
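One way to implement such a check, sketched here under illustrative assumptions (the `process_event` stand‑in is not Nekuti's engine), is to hash every intermediate state during replay and compare the digest against a previously verified baseline, so any byte‑level divergence in any step is detected.

```python
import hashlib
import json

def process_event(state: dict, event: dict) -> dict:
    """Stand-in state transition: apply a balance delta for one asset."""
    state = dict(state)
    state[event["asset"]] = state.get(event["asset"], 0) + event["delta"]
    return state

def replay_digest(events: list[dict]) -> str:
    """Hash every intermediate state so any divergence is detected."""
    state, h = {}, hashlib.sha256()
    for event in events:
        state = process_event(state, event)
        # Canonical serialization (sorted keys) keeps the hash deterministic.
        h.update(json.dumps(state, sort_keys=True).encode())
    return h.hexdigest()

stream = [{"asset": "BTC", "delta": 1}, {"asset": "BTC", "delta": -1}]
baseline = replay_digest(stream)          # recorded when the run was verified
assert replay_digest(stream) == baseline  # a new build must match exactly
```

Hashing intermediate states, not just final outputs, means a regression that transiently corrupts state and then self-corrects is still caught.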
A Useful Mental Model: The Galton Board
A Galton Board produces a distribution where most balls land in the centre and the tails represent rare paths. In software, each ball is a code path. Traditional testing exercises the centre. Fuzz testing deliberately fattens the tails.
Image credit: Matemateca (IME/USP) / Rodrigo Tetsuo Argenton
The goal is to run the code paths that almost never occur, explore improbable combinations, reach deep and rarely executed branches, and expose state transitions that only appear once in a billion runs. This is where real‑world failures hide.
Why Pure Randomness Fails
Pure randomness is ineffective in structured financial systems. Most random inputs are invalid and get rejected immediately. This is equivalent to firing balls at the Galton Board from across the room and hoping one lands in the tiny hole at the top. Nothing meaningful is exercised. To be effective, randomness must be shaped.
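A quick illustration of the waste: feed random byte strings into even a deliberately simple order validator, and essentially none survive input validation, so nothing past the front door is ever exercised. The `parse_order` format below is a hypothetical toy, not a real protocol.

```python
import random

def parse_order(raw: bytes) -> bool:
    """Accept only 'BUY <price> <size>' or 'SELL <price> <size>'."""
    parts = raw.split(b" ")
    if len(parts) != 3 or parts[0] not in (b"BUY", b"SELL"):
        return False
    return all(p.isdigit() and int(p) > 0 for p in parts[1:])

rng = random.Random(0)
trials = 100_000
accepted = sum(
    parse_order(bytes(rng.randrange(256) for _ in range(12)))
    for _ in range(trials)
)
# Essentially every unshaped random input bounces off input validation.
assert accepted / trials < 0.001
```

Real engines have far stricter validation than this toy, which makes the mismatch even worse in practice.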
Focused Randomness: Domain‑Aware Fuzzing
Nekuti uses focused randomness, which means the randomness is constrained to valid, realistic financial scenarios. Volatility and variation are amplified far beyond real‑world levels, and scenario distributions are shaped to match each client’s product set. Rare but meaningful combinations are prioritised over obscure noise. This methodology is referred to internally as Fuzz Focus. It is not chaos. It is targeted exploration of the state space.
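In code, shaping the distribution might look like the sketch below: every order is valid by construction, volatility is amplified far beyond real‑world levels, and rare order types are over‑weighted relative to production traffic. The order types, weights, and field names are illustrative assumptions, not Nekuti's configuration.

```python
import random

ORDER_TYPES = ["limit", "market", "stop", "iceberg", "post_only"]
# Rare-but-meaningful types are over-weighted versus real traffic.
WEIGHTS = [1, 1, 3, 5, 5]

def focused_order(rng: random.Random, mid_price: float) -> dict:
    # Volatility amplified far beyond real-world levels (large sigma).
    price = max(0.01, rng.lognormvariate(mu=0, sigma=2.0) * mid_price)
    return {
        "type": rng.choices(ORDER_TYPES, weights=WEIGHTS)[0],
        "price": round(price, 2),
        "size": rng.choice([1, rng.randint(1, 1_000), 10**9]),
    }

rng = random.Random(7)
orders = [focused_order(rng, mid_price=100.0) for _ in range(1000)]
# Shaped, yet still 100% domain-valid: nothing is rejected at the door.
assert all(o["price"] > 0 and o["size"] >= 1 for o in orders)
```

Compared with pure randomness, every one of these inputs reaches real engine logic, while the amplified tails still drive the system into states production traffic would take years to produce.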
Summary: Why This Matters for Nekuti
The Nekuti Matching Engine is designed for environments where correctness must be absolute, precision must be perfect, and determinism must hold indefinitely. That design philosophy is supported by:
- strict TDD
- 17,000+ deterministic tests
- continuous fuzzing at roughly three billion scenarios per day
- 1.6 trillion cumulative fuzz executions
- domain‑aware scenario shaping
- adversarial exploration of rare state transitions
This combination produces a matching engine that has been tested not just for the expected, but for the extreme. The
failures that matter are the ones that appear once in a billion runs,
and those are the failures this testing strategy is built to find.
Contact us to arrange a consultation
