Inside the Nekuti Matching Engine
At Nekuti, innovation is foundational to how we approach performance. Our matching engine is designed for real-time, high-throughput environments. We use purpose-built data structures and highly optimized routines to redefine what’s possible in speed and scalability.
Throughput—the ability to process millions of events per second without architectural strain—is a cornerstone of market resilience. Many matching engines rely on tree-maps: with their O(log n) insert and lookup times, they’re a reliable component from the parts bin of general-purpose ordered storage solutions. But under sustained load, rebalancing costs, pointer-chasing overhead, and cache inefficiencies begin to cap throughput. Nekuti’s structure was designed to sidestep these constraints—prioritizing predictable performance and unlocking headroom for additional functionality.
Recognizing these limitations led us to design a solution from the ground up—tailored for efficiency at every level of the system. What follows is a look into the data structures and optimizations that power Nekuti’s competitive edge.
Tree-Map Limitations
Tree-maps are a go-to for ordered storage with logarithmic time complexity—but they fall short in performance-critical environments like matching engines:
- Insertion Overhead: Balancing operations slow down performance as datasets grow.
- Inefficient Memory Use: Pointer-heavy nodes inflate memory footprint.
- Slow Traversals: Navigating hierarchical structures adds friction to time-sensitive queries.
While dependable in general-purpose systems, tree-maps simply don’t scale to the microsecond demands of real-time matching. That challenge pushed us to engineer a structure purpose-built for performance.
The Nekuti Advantage
Our matching engine leverages a custom-built order book structure, engineered specifically for high-throughput traversal and efficient memory usage—offering a significant performance edge over tree-map-based implementations.
A matching engine must rapidly traverse the order book to determine the next populated price level, and performance here is critical. Order book levels are often sparsely populated in the wings and denser near the mid-price, so naïvely scanning through levels can introduce costly delays.
To address this, Nekuti uses a bitmap-based level index. Each bit in the index represents a specific price level and signals whether it’s occupied. Leveraging modern CPU instructions like TZCNT and LZCNT, the engine can locate the next active level extremely quickly, scanning 64 price levels per instruction with minimal branching or overhead.
We have also optimized how levels themselves are laid out: each order book level resides in a flat, contiguous array in memory. This layout significantly improves cache locality, enabling the CPU to preload data efficiently and avoid the penalties of pointer dereferencing and heap fragmentation. The result is faster traversal, lower memory overhead, and ultra-high-throughput access patterns tuned precisely to the engine’s needs. By combining architecture-specific CPU instructions with a finely tuned memory layout, the Nekuti Matching Engine achieves performance characteristics well beyond those possible with general-purpose tree-map-based approaches, tailored for the realities of modern markets.
Performance Comparison
To quantify the impact of our optimizations, we benchmarked two builds of the Nekuti Matching Engine that are identical except for the order book structure: one using our optimized design, the other an off-the-shelf tree-map.
The results are striking: our optimized structure consistently delivers higher order throughput across a range of test scenarios.
On average, our optimizations allow the engine to process 60,000 additional orders per second when compared to the tree-map based approach, creating substantial headroom for high-volume trading environments.
But this headroom isn’t just about raw speed. It can be strategically traded for richer functionality, such as real-time margin risk management, or Nekuti’s unique Amplified Liquidity, all without compromising performance.
[Chart: order throughput, optimized structure vs. tree-map]
The chart above illustrates this performance delta, with the Nekuti optimized data structure maintaining consistently elevated throughput. This demonstrates not just raw speed but sustained efficiency—critical for modern markets where volume spikes are the norm, not the exception.
Conclusion
In designing the Nekuti Matching Engine, we moved beyond traditional data structures to develop a system purpose-built for performance. Through targeted memory layouts and thoughtful use of hardware features, we’ve achieved a solution that scales with the demands of modern trading, delivering speed, precision, and reliability where it matters most.
Contact us to arrange a consultation.
