#
Engine Persistence and Disk Usage
#
Overview
The Engine persists its state through two mechanisms:
- Command logs: Sequential records of all received commands
- Snapshots: Point-in-time captures of complete engine state
#
Command Logs
#
Structure
- Every command received by the Engine is written to command-log files, in order
- Minimum activity: One clock-tick every 10ms
- Each order consumes approximately 500 bytes
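As a rough illustration of how these figures translate into daily volume, the sketch below multiplies an order count by the approximate 500-byte figure and counts the minimum number of clock-ticks per day. The function names and the `tick_bytes` parameter are assumptions made for illustration; this document does not state how many bytes a clock-tick record occupies, so that value defaults to zero and should be replaced with a measured figure.

```python
# Rough estimate of daily command-log growth (illustrative sketch only).
BYTES_PER_ORDER = 500            # approximate per-order figure from this document
TICK_INTERVAL_MS = 10            # minimum activity: one clock-tick every 10 ms
MS_PER_DAY = 24 * 60 * 60 * 1000


def ticks_per_day(interval_ms: int = TICK_INTERVAL_MS) -> int:
    """Number of clock-tick commands written in 24 hours."""
    return MS_PER_DAY // interval_ms


def estimate_daily_log_bytes(orders_per_day: int, tick_bytes: int = 0) -> int:
    """Orders plus clock-ticks; tick_bytes is an assumed, unmeasured value."""
    return orders_per_day * BYTES_PER_ORDER + ticks_per_day() * tick_bytes
```

With the default interval, `ticks_per_day()` is 8,640,000, and one million orders account for roughly 500 MB before any tick overhead is added.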
#
Daily Disk Usage Scenarios
#
Snapshots
#
Characteristics
- Contains complete engine state
- Generated every ~8 minutes by the Snapshotter process
- Each snapshot is indexed to a specific command in the command log
- Minimum size: 1.5 MB per snapshot
- Approximately 300 MB per day for minimal usage
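The daily figure follows from the snapshot interval and the minimum snapshot size. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the 8-minute interval is treated as exact even though the document gives it as approximate.

```python
# Back-of-the-envelope check of minimal daily snapshot volume (sketch only).
MINUTES_PER_DAY = 24 * 60


def snapshots_per_day(interval_minutes: float = 8) -> float:
    """How many snapshots the Snapshotter produces in 24 hours."""
    return MINUTES_PER_DAY / interval_minutes


def daily_snapshot_mb(snapshot_mb: float = 1.5, interval_minutes: float = 8) -> float:
    """Total snapshot volume per day, in MB, for a fixed snapshot size."""
    return snapshots_per_day(interval_minutes) * snapshot_mb
```

At 8 minutes and 1.5 MB this gives 180 snapshots and 270 MB per day, in line with the approximate 300 MB quoted above.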
#
Size Factors
Snapshot size varies based on:
- Number of instruments
- Transaction volume
- Active Orders
- Price tick frequency
- Overall system activity
Size range: 1.5 MB to several hundred MB per snapshot
#
State Recovery Process
flowchart TD
    A[Start Engine] --> B{Snapshot Available?}
    B -->|Yes| C[Load Most Recent Snapshot]
    B -->|No| D[Find commandLog.0]
    C --> E[Find Commands Since Snapshot]
    D --> F[Process All Command Files]
    E --> G[Replay Commands]
    F --> G
    G --> H[Engine Ready]
    style A fill:#f9f,stroke:#333
    style H fill:#9f9,stroke:#333
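The recovery decision can be sketched as a small planning function. This is an illustration of the flow, not the Engine's actual implementation: `plan_recovery` is a hypothetical name, and it assumes the caller has already filtered out invalid snapshots and passes only the numeric indices taken from the file names.

```python
from typing import List, Optional, Tuple


def plan_recovery(
    snapshot_indices: List[int], log_indices: List[int]
) -> Tuple[Optional[int], List[int]]:
    """Return (snapshot index to load, or None) and the command-log
    start indices to replay, in order."""
    logs = sorted(log_indices)
    if snapshot_indices:
        latest = max(snapshot_indices)
        # Commands after the snapshot live in files starting at latest + 1.
        return latest, [i for i in logs if i > latest]
    # No snapshot: a full replay must begin at commandLog.0.
    if not logs or logs[0] != 0:
        raise FileNotFoundError("commandLog.0 is required when no snapshot exists")
    return None, logs
```

For example, `plan_recovery([287209819], [0, 287209820])` selects the snapshot and only the command log that follows it.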
#
File Management
#
Archival Process
The Snapshotter automatically:
- Moves older command logs to archive directory
- Moves older snapshots to archive directory
- Never modifies archived files
#
Recommended Management Practices
File System Management:
- Regularly compress archived files
- Transfer archives to secondary storage
- Remove intermediate snapshots as needed
- Maintain a minimum snapshot frequency (e.g., daily)
Storage Planning:
- Monitor growth rate of command logs
- Track snapshot size trends
- Ensure adequate primary storage capacity
- Implement archive retention policy
#
Technical Considerations
#
Disk Space Planning
- Active systems may generate multiple GB daily
- Consider both command logs and snapshots in capacity planning
- Monitor growth rate during peak trading periods
#
Performance Impact
- Startup time depends on:
  - Size of latest snapshot
  - Number of commands since snapshot
  - Storage system performance
#
Recovery Capabilities
- Any snapshot provides full system state
- Intermediate snapshots are safe to delete
- Maintain at least one recent snapshot at all times
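Because any snapshot carries full state, thinning out intermediate snapshots is a matter of choosing which ones to keep. The sketch below keeps the highest-index snapshot for each calendar day, which also retains the most recent snapshot overall. The function name is hypothetical, and where the per-snapshot date comes from (for example, directory modification time) is an assumption left to the caller.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def snapshots_to_keep(snapshots: Iterable[Tuple[int, date]]) -> Set[int]:
    """Given (snapshot index, day taken) pairs, keep the last snapshot of each day."""
    latest_per_day: Dict[date, int] = {}
    for index, day in snapshots:
        # A higher command index means a later snapshot on the same day.
        if index > latest_per_day.get(day, -1):
            latest_per_day[day] = index
    return set(latest_per_day.values())
```

Any snapshot index not in the returned set is a candidate for deletion or transfer to secondary storage.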
#
Persistence Directory Reference
#
File Format
All persistence files use a proprietary binary format which is subject to change.
#
Directory Structure
#
Command Logs
commandLog.<index>
- <index>: Integer representing the first command index in the file
- Files are sequentially numbered
- New file created with each snapshot generation
#
Snapshot Directories
snapshot.<index>/
├── complete.meta
└── *.meta
- <index>: Command index at which snapshot was taken
- Contains complete system state at specified index
#
File Specifications
#
Index Relationships
For a snapshot at index n:
- Snapshot directory: snapshot.n/
- Next command log: commandLog.(n+1)
- Previous command log: commandLog.m, where m is the index of the previous snapshot + 1
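These naming relationships can be written down as small helpers. The function names are hypothetical; treating "no previous snapshot" as m = 0 is an assumption, chosen because the first command log is commandLog.0.

```python
from typing import Optional


def snapshot_dir(n: int) -> str:
    """Directory holding the snapshot taken at command index n."""
    return f"snapshot.{n}/"


def next_command_log(n: int) -> str:
    """Command log that starts immediately after snapshot n."""
    return f"commandLog.{n + 1}"


def previous_command_log(previous_snapshot_index: Optional[int]) -> str:
    """Command log that leads up to a snapshot; None means no earlier snapshot."""
    m = 0 if previous_snapshot_index is None else previous_snapshot_index + 1
    return f"commandLog.{m}"
```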
#
Example Directory Layout
persistence/
├── commandLog.0
├── snapshot.287209819/
│ ├── complete.meta
│ ├── balances.meta
│ ├── balances
│ ├── orders.meta
│ ├── orders
│ ├── positions.meta
│ ├── positions
│ └── ...
└── commandLog.287209820
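A listing like the one above can be split into command-log and snapshot indices by matching the documented name patterns. This is a minimal sketch; `parse_entries` is a hypothetical helper, and it works only on entry names, never on file contents, since the binary format is proprietary.

```python
import re
from typing import Iterable, List, Tuple

# Matches "commandLog.<index>" and "snapshot.<index>" (optional trailing slash).
_ENTRY = re.compile(r"^(commandLog|snapshot)\.(\d+)/?$")


def parse_entries(names: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Return (sorted command-log indices, sorted snapshot indices)."""
    logs: List[int] = []
    snapshots: List[int] = []
    for name in names:
        match = _ENTRY.match(name)
        if match is None:
            continue  # unrelated entries are ignored
        target = logs if match.group(1) == "commandLog" else snapshots
        target.append(int(match.group(2)))
    return sorted(logs), sorted(snapshots)
```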
#
Multiple Instance Protection and Recovery
#
Command File Locking
The Engine implements strict single-writer semantics for the persistence directory:
- Only one Engine instance can write to the persistence directory at a time
- Each command file contains metadata including:
  - Completion status flag
  - Process ID (PID) of the writing process
- Multiple concurrent writers would corrupt the command log sequence
#
Startup Safety Checks
On startup, the Engine performs these verification steps:
- Checks the most recent command file's completion status
- Verifies if another process is actively writing to the file
- Takes action based on configuration:
  - Default behavior: Exits immediately if incomplete command file detected
  - Configurable wait period: Can monitor for abandoned files
#
Abandonment Detection
The Engine can be configured to handle abandoned command files:
ABANDONMENT_TIMEOUT_MS=<milliseconds>
When configured, the startup process:
- Monitors the most recent command file for changes
- If file size is changing:
  - Waits indefinitely
  - Assumes active writer exists
- If file size remains static:
  - Starts abandonment timer
  - Takes over after ABANDONMENT_TIMEOUT_MS with no changes
- If file marked complete during wait:
  - Immediately proceeds with normal startup
An abandoned command file is defined as:
- Not marked as complete
- No active writing process (PID no longer exists)
- Only relevant for the most recent command file
- Earlier incomplete files assumed complete if later files exist
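The wait logic described above can be modelled as a function over periodic samples of the most recent command file. This is a simplified sketch, not the Engine's code: the function name, the sample format, and the returned labels are all assumptions, and the PID check is left out because the document does not describe how it is performed.

```python
from typing import Iterable, Optional, Tuple


def startup_decision(
    observations: Iterable[Tuple[int, int, bool]], timeout_ms: int
) -> str:
    """observations are (time_ms, file_size, marked_complete) samples.
    Returns "proceed", "take_over", or "waiting"."""
    last_size: Optional[int] = None
    static_since = 0
    for time_ms, size, complete in observations:
        if complete:
            return "proceed"  # file was completed during the wait
        if size != last_size:
            # First sample or size changed: assume an active writer, restart timer.
            last_size = size
            static_since = time_ms
        elif time_ms - static_since >= timeout_ms:
            return "take_over"  # no growth for the whole timeout
    return "waiting"
```

A file that keeps growing never reaches "take_over" in this model, which mirrors the indefinite wait described above.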
#
Deployment Best Practices
For clustered environments:
- Configure cluster manager to enforce single-instance constraint
- For Kubernetes:
  - Set maxReplicas: 1
  - Set minReplicas: 0
- Set
#
Recovery Scenarios
#
Validation Rules
Command Logs
- Must be sequentially numbered
- Each contains a sequential log of commands
- Index of the next command file must be n+s, where n is the index of the file and s is the number of commands in the file
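The n+s rule lends itself to a simple continuity check. The sketch below assumes the number of commands in each file is already known; how to obtain that count from the proprietary binary format is outside the scope of this example, and `check_log_sequence` is a hypothetical name.

```python
from typing import Dict, List


def check_log_sequence(command_counts: Dict[int, int]) -> List[str]:
    """command_counts maps each command file's index n to its command count s.
    Returns one message per gap or overlap between consecutive files."""
    errors: List[str] = []
    indices = sorted(command_counts)
    for n, following in zip(indices, indices[1:]):
        expected = n + command_counts[n]
        if following != expected:
            errors.append(
                f"commandLog.{n}: next file should be commandLog.{expected}, "
                f"found commandLog.{following}"
            )
    return errors
```

An empty result means every file begins exactly where the previous one ended.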
Snapshots
- Must contain complete.meta
  - Missing complete.meta indicates invalid snapshot
  - Invalid snapshots are ignored and eligible for overwrite
- Must contain
Index Integrity
- Next command log index must be snapshot index + 1
- First command log (index 0):
  - is created immediately after a clean engine state
  - is the necessary starting point for a complete command replay "from the start of time"
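The snapshot and index rules can be checked from file names alone. The following sketch uses only the presence of complete.meta and the documented naming scheme; both function names are hypothetical, and nothing here inspects file contents.

```python
from pathlib import Path
from typing import List


def valid_snapshot_indices(persistence_dir: Path) -> List[int]:
    """Indices of snapshot.<index> directories that contain complete.meta."""
    found: List[int] = []
    for entry in persistence_dir.glob("snapshot.*"):
        suffix = entry.name.split(".", 1)[1]
        if entry.is_dir() and suffix.isdigit() and (entry / "complete.meta").is_file():
            found.append(int(suffix))
    return sorted(found)


def has_following_log(persistence_dir: Path, snapshot_index: int) -> bool:
    """True if the command log at snapshot index + 1 exists."""
    return (persistence_dir / f"commandLog.{snapshot_index + 1}").is_file()
```

Snapshot directories without complete.meta are left out of the result, matching the rule that invalid snapshots are ignored.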