# Engine Persistence and Disk Usage

# Overview

The Engine persists its state through two mechanisms:

- Command logs: sequential records of all received commands
- Snapshots: point-in-time captures of the complete engine state

# Command Logs

# Structure

- Every command received by the Engine is written, in order, to command-log files
- Minimum activity: one clock-tick command every 10 ms, even when no orders are received
- Each order consumes approximately 500 bytes

# Daily Disk Usage Scenarios

| Scenario | Daily command log usage |
| --- | --- |
| Minimum daily (no orders) | ~300 MB |
| +100,000 orders | +50 MB |
| +1,000,000 orders | +500 MB |
| +10,000,000 orders | +5 GB |
| +100,000,000 orders | +50 GB |
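Every row grows linearly with the order count. A day's command-log growth can therefore be estimated as the no-order baseline plus about 500 bytes per order. The sketch below is a back-of-the-envelope calculator built on those figures, and the constant and function names are illustrative rather than part of the Engine. It also shows that a 10 ms tick yields 8,640,000 clock-ticks per day. That implies roughly 35 bytes per tick record, which is a derived figure and not a documented one.

```python
# Rough daily command-log estimate from the figures documented above.
# Assumptions: a fixed ~300 MB/day clock-tick baseline and ~500 bytes per
# order; actual record sizes depend on the (proprietary) binary format.

TICK_INTERVAL_MS = 10           # one clock-tick every 10 ms
BASELINE_BYTES_PER_DAY = 300e6  # ~300 MB/day with no orders
BYTES_PER_ORDER = 500           # ~500 bytes per order


def ticks_per_day(interval_ms: int = TICK_INTERVAL_MS) -> int:
    """Number of clock-tick commands logged in 24 hours."""
    return 24 * 60 * 60 * 1000 // interval_ms


def daily_command_log_bytes(orders_per_day: int) -> float:
    """Estimated command-log growth for one day, in bytes."""
    return BASELINE_BYTES_PER_DAY + orders_per_day * BYTES_PER_ORDER


if __name__ == "__main__":
    print(f"clock-ticks per day: {ticks_per_day():,}")  # 8,640,000
    # Implied average size of one clock-tick record (derived, not documented)
    print(f"~{BASELINE_BYTES_PER_DAY / ticks_per_day():.0f} bytes per tick")
    for orders in (0, 100_000, 1_000_000, 10_000_000, 100_000_000):
        gb = daily_command_log_bytes(orders) / 1e9
        print(f"{orders:>11,} orders/day -> ~{gb:.2f} GB")
```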

# Snapshots

# Characteristics

- Contains the complete engine state
- Generated every ~8 minutes by the Snapshotter process
- Each snapshot is indexed to a specific command in the command log
- Minimum size: 1.5 MB per snapshot
- Approximately 300 MB per day for minimal usage (about 180 snapshots per day at the minimum size)
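The ~300 MB figure follows from the cadence. At 24 × 60 / 8 = 180 snapshots per day, 180 × 1.5 MB ≈ 270 MB. Below is a minimal sketch of that arithmetic. It assumes a fixed 8-minute interval, although the documented schedule is only approximate. To estimate a busier system, pass in an observed average snapshot size. The result is the volume produced per day, not what stays on primary storage, because older snapshots are moved to the archive. Names are illustrative.

```python
# Minimum daily snapshot footprint from the documented ~8-minute cadence and
# 1.5 MB minimum size. Busy engines produce much larger snapshots.
SNAPSHOT_INTERVAL_MIN = 8
MIN_SNAPSHOT_BYTES = 1.5e6


def snapshots_per_day(interval_min: int = SNAPSHOT_INTERVAL_MIN) -> int:
    """How many snapshots are written in 24 hours at a fixed interval."""
    return (24 * 60) // interval_min


def daily_snapshot_bytes(avg_snapshot_bytes: float = MIN_SNAPSHOT_BYTES) -> float:
    """Total bytes of snapshots produced per day at a given average size."""
    return snapshots_per_day() * avg_snapshot_bytes


print(snapshots_per_day())                 # 180
print(daily_snapshot_bytes() / 1e6, "MB")  # 270.0 MB
```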

# Size Factors

Snapshot size varies based on:

- Number of instruments
- Transaction volume
- Active orders
- Price tick frequency
- Overall system activity

Size range: 1.5 MB to several GB per snapshot

# State Recovery Process

```mermaid
flowchart TD
    A[Start Engine] --> B{Snapshot Available?}

    B -->|Yes| C[Load Most Recent Snapshot]
    B -->|No| D[Find commandLog.0]

    C --> E[Find Commands Since Snapshot]
    D --> F[Process All Command Files]

    E --> G[Replay Commands]
    F --> G

    G --> H[Engine Ready]

    style A fill:#f9f,stroke:#333
    style H fill:#9f9,stroke:#333
```
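The flowchart's decision can be written as a small pure function over file indices. This is a sketch and not the Engine's implementation. It assumes the caller has already listed the valid `snapshot.<n>/` directories and the `commandLog.<index>` files. `plan_recovery` is a hypothetical name. Loading the snapshot and replaying commands are left out because the binary format is proprietary.

```python
from __future__ import annotations

from typing import Optional


def plan_recovery(
    valid_snapshots: list[int], command_logs: list[int]
) -> tuple[Optional[int], list[int]]:
    """Decide what to load on startup.

    valid_snapshots: indices n of snapshot.<n>/ directories containing complete.meta
    command_logs:    indices of the commandLog.<index> files present
    Returns (snapshot index to load or None, command-log indices to replay in order).
    """
    logs = sorted(command_logs)
    if valid_snapshots:
        snap = max(valid_snapshots)  # most recent snapshot
        # The snapshot already includes command `snap`; replay starts at
        # commandLog.(snap + 1), so only later logs are needed.
        return snap, [i for i in logs if i > snap]
    # No snapshot: replay every command file, starting from commandLog.0.
    if not logs or logs[0] != 0:
        raise FileNotFoundError("commandLog.0 is required for a full replay")
    return None, logs


print(plan_recovery([287209819], [0, 287209820]))  # (287209819, [287209820])
print(plan_recovery([], [0]))                       # (None, [0])
```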

# File Management

# Archival Process

The Snapshotter automatically:

- Moves older command logs to the archive directory
- Moves older snapshots to the archive directory
- Never modifies archived files

# Recommended Management Practices

File System Management:

- Regularly compress archived files
- Transfer archives to secondary storage
- Remove intermediate snapshots as needed
- Retain snapshots at a minimum frequency (e.g., one per day)

Storage Planning:

- Monitor the growth rate of command logs
- Track snapshot size trends
- Ensure adequate primary storage capacity
- Implement an archive retention policy
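You can remove intermediate snapshots and still keep a daily minimum by retaining only the last snapshot of each calendar day. The sketch below assumes each snapshot has a timestamp available, for example the directory's modification time. The persistence format does not itself document such a timestamp. `snapshots_to_prune` is a hypothetical helper that only lists candidates and deletes nothing. The latest snapshot is always the last one of its day, so it is never selected.

```python
from __future__ import annotations

from datetime import datetime


def snapshots_to_prune(snapshots: dict[int, datetime]) -> list[int]:
    """Return intermediate snapshot indices that could be deleted.

    Keeps the last snapshot of each calendar day, which also keeps the most
    recent snapshot overall. Indices are command indices, so a larger index
    always means a later snapshot.
    """
    last_per_day: dict = {}
    for index, taken_at in snapshots.items():
        day = taken_at.date()
        if day not in last_per_day or index > last_per_day[day]:
            last_per_day[day] = index
    keep = set(last_per_day.values())
    return sorted(i for i in snapshots if i not in keep)


example = {
    100: datetime(2024, 1, 1, 9, 0),
    200: datetime(2024, 1, 1, 17, 0),
    300: datetime(2024, 1, 2, 9, 0),
    400: datetime(2024, 1, 2, 9, 8),
}
print(snapshots_to_prune(example))  # [100, 300]
```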

# Technical Considerations

# Disk Space Planning

- Active systems may generate multiple GB per day
- Account for both command logs and snapshots in capacity planning
- Monitor the growth rate during peak trading periods
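To measure growth during peak periods, sample the size of the persistence directory twice and divide by the elapsed time. Below is a minimal sketch using only the Python standard library (`os.walk`, `shutil.disk_usage`). The helper names are assumptions. If the archive directory lives inside the sampled tree, archival moves will not reduce the measured size.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except FileNotFoundError:
                pass  # e.g. moved to the archive between listing and stat
    return total


def growth_rate_per_hour(size_before: int, size_after: int, elapsed_s: float) -> float:
    """Bytes per hour between two size samples taken `elapsed_s` seconds apart."""
    return (size_after - size_before) * 3600 / elapsed_s


def hours_until_full(path: str, bytes_per_hour: float) -> float:
    """Hours of headroom left on the volume holding `path` at the given rate."""
    free = shutil.disk_usage(path).free
    return float("inf") if bytes_per_hour <= 0 else free / bytes_per_hour
```

For example, call `dir_size_bytes("persistence")` at the start and end of a busy hour (the path is an assumption). Pass both values to `growth_rate_per_hour`, then compare the result of `hours_until_full` with your alerting threshold.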

# Performance Impact

Startup time depends on:
- Size of latest snapshot
- Number of commands since snapshot
- Storage system performance

# Recovery Capabilities

- Any valid snapshot (one containing `complete.meta`) provides the full system state
- Intermediate snapshots are safe to delete
- Maintain at least one recent snapshot at all times

# Persistence Directory Reference

# File Format

All persistence files use a proprietary binary format which is subject to change.

# Directory Structure

# Command Logs

`commandLog.<index>`

- `<index>`: integer representing the first command index in the file
- Files are sequentially numbered
- A new file is created each time a snapshot is generated
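`<index>` is a command index rather than a small counter, so file names should be ordered by their parsed integer value and not as strings. A string sort places `commandLog.10` before `commandLog.9`. Below is a minimal sketch that assumes the names come from a directory listing such as `os.listdir`. `command_log_indices` is a hypothetical helper.

```python
import re

COMMAND_LOG_NAME = re.compile(r"commandLog\.(\d+)")


def command_log_indices(names):
    """Return the <index> values of commandLog.<index> names, sorted numerically.

    Plain string sorting would put commandLog.10 before commandLog.9, so the
    index is parsed as an integer before sorting. Names that do not match the
    pattern exactly (e.g. compressed archives) are ignored.
    """
    indices = []
    for name in names:
        match = COMMAND_LOG_NAME.fullmatch(name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


names = ["commandLog.10", "snapshot.9", "commandLog.0", "commandLog.9"]
print(command_log_indices(names))  # [0, 9, 10]
```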

# Snapshot Directories

```
snapshot.<index>/
├── complete.meta
└── *.meta
```

- `<index>`: command index at which the snapshot was taken
- Contains the complete system state at the specified index

# File Specifications

| File pattern | Format | Description |
| --- | --- | --- |
| `commandLog.0` | Binary | Initial command log. Created only once after an engine reset. Contains the first command processed. |
| `commandLog.<n>` | Binary | Sequential command logs. Created after each snapshot. Contains commands starting from index `<n>`. |
| `snapshot.<n>/` | Directory | Snapshot directory for command index `<n>`. |
| `snapshot.<n>/complete.meta` | Binary | Snapshot validation marker. Must exist for the snapshot to be considered valid. |
| `snapshot.<n>/*.meta` | Binary | Segment-specific metadata files describing the snapshot contents. |
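`complete.meta` is the validity marker, so finding usable snapshots reduces to a directory scan. Below is a sketch that relies on that assumption. `valid_snapshot_indices` is a hypothetical name. The sketch does not inspect the contents of the `*.meta` files, whose format is proprietary.

```python
import re
from pathlib import Path

SNAPSHOT_DIR_NAME = re.compile(r"snapshot\.(\d+)")


def valid_snapshot_indices(persistence_dir):
    """Indices n of snapshot.<n>/ directories that contain complete.meta.

    Directories without complete.meta are invalid snapshots and are skipped.
    """
    indices = []
    for entry in Path(persistence_dir).iterdir():
        match = SNAPSHOT_DIR_NAME.fullmatch(entry.name)
        if match and entry.is_dir() and (entry / "complete.meta").is_file():
            indices.append(int(match.group(1)))
    return sorted(indices)
```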

# Index Relationships

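For a snapshot at index `n`:

- Snapshot directory: `snapshot.n/`
- Next command log: `commandLog.(n+1)`
- Previous command log (the one that ends with command `n`): `commandLog.m`, where `m` is the index of the previous snapshot + 1, or `commandLog.0` if there is no earlier snapshot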
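These relationships are simple index arithmetic. The sketch below uses hypothetical helper names. The snapshot at index 1000 in the usage lines is made up for illustration, while 287209819 matches the example layout below.

```python
def next_command_log(n: int) -> str:
    """Command log started right after snapshot.<n>/ was taken."""
    return f"commandLog.{n + 1}"


def previous_command_log(snapshot_indices, n: int) -> str:
    """Command log that ends with command n (active when snapshot.<n>/ was taken)."""
    earlier = [s for s in snapshot_indices if s < n]
    return f"commandLog.{max(earlier) + 1}" if earlier else "commandLog.0"


snapshots = [1000, 287209819]
print(next_command_log(287209819))                 # commandLog.287209820
print(previous_command_log(snapshots, 287209819))  # commandLog.1001
print(previous_command_log(snapshots, 1000))       # commandLog.0
```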

# Example Directory Layout

```
persistence/
├── commandLog.0
├── snapshot.287209819/
│   ├── complete.meta
│   ├── balances.meta
│   ├── balances
│   ├── orders.meta
│   ├── orders
│   ├── positions.meta
│   ├── positions
│   └── ...
└── commandLog.287209820
```

# Multiple Instance Protection and Recovery

# Command File Locking

The Engine implements strict single-writer semantics for the persistence directory:

- Only one Engine instance can write to the persistence directory at a time
- Each command file contains metadata including:
  - Completion status flag
  - Process ID (PID) of the writing process
- Multiple concurrent writers would corrupt the command log sequence

# Startup Safety Checks

On startup, the Engine performs these verification steps:

- Checks the most recent command file's completion status
- Verifies whether another process is actively writing to the file
- Takes action based on configuration:
  - Default behavior: exits immediately if an incomplete command file is detected
  - Configurable wait period: monitors for abandoned files

# Abandonment Detection

The Engine can be configured to handle abandoned command files:

`ABANDONMENT_TIMEOUT_MS=<milliseconds>`

When configured, the startup process:

- Monitors the most recent command file for changes
- If the file size is changing:
  - Waits indefinitely
  - Assumes an active writer exists
- If the file size remains static:
  - Starts the abandonment timer
  - Takes over once `ABANDONMENT_TIMEOUT_MS` elapses with no changes
- If the file is marked complete during the wait:
  - Immediately proceeds with normal startup
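The waiting procedure above can be sketched as a polling loop. This is an illustrative model and not the Engine's code. Reading the completion flag depends on the proprietary file format, so both the size check and the completion check are passed in as callables. The poll interval is an assumed value. Injecting `clock` and `sleep` also lets the loop be tested without real delays.

```python
import time
from typing import Callable


def wait_for_abandonment(
    file_size: Callable[[], int],
    is_complete: Callable[[], bool],
    timeout_ms: int,
    poll_ms: int = 100,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Watch the most recent command file until it is complete or abandoned.

    Returns "complete" as soon as the file is marked complete, or "abandoned"
    once its size has not changed for `timeout_ms`. While the size keeps
    changing, the timer restarts, so the wait is effectively indefinite.
    """
    last_size = file_size()
    unchanged_since = clock()
    while True:
        if is_complete():
            return "complete"
        size, now = file_size(), clock()
        if size != last_size:
            # An active writer is appending: restart the abandonment timer.
            last_size, unchanged_since = size, now
        elif (now - unchanged_since) * 1000 >= timeout_ms:
            return "abandoned"
        sleep(poll_ms / 1000)
```

In practice, `file_size` could be `lambda: os.path.getsize(path)` for the most recent command file, and `timeout_ms` would come from `ABANDONMENT_TIMEOUT_MS`.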

An abandoned command file is defined as:

- Not marked as complete
- No active writing process (the recorded PID no longer exists)
- Only relevant for the most recent command file
- Earlier incomplete files are assumed complete if later files exist
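The definition above combines the completion flag with a process-liveness check. On POSIX systems, calling `os.kill` with signal 0 tests whether a PID exists without affecting the process. In this sketch the flag and the PID are passed in because the metadata layout is not documented, and the function names are hypothetical. There are two caveats. A recycled PID makes a dead writer look alive, so the check errs toward waiting. A PID is also only meaningful within the same host and PID namespace, so this check cannot detect a writer in another container or on another machine.

```python
import os


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness check: signal 0 performs error checking without
    delivering a signal. Not for Windows, where os.kill behaves differently."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def is_abandoned(marked_complete: bool, writer_pid: int, is_most_recent: bool) -> bool:
    """Apply the abandonment definition above to one command file."""
    if not is_most_recent:
        return False  # earlier files are assumed complete if later files exist
    return not marked_complete and not pid_alive(writer_pid)
```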

# Deployment Best Practices

For clustered environments:

- Configure the cluster manager to enforce a single-instance constraint
- For Kubernetes:
  - Set `maxReplicas: 1`
  - Set `minReplicas: 0`

# Recovery Scenarios

| Scenario | Default behavior | With timeout configured |
| --- | --- | --- |
| Clean shutdown | Proceeds normally | Proceeds normally |
| Crashed writer process | Exits with error | Waits for timeout, then takes over |
| Active writer process | Exits with error | Waits indefinitely |
| Incomplete old files | Assumes completed | Assumes completed |
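The recovery table can be read as a decision function over three inputs. Below is a minimal sketch with hypothetical names. It mirrors the table rows and does not correspond to any documented Engine API.

```python
from enum import Enum


class StartupAction(Enum):
    PROCEED = "proceed normally"
    EXIT_WITH_ERROR = "exit with error"
    WAIT_THEN_TAKE_OVER = "wait for timeout, then take over"
    WAIT_INDEFINITELY = "wait indefinitely"


def startup_action(
    latest_complete: bool, writer_active: bool, timeout_configured: bool
) -> StartupAction:
    """Map the recovery scenarios to the documented behavior.

    Only the most recent command file is considered: earlier incomplete files
    are assumed complete, so they never block startup.
    """
    if latest_complete:
        return StartupAction.PROCEED
    if not timeout_configured:
        return StartupAction.EXIT_WITH_ERROR
    if writer_active:
        return StartupAction.WAIT_INDEFINITELY
    return StartupAction.WAIT_THEN_TAKE_OVER
```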

# Validation Rules

- Command logs
  - Must be sequentially numbered
  - Each contains a sequential log of commands
  - The index of the next command file must be `n + s`, where `n` is the index of the file and `s` is the number of commands in the file
- Snapshots
  - Must contain `complete.meta`
  - A missing `complete.meta` indicates an invalid snapshot
  - Invalid snapshots are ignored and eligible for overwrite
- Index integrity
  - The next command log index must be the snapshot index + 1
  - The first command log (index 0):
    - is created immediately after a clean engine state
    - is the necessary starting point for a complete command replay "from the start of time"
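The command-log and index-integrity rules can be checked mechanically once each file's starting index and command count are known. Counting commands requires reading the proprietary binary format, so in this sketch the caller supplies the counts. `check_index_integrity` is a hypothetical helper, and archived command logs should be included in its input. It checks the `n + s` chaining rule and the rule that snapshot `n` is followed by `commandLog.(n+1)`. Snapshot validity (`complete.meta`) is assumed to have been checked already.

```python
def check_index_integrity(logs, snapshots):
    """Check the index rules for command logs and snapshots.

    logs:      (index, command_count) pairs, one per commandLog.<index> file
    snapshots: indices of valid snapshots (those containing complete.meta)
    Returns a list of human-readable problems; an empty list means consistent.
    """
    problems = []
    ordered = sorted(logs)
    # Rule: the next command file's index is n + s.
    for (index, count), (next_index, _) in zip(ordered, ordered[1:]):
        if index + count != next_index:
            problems.append(
                f"commandLog.{index}: expected next file commandLog.{index + count}, "
                f"found commandLog.{next_index}"
            )
    # Rule: the command log after snapshot n is commandLog.(n + 1).
    present = {index for index, _ in ordered}
    for n in sorted(snapshots):
        if n + 1 not in present:
            problems.append(f"snapshot.{n}: missing commandLog.{n + 1}")
    return problems
```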