# Fault Injection

## Overview

The Fault Injection endpoint is a testing utility that lets operators simulate failure scenarios in test and integration environments. It makes it possible to stress-test recovery mechanisms, persistence behavior, and overall system resilience without waiting for real failures to occur.

## Enabling Fault Injection

### Environment Variable Configuration

Fault injection is disabled by default. Enable it explicitly at startup by setting the following environment variable on the Engine and on the Control Gateways:

```shell
ENABLE_FAULT_INJECTION=yes
```

### Verification

When fault injection is disabled (the default), the endpoint returns:

```json
{
  "value": "FeatureNotSupported"
}
```

When fault injection is enabled, a successful injection returns:

```json
{
  "value": "FaultInjected"
}
```

Note: This response is returned even when the fault type or the fault location is None. A no-op injection is therefore a simple way to confirm that the feature is active.
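The check can be scripted. The sketch below assumes that `curl` is available and that the gateway answers at the `control-gateway:8181` address used throughout this page. You can override that address through the hypothetical `GATEWAY_URL` variable.

The script sends a no-op `None`/`None` injection and classifies the reply. The function names are illustrative only and are not part of the product.

```shell
# Hypothetical base URL; the default matches the examples on this page.
GATEWAY_URL="${GATEWAY_URL:-http://control-gateway:8181}"

# Map an InjectFault response body to "enabled", "disabled", or "unknown".
classify_injection_response() {
  case "$1" in
    *'"FaultInjected"'*) echo enabled ;;
    *'"FeatureNotSupported"'*) echo disabled ;;
    *) echo unknown ;;
  esac
}

# Send a no-op injection (fault=None, faultLocation=None) and classify the reply.
# The None fault type logs an error message on the server for each probe.
probe_fault_injection() {
  body=$(curl -s -X PUT \
    "$GATEWAY_URL/TestingControls/InjectFault?fault=None&faultLocation=None" \
    -H 'accept: application/json')
  classify_injection_response "$body"
}
```

Running `probe_fault_injection` prints `enabled`, `disabled`, or `unknown`. Because the `None` fault type logs an error message, expect one error log line on the server per probe.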

## Reference

### API Endpoint

```
PUT /TestingControls/InjectFault
```

Query parameters:

  • fault (required): Type of fault to inject
  • faultLocation (required): Where in the system to inject the fault
  • parameter (optional): Numeric parameter for certain fault types (for example, the delay duration in milliseconds)

Example request:

```shell
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=Delay&faultLocation=MainLoopBeforeResponse&parameter=10000' \
  -H 'accept: application/json'
```
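For scripted tests, the request can be wrapped in a small helper. This sketch makes two assumptions:

- The gateway is at the address from the example, reached through a hypothetical `GATEWAY_URL` variable.
- The arguments are not URL-encoded. This is safe only because the documented fault and location names are plain alphanumeric identifiers.

```shell
# Hypothetical base URL; the default matches the example request above.
GATEWAY_URL="${GATEWAY_URL:-http://control-gateway:8181}"

# Build the InjectFault URL. Arguments: fault, faultLocation, optional parameter.
# Values are not URL-encoded; the documented names need no encoding.
build_inject_url() {
  url="$GATEWAY_URL/TestingControls/InjectFault?fault=$1&faultLocation=$2"
  if [ -n "${3:-}" ]; then
    url="$url&parameter=$3"
  fi
  printf '%s\n' "$url"
}

# Send the PUT request and print the JSON response body.
inject_fault() {
  curl -s -X PUT "$(build_inject_url "$@")" -H 'accept: application/json'
  echo
}
```

With this helper, `inject_fault Delay MainLoopBeforeResponse 10000` sends the same request as the curl example.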

### Fault Types

| Fault type | Description | Parameter |
|---|---|---|
| None | Logs an error message but injects no fault | Not used |
| ThrowException | Throws an exception at the specified location | Not used |
| StackOverflow | Triggers a stack overflow at the specified location | Not used |
| OutOfMemory | Exhausts available memory at the specified location | Not used |
| Delay | Sleeps for the specified duration at the specified location | Duration in milliseconds (long) |
| GcPressure | Allocates memory, leaving approximately the specified amount free | Bytes to leave free (long, minimum ~8MB) |
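A client-side guard can catch typos before they reach the gateway. This sketch encodes the table above as a `case` statement.

The helper names are hypothetical. The check only mirrors this page, so the server remains the authority on which values it accepts.

```shell
# Exit status: 0 = takes a parameter, 1 = ignores it, 2 = unknown fault type.
fault_uses_parameter() {
  case "$1" in
    Delay|GcPressure) return 0 ;;
    None|ThrowException|StackOverflow|OutOfMemory) return 1 ;;
    *) return 2 ;;
  esac
}

# Reject unknown fault types and non-integer parameters; warn when a
# parameter is passed to a fault type that does not use it.
check_fault_args() {
  fault_uses_parameter "$1" && status=0 || status=$?
  if [ "$status" -eq 2 ]; then
    echo "unknown fault type: $1" >&2
    return 1
  fi
  if [ -n "${2:-}" ]; then
    case "$2" in
      *[!0-9]*)
        echo "parameter must be a non-negative integer: $2" >&2
        return 1 ;;
    esac
    if [ "$status" -eq 1 ]; then
      echo "warning: $1 does not use the parameter" >&2
    fi
  fi
  return 0
}
```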

#### GcPressure Details

The GcPressure fault allocates memory to create garbage collection pressure:

  • Minimum free memory: ~8MB (enforced automatically)
  • If no parameter is provided, or the value is below 8MB, the fault leaves 8MB free
  • Re-injecting with a different value frees the previous allocation and allocates to the new level
  • Useful for testing low-memory scenarios and GC behavior

Example:

```shell
# Leave only 8MB free (minimum)
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=GcPressure&faultLocation=EndPointRequest'

# Leave 100MB free
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=GcPressure&faultLocation=EndPointRequest&parameter=104857600'
```
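The byte values in these examples use 1 MB = 1,048,576 bytes (104857600 = 100 × 1048576). The sketch below does two things:

- Converts whole megabytes to a `parameter` value.
- Mirrors the documented minimum on the client side, so a test can log the free-memory level it actually expects.

The exact floor of 8 × 1048576 bytes is an assumption, since this page only states "~8MB". The helper names are hypothetical.

```shell
# Assumption: the floor is exactly 8 MB; the page only says "~8MB".
MIN_GC_FREE_BYTES=$((8 * 1048576))

# Convert whole megabytes (1 MB = 1048576 bytes) to a parameter value.
mb_to_bytes() {
  echo $(( $1 * 1048576 ))
}

# Mirror the documented rule: a missing value, or one below the floor,
# behaves as if the floor had been requested.
effective_gc_free_bytes() {
  if [ -z "${1:-}" ] || [ "$1" -lt "$MIN_GC_FREE_BYTES" ]; then
    echo "$MIN_GC_FREE_BYTES"
  else
    echo "$1"
  fi
}
```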

### Fault Locations

| Location | Description | Injection point |
|---|---|---|
| None | No-op location; returns success without injecting a fault | N/A |
| EndPointRequest | Gateway process, before command submission | Before the engine receives the command |
| EndPointResponse | Gateway process, after a successful engine response | After the engine processes the command |
| MainLoopBeforeResponse | Engine main loop, during command processing | Before the response is sent to the gateway |
| MainLoopAfterResponse | Engine main loop, during command processing | After the response is sent to the gateway |
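A fault's effects, such as error logs, restarts, or heap dumps, are most likely to appear in the process that hosts its location. A test script may therefore want to know which process to watch.

The helper below is hypothetical and encodes only the table above.

```shell
# Print which process hosts a fault location, per the Fault Locations table.
# Returns 1 for names that are not in the table.
fault_location_process() {
  case "$1" in
    EndPointRequest|EndPointResponse) echo gateway ;;
    MainLoopBeforeResponse|MainLoopAfterResponse) echo engine ;;
    None) echo none ;;
    *) echo unknown; return 1 ;;
  esac
}
```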

#### Location Behavior Notes

  • Faults are only injected during live command processing
  • Faults are not injected when:
    • Commands are loaded from command files during recovery
    • Commands are processed by the snapshotter
  • This ensures fault injection tests live system behavior without corrupting persistence or permanently disabling the instance

## Explanation

### Examples

#### Testing Crash Recovery

```shell
# Simulate an engine crash after the response is sent to the gateway, but before the next command starts
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=ThrowException&faultLocation=MainLoopAfterResponse'
```
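Faults fire only during live command processing, so the crash happens when a command is processed after the injection. If your environment then restarts the engine, the test usually has to wait for it to come back before sending the next command.

This page does not describe a readiness check, so the sketch below is a generic retry loop. It runs any check command until that command succeeds or the attempts run out.

```shell
# Run a check repeatedly until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  pause=$2
  shift 2
  i=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$pause"
  done
}
```

For example, `wait_until 30 2 engine_ready` tries up to 30 times, 2 seconds apart. Here `engine_ready` is a hypothetical stand-in for whatever health check your test harness already provides.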

#### Testing Gateway Resilience

```shell
# Delay the gateway by 30 seconds (30000 ms) before command submission
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=Delay&faultLocation=EndPointRequest&parameter=30000'
```
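To confirm that the delay took effect, time a command request sent through the gateway after the injection. This sketch uses curl's `-w '%{time_total}'` write-out variable, which reports the total time in seconds.

`COMMAND_URL` below is a hypothetical placeholder for one of your own command requests, because this page does not document one.

```shell
# Print the total time of one curl request, in seconds with a fractional part.
# Pass the same URL and options you would give curl.
time_request() {
  curl -s -o /dev/null -w '%{time_total}\n' "$@"
}

# Succeed if an elapsed time in seconds is at least the injected delay in ms.
covers_delay() {
  awk -v s="$1" -v ms="$2" 'BEGIN { exit !(s * 1000 >= ms) }'
}
```

Typical use is `elapsed=$(time_request "$COMMAND_URL")` followed by `covers_delay "$elapsed" 30000`.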

#### Testing Memory Pressure

```shell
# Create severe memory pressure by leaving only ~10MB (10485760 bytes) free
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=GcPressure&faultLocation=MainLoopBeforeResponse&parameter=10485760'
```
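Re-injecting `GcPressure` frees the previous allocation and allocates to the new level, so a test can tighten memory in steps. This sketch prints one request URL per step, from a starting free-memory level down to an ending level, both in megabytes.

The step sizes and the hypothetical `GATEWAY_URL` variable are assumptions.

```shell
# Hypothetical base URL; the default matches the examples on this page.
GATEWAY_URL="${GATEWAY_URL:-http://control-gateway:8181}"

# Print one GcPressure request URL per step.
# Usage: gc_pressure_steps LOCATION START_MB END_MB STEP_MB
# Free memory goes START_MB, START_MB - STEP_MB, ... while still >= END_MB.
gc_pressure_steps() {
  location=$1 mb=$2 end_mb=$3 step_mb=$4
  if [ "$step_mb" -le 0 ]; then
    echo "STEP_MB must be positive" >&2
    return 1
  fi
  while [ "$mb" -ge "$end_mb" ]; do
    printf '%s/TestingControls/InjectFault?fault=GcPressure&faultLocation=%s&parameter=%s\n' \
      "$GATEWAY_URL" "$location" "$((mb * 1048576))"
    mb=$((mb - step_mb))
  done
}
```

Each URL can be sent with the same `curl -X 'PUT'` call as in the example, for instance:

`gc_pressure_steps MainLoopBeforeResponse 100 10 10 | while read -r url; do curl -s -X PUT "$url"; sleep 30; done`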

### Heap Dumps

Fatal exits triggered by fault injection may produce heap dump files if a heap dump volume is configured.

Characteristics:

  • Heap dumps can be very large (multiple gigabytes)
  • Generated automatically on fatal errors
  • Require periodic cleanup in test environments
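A periodic cleanup step for test environments might look like the following. The dump directory, file-name pattern, and age threshold are all assumptions, because this page does not say where the heap dump volume is mounted or how the files are named.

`-mmin` and `-delete` are supported by GNU and BSD `find` but are not required by POSIX.

```shell
# Delete regular files matching PATTERN in DIR that were last modified more
# than MAX_AGE_MIN minutes ago, printing each path as it is removed.
# Usage: cleanup_heap_dumps DIR PATTERN MAX_AGE_MIN
cleanup_heap_dumps() {
  find "$1" -type f -name "$2" -mmin "+$3" -print -delete
}
```

For example, `cleanup_heap_dumps /path/to/heap-dump-volume '*' 1440` removes everything older than one day. Narrow the pattern if the volume holds other files.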

### Best Practices

  1. Isolation: Only enable in dedicated test environments
  2. Documentation: Log which faults are injected during test runs
  3. Monitoring: Watch for heap dumps and clean up regularly
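To support the documentation practice, a test script can record each injection alongside the gateway's reply. A minimal sketch, assuming a hypothetical `FAULT_LOG` file path and the gateway address used in the examples on this page:

```shell
# Hypothetical settings; the defaults match the examples on this page.
GATEWAY_URL="${GATEWAY_URL:-http://control-gateway:8181}"
FAULT_LOG="${FAULT_LOG:-fault-injection.log}"

# Append one UTC-timestamped line describing an injection to FAULT_LOG.
# Arguments: fault, location, parameter ("" if none), response body.
log_fault_injection() {
  printf '%s fault=%s location=%s parameter=%s result=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "${3:--}" "$4" >> "$FAULT_LOG"
}

# Inject a fault, log the attempt and the reply, then print the reply.
inject_and_log() {
  url="$GATEWAY_URL/TestingControls/InjectFault?fault=$1&faultLocation=$2"
  if [ -n "${3:-}" ]; then
    url="$url&parameter=$3"
  fi
  body=$(curl -s -X PUT "$url" -H 'accept: application/json')
  log_fault_injection "$1" "$2" "${3:-}" "$body"
  printf '%s\n' "$body"
}
```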