
Atlas Trading System

September 2025

Technologies

Python, NumPy, Pandas, Asyncio, Scikit-learn, LightGBM, TimescaleDB, PostgreSQL, Redis, Docker, Interactive Brokers, Alpaca

Atlas trades 0-1 DTE SPY options with real capital. Decisions are made by a transparent level-based decision engine sitting on top of a six-state HMM regime detector and a CUSUM jump detector. The system closes every position by 3:55 PM. No overnight risk.

It runs continuously during market hours across eleven coordinated OS processes on local hardware, with dual-broker fallback (Alpaca primary, Interactive Brokers secondary). It is my primary income.

Currently

The system is live and trading every market day. The current focus is the disciplined reintroduction of machine learning: a LightGBM classifier, trained on a labeled corpus of decision outcomes from the level engine itself, is blended into the engine’s confidence under shadow-mode-and-ramp gates with per-regime calibration validators. The blend weight climbs on a schedule and auto-pauses on validator regression. The full story of the pivot and the reintroduction is in the writeup on disabling ML in production.

Architecture

Event-driven, message-passing between isolated services. Each component does one thing and publishes its output to Redis Streams; the next component reads, processes, publishes. State and historical context live in TimescaleDB on top of PostgreSQL 17. Process lifecycle is managed by launchd, with a 5-second heartbeat and 15-second TTL surfacing degraded services to a system-health aggregator before they touch trading.
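A minimal sketch of the heartbeat check described above, assuming each service reports a timestamp and the aggregator compares it against the TTL. The 5-second interval and 15-second TTL come from the text; the class name `HealthAggregator`, its methods, and the service names are hypothetical, and the real system presumably stores beats in Redis rather than in a dict.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 5.0   # from the text: services beat every 5 seconds
HEARTBEAT_TTL_S = 15.0       # from the text: a beat older than 15 seconds is stale


@dataclass
class HealthAggregator:
    """Tracks the last heartbeat per service (hypothetical class name)."""
    last_seen: dict[str, float] = field(default_factory=dict)

    def beat(self, service: str, now: float) -> None:
        """Record a heartbeat for a service at time `now` (seconds)."""
        self.last_seen[service] = now

    def degraded(self, now: float) -> list[str]:
        """Return services whose last beat is older than the TTL."""
        return sorted(
            name for name, ts in self.last_seen.items()
            if now - ts > HEARTBEAT_TTL_S
        )


agg = HealthAggregator()
agg.beat("market_data", now=0.0)
agg.beat("regime", now=0.0)
agg.beat("market_data", now=10.0)   # "regime" has gone quiet since t=0
print(agg.degraded(now=16.0))       # ['regime']
```

With a TTL of three intervals, a service has to miss several consecutive beats before it is flagged, so a single delayed beat does not mark it degraded.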

The pipeline:

Market data layer - real-time L2 order book, options chain via OPRA, and index quotes via redundant feeds from Interactive Brokers and Alpaca, with automatic failover and rate-limit-aware token buckets
Feature engineering - per-tick computation across multiple timeframes, with strict feature-parity discipline between historical and live paths
Regime detection - VIX1D-primary six-state HMM with a CUSUM binary jump detector layered on top
Decision engine - a transparent level-based engine that scores multi-source level confluence under regime-aware confidence weights
Disciplined ML overlay - a LightGBM classifier blending into the level engine’s confidence at a deliberately set, ramping weight, gated by per-regime calibration, log-loss-versus-naive, and hyperparameter stability validators
Risk layer - eleven entry guards, tiered signal health response, kill switch with dual-broker fallback, daily loss halt, hard 3:55 PM exit
Execution - order validation, market-hour gates, reconciliation against broker state, automated circuit breakers, emergency flatten
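For readers unfamiliar with the jump detector named in the regime-detection stage, here is a sketch of a textbook two-sided CUSUM that raises a binary alarm. It is not Atlas's detector: the input series, the `drift` and `threshold` values, and the reset-after-alarm behavior are all assumptions chosen for illustration.

```python
def cusum_jumps(xs, target=0.0, drift=0.5, threshold=4.0):
    """Two-sided CUSUM: return the indices at which an alarm fires.

    `pos` accumulates upward deviations from `target`, `neg` accumulates
    downward ones; `drift` is the slack subtracted each step so that
    ordinary noise decays back to zero instead of accumulating.
    """
    pos = neg = 0.0
    alarms = []
    for i, x in enumerate(xs):
        pos = max(0.0, pos + (x - target) - drift)
        neg = max(0.0, neg - (x - target) - drift)
        if pos > threshold or neg > threshold:
            alarms.append(i)
            pos = neg = 0.0  # assumption: restart accumulation after an alarm
    return alarms


calm = [0.1, -0.2, 0.0, 0.3, -0.1]   # hypothetical standardized moves
shock = [2.5, 2.8, 3.1]
print(cusum_jumps(calm + shock))     # [6]
```

The alarm fires on the second large observation rather than the first, which is the point of the cumulative sum: one outlier is tolerated, a sustained shift is not.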

Failure of any single service degrades gracefully; the others keep operating on stale-but-known data until the failed component recovers.

Decision engine

The signal generator is intentionally transparent. Levels are aggregated from multiple sources, scored under a regime-aware confidence model, and screened by a set of entry guards before any order touches the execution layer. Every decision is reconstructable in a debugger or in a notebook; if a trade is wrong, I can read the inputs and the rule that fired and tell you why.
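To make "reconstructable" concrete, here is a small sketch of confluence scoring that returns both the score and the levels that produced it. Everything specific is hypothetical: the source names, the regime labels, the weights, and the price tolerance are placeholders, and the actual scoring model is not described in this writeup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    source: str   # where the level came from (hypothetical names below)
    price: float


# Hypothetical per-regime weights for each level source.
REGIME_WEIGHTS = {
    "calm": {"prior_day_high": 1.0, "vwap": 0.5, "gamma_wall": 0.8},
    "volatile": {"prior_day_high": 0.4, "vwap": 0.2, "gamma_wall": 1.0},
}


def confluence(levels, price, regime, tolerance=0.25):
    """Score the levels within `tolerance` of `price` under a regime's weights.

    Returns the score together with the contributing levels, so the
    decision can be replayed later from its inputs.
    """
    weights = REGIME_WEIGHTS[regime]
    contributing = [lv for lv in levels if abs(lv.price - price) <= tolerance]
    score = sum(weights.get(lv.source, 0.0) for lv in contributing)
    return score, contributing


levels = [
    Level("prior_day_high", 500.10),
    Level("vwap", 499.90),
    Level("gamma_wall", 505.00),
]
score, used = confluence(levels, price=500.0, regime="calm")
print(round(score, 2), [lv.source for lv in used])
```

Returning the contributing levels alongside the number is the design choice that matters here: the score alone cannot be audited, the score plus its inputs can.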

The engine replaced a black-box XGBoost and TensorFlow LSTM ensemble in November 2025 after I caught a silent feature-mismatch degradation in the ensemble’s inputs. The replacement is documented in the disabling ML in production writeup. The short version is that transparency is a load-bearing engineering property in production, not a soft preference.

Disciplined ML reintroduction

The system is currently bringing machine learning back, deliberately and under guards.

A LightGBM classifier is trained on a labeled corpus of decision outcomes generated by the level engine itself. It is not a free-running signal source. It is an opinion that gets blended into the engine’s confidence at a weight I set, and that I can roll back to zero with a config change in five seconds.
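One simple way to picture the blend is a linear mix of the two confidences. This is an assumption for illustration: the text says only that the model's opinion is blended at a configurable weight, not how. The function and parameter names are hypothetical.

```python
def blend_confidence(engine_conf: float, ml_prob: float, weight: float) -> float:
    """Linear blend (assumed form): weight 0.0 ignores the model entirely."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("blend weight must be in [0, 1]")
    return (1.0 - weight) * engine_conf + weight * ml_prob


# With the weight rolled back to zero, the engine's confidence passes through unchanged.
print(blend_confidence(0.62, 0.90, weight=0.0))   # 0.62
print(round(blend_confidence(0.60, 0.80, weight=0.25), 4))
```

Under this form, rolling back is a single scalar change, which is consistent with the "config change in five seconds" property described above.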

The blend ramp is gated end to end:

Shadow mode for weeks against live decisions before any blend weight is set above zero
Per-regime calibration: predicted probability matches realized outcome rate inside each regime
Log-loss versus naive baseline: the model has to beat “guess the prior” by a meaningful margin
Hyperparameter stability across cross-validation folds: the chosen settings are not artifacts of a single split
Auto-pause: any validator regression on rolling-window data pulls the blend weight back to zero without me in the loop
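Two of the gates above can be sketched in a few lines: the log-loss-versus-naive check and the auto-pause. This is a simplified illustration, not the production validators. The `min_margin` value, the regime labels, and the function names are assumptions, and the per-regime results are passed in as plain booleans.

```python
import math


def log_loss(y_true, probs, eps=1e-12):
    """Mean binary log loss, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def beats_naive(y_true, probs, min_margin=0.02):
    """True if the model beats 'guess the prior' by at least `min_margin`."""
    prior = sum(y_true) / len(y_true)
    naive = log_loss(y_true, [prior] * len(y_true))
    return naive - log_loss(y_true, probs) >= min_margin


def gated_weight(scheduled_weight, passed_by_regime):
    """Auto-pause: any failing regime validator forces the weight to zero."""
    return scheduled_weight if all(passed_by_regime.values()) else 0.0


outcomes = [1, 0, 1, 1, 0, 1, 0, 1]                  # hypothetical labels
model = [0.8, 0.2, 0.7, 0.9, 0.3, 0.8, 0.1, 0.7]     # hypothetical predictions
print(beats_naive(outcomes, model))                  # True
print(gated_weight(0.15, {"calm": True, "volatile": False}))   # 0.0
```

Note that the naive baseline here is computed on the same window as the model's loss; a stricter version would take the prior from training data only.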

This is not a fashion statement. It is the deployment discipline that distinguishes production machine learning from research machine learning glued behind an API.

Risk management

The interesting part isn’t the trading logic. It’s the risk layer that says no.

Eleven entry guards screen every candidate trade: time window, data quality, risk limits, position count, directional concentration, entry slippage, trajectory alignment, day-type filter, degradation threshold, and more. A trade only happens if every guard passes.
Tiered signal health response (TIER_0 through TIER_4) responds to degraded inputs with graduated severity rather than an immediate flatten, which means transient feed blips do not turn into unforced losses.
Kill switch with dual-broker fallback sends flatten orders to Alpaca first, Interactive Brokers second, with verified reconciliation against broker state after.
Daily loss halt stops new orders if cumulative daily loss crosses a hard threshold.
Hard 3:55 PM close flattens every position before market close. No overnight 0DTE holds, ever.
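The "every guard must pass" rule can be sketched as a table of named predicates. Only three guards are shown, and apart from the 3:55 PM cutoff, every threshold here (the 9:35 open of the window, the position cap, the loss limit) is a hypothetical placeholder. Times are assumed to be exchange-local.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Candidate:
    now: time             # exchange-local time of the candidate entry
    open_positions: int
    daily_pnl: float      # cumulative P&L for the day, in dollars


HARD_CLOSE = time(15, 55)      # from the text: everything is flat by 3:55 PM
WINDOW_OPEN = time(9, 35)      # hypothetical start of the entry window
MAX_POSITIONS = 3              # hypothetical position cap
DAILY_LOSS_LIMIT = -500.0      # hypothetical daily loss threshold

GUARDS = {
    "time_window": lambda c: WINDOW_OPEN <= c.now < HARD_CLOSE,
    "position_count": lambda c: c.open_positions < MAX_POSITIONS,
    "daily_loss_halt": lambda c: c.daily_pnl > DAILY_LOSS_LIMIT,
}


def failed_guards(candidate: Candidate) -> list[str]:
    """Names of every guard that rejects the candidate; empty means allowed."""
    return [name for name, check in GUARDS.items() if not check(candidate)]


print(failed_guards(Candidate(time(10, 0), open_positions=1, daily_pnl=-100.0)))
print(failed_guards(Candidate(time(15, 56), open_positions=3, daily_pnl=-600.0)))
```

Collecting every failing guard, rather than stopping at the first, keeps a rejected trade explainable after the fact.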

Trading without these controls is gambling. Trading with them is engineering.

Production engineering

Three things that took most of the time, none of them ML:

1. Feature parity across training, historical replay, and live. The features computed in all three paths have to be bit-identical. Any divergence means downstream models are making decisions on inputs they were not validated on. This is mostly discipline: same code path, same input format, no shortcuts. It is also why the November 2025 pivot happened: feature-parity drift went undetected in the live ensemble for weeks.

2. Broker reliability under load. Live feeds drop. Rate limits hit. Replacements arrive out of order. Reconciliation has to handle fill delays, network blips, and the occasional clock skew without losing track of position state. The reconciliation service polls broker state on a schedule, reconciles against the system’s internal position ledger, and flags divergences before they compound.

3. Observability that catches silent failures. Dashboards are necessary but not sufficient. The system has to call out its own degradations: a FallbackTracker with severity-based alerting (CRITICAL triggers after a single instance) surfaces all 110+ fallback paths across the codebase, so the system tells me when it has quietly stopped doing the thing I thought it was doing.
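The bit-identical requirement in item 1 can be checked mechanically. The sketch below fingerprints a feature vector from the raw IEEE 754 bytes of each value, so two paths agree only if every float matches to the last bit. The function names and feature names are hypothetical, and this shows one possible check rather than the one Atlas uses.

```python
import hashlib
import struct


def feature_fingerprint(features: dict[str, float]) -> str:
    """SHA-256 over sorted feature names and the exact bytes of each float."""
    h = hashlib.sha256()
    for name in sorted(features):
        h.update(name.encode("utf-8") + b"\0")
        h.update(struct.pack("<d", features[name]))   # 8 raw bytes, no rounding
    return h.hexdigest()


def mismatched(a: dict[str, float], b: dict[str, float]) -> list[str]:
    """Feature names that are missing on one side or differ in any bit."""
    return sorted(
        k for k in a.keys() | b.keys()
        if k not in a or k not in b
        or struct.pack("<d", a[k]) != struct.pack("<d", b[k])
    )


# Two paths that compute "the same" feature with different arithmetic.
live = {"rv_5m": 0.1 + 0.2, "spread": 0.01}
replay = {"rv_5m": 0.3, "spread": 0.01}

print(feature_fingerprint(live) == feature_fingerprint(replay))   # False
print(mismatched(live, replay))                                   # ['rv_5m']
```

A tolerance-based comparison would pass this pair; comparing bytes does not, which is what "bit-identical" demands.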

Testing

504 test files, 745+ tests, run on every change. The level decision engine alone has 376 dedicated unit tests. The system has been refactored under test the entire time: the paper-trader module went from 14,509 lines to 5,200 via Protocol-based context delegation into seven specialized microservices, with the test suite catching regressions across the refactor.

The numbers

Live since February 19, 2026, real capital, real P&L
Eleven coordinated OS processes managed by launchd
Dual-broker stack with verified reconciliation
Six-state HMM regime detection with CUSUM jump overlay
Eleven entry guards, tiered signal health, kill switch
504 test files, 745+ tests
Zero overnight positions, ever

The system has been running long enough that “it works” is no longer the interesting question. The interesting question is what to add next, and the answer is almost always a constraint, not a feature.