// project
Atlas trades 0-1 DTE SPY options with real capital. Decisions are made by a transparent level-based decision engine sitting on top of a six-state HMM regime detector and a CUSUM jump detector. The system closes every position by 3:55 PM. No overnight risk.
It runs continuously during market hours across eleven coordinated OS processes on local hardware, with dual-broker fallback (Alpaca primary, Interactive Brokers secondary). It is my primary income.
The system is live and trading every market day. The current focus is the disciplined reintroduction of machine learning: a LightGBM classifier, trained on a labeled corpus of decision outcomes from the level engine itself, is blended into the engine’s confidence under shadow-mode-and-ramp gates with per-regime calibration validators. The blend weight climbs on a schedule and auto-pauses on validator regression. The full story of the pivot and the reintroduction is in the writeup on disabling ML in production.
The architecture is event-driven, with message passing between isolated services. Each component does one thing and publishes its output to Redis Streams; the next component reads, processes, and publishes in turn. State and historical context live in TimescaleDB on top of PostgreSQL 17. Process lifecycle is managed by launchd. Each service emits a heartbeat every 5 seconds with a 15-second TTL, so a service that goes silent for about three beats is surfaced as degraded to a system-health aggregator before it can affect trading.
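The heartbeat-and-TTL check can be sketched in a few lines. This is a minimal in-memory illustration, not the production aggregator: the `HealthAggregator` class and the service names are hypothetical, and the real system presumably keeps this state in Redis rather than in a Python dict. Only the 5-second interval and 15-second TTL come from the description above.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 5.0  # from the text: services beat every 5 seconds
HEARTBEAT_TTL_S = 15.0      # from the text: a beat is considered fresh for 15 seconds


@dataclass
class HealthAggregator:
    """Hypothetical in-memory stand-in for the system-health aggregator."""

    ttl_s: float = HEARTBEAT_TTL_S
    last_seen: dict[str, float] = field(default_factory=dict)

    def beat(self, service: str, now: float) -> None:
        """Record a heartbeat from `service` at time `now` (seconds)."""
        self.last_seen[service] = now

    def degraded(self, now: float) -> list[str]:
        """Return services whose last heartbeat is older than the TTL."""
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.ttl_s
        )


# Usage with hypothetical service names and explicit timestamps.
agg = HealthAggregator()
agg.beat("signal_generator", now=0.0)
agg.beat("risk_manager", now=0.0)
agg.beat("risk_manager", now=10.0)   # risk_manager keeps beating
stale = agg.degraded(now=16.0)       # signal_generator has been silent for 16 s
```

Passing `now` explicitly instead of reading the clock inside the class keeps the check deterministic, so the same logic can be exercised in tests and in replay.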
The pipeline:
When any single service fails, the system degrades gracefully: the others keep operating on stale-but-known data until the failed component recovers.
The signal generator is intentionally transparent. Levels are aggregated from multiple sources, scored under a regime-aware confidence model, and screened by a set of entry guards before any order touches the execution layer. Every decision is reconstructable in a debugger or in a notebook; if a trade is wrong, I can read the inputs and the rule that fired and tell you why.
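One way to keep entry guards reconstructable is to have each guard return a human-readable rejection reason rather than a bare boolean. The sketch below assumes that shape. The `Candidate` fields, the two guards, and the thresholds are all hypothetical and are not taken from the actual engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical trade candidate produced by the level scorer."""

    level: float
    confidence: float
    spread: float


# A guard returns None to pass, or a string explaining the rejection.
Guard = Callable[[Candidate], Optional[str]]


def min_confidence(threshold: float) -> Guard:
    def guard(c: Candidate) -> Optional[str]:
        if c.confidence >= threshold:
            return None
        return f"confidence {c.confidence:.2f} < {threshold:.2f}"

    return guard


def max_spread(limit: float) -> Guard:
    def guard(c: Candidate) -> Optional[str]:
        if c.spread <= limit:
            return None
        return f"spread {c.spread:.2f} > {limit:.2f}"

    return guard


def screen(candidate: Candidate, guards: list[Guard]) -> list[str]:
    """Run every guard and collect all rejection reasons (empty list = pass)."""
    reasons = []
    for guard in guards:
        reason = guard(candidate)
        if reason is not None:
            reasons.append(reason)
    return reasons


# Usage with made-up numbers: the confidence guard fires, the spread guard passes.
guards = [min_confidence(0.60), max_spread(0.05)]
reasons = screen(Candidate(level=500.0, confidence=0.55, spread=0.03), guards)
```

Running every guard instead of stopping at the first failure costs little and leaves a complete record of why a candidate was rejected.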
The engine replaced a black-box XGBoost and TensorFlow LSTM ensemble in November 2025 after I caught a silent feature-mismatch degradation in the ensemble’s inputs. The replacement is documented in the disabling ML in production writeup. The short version is that transparency is a load-bearing engineering property in production, not a soft preference.
The system is currently bringing machine learning back, deliberately and under guards.
A LightGBM classifier is trained on a labeled corpus of decision outcomes generated by the level engine itself. It is not a free-running signal source. It is an opinion that gets blended into the engine’s confidence at a weight I set, and that I can roll back to zero with a config change in five seconds.
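As an illustration of what blending at a weight can mean, here is a minimal sketch that assumes a simple convex combination of the engine's confidence and the classifier's probability. The linear form and the function name `blend_confidence` are assumptions; the text above only states that a weight exists and that setting it to zero rolls the model back out.

```python
def blend_confidence(engine_conf: float, ml_prob: float, weight: float) -> float:
    """Convex blend of two scores in [0, 1].

    weight = 0.0 reproduces the engine's confidence exactly (ML rolled back);
    weight = 1.0 would hand the decision entirely to the classifier.
    """
    for name, value in (("engine_conf", engine_conf),
                        ("ml_prob", ml_prob),
                        ("weight", weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (1.0 - weight) * engine_conf + weight * ml_prob


# With weight 0 the classifier has no influence at all.
rolled_back = blend_confidence(engine_conf=0.70, ml_prob=0.40, weight=0.0)
# With weight 0.25 the classifier pulls the score a quarter of the way toward its opinion.
blended = blend_confidence(engine_conf=0.70, ml_prob=0.40, weight=0.25)
```

Under this form the engine stays the source of truth at every weight below 1, and the zero-weight case involves no model call that could change the result.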
The blend ramp is gated end to end:
This is not a fashion statement. It is the deployment discipline that distinguishes production machine learning from research machine learning glued behind an API.
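A scheduled ramp with auto-pause on validator regression could look roughly like the sketch below. The schedule values, the regime names, and the rule that a pause holds the current weight until a manual resume are all assumptions made for illustration; the actual gates are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BlendRamp:
    """Hypothetical ramp controller: climb one step per tick unless a validator fails."""

    # Assumed schedule; the first step (0.0) corresponds to shadow mode.
    schedule: tuple[float, ...] = (0.0, 0.05, 0.10, 0.20)
    step: int = 0
    paused: bool = False

    @property
    def weight(self) -> float:
        return self.schedule[self.step]

    def tick(self, validators_ok: dict[str, bool]) -> float:
        """Advance one schedule step given per-regime validator results."""
        if self.paused:
            return self.weight
        # Missing results are treated as a failure: no evidence, no ramp.
        if not validators_ok or not all(validators_ok.values()):
            self.paused = True
            return self.weight
        if self.step < len(self.schedule) - 1:
            self.step += 1
        return self.weight

    def resume(self) -> None:
        """Manual un-pause after a regression has been investigated."""
        self.paused = False


# Usage with hypothetical regime names.
ramp = BlendRamp()
w1 = ramp.tick({"trend": True, "chop": True})    # all validators pass: climb
w2 = ramp.tick({"trend": True, "chop": False})   # one regime regresses: pause
w3 = ramp.tick({"trend": True, "chop": True})    # still paused: weight holds
```

In this sketch a single failing regime pauses the whole ramp, so a model that is well calibrated in most regimes but not in one does not gain weight.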
The interesting part isn’t the trading logic. It’s the risk layer that says no.
Trading without these controls is gambling. Trading with them is engineering.
Three things that took most of the time, none of them ML:
1. Feature parity across training, historical replay, and live trading. The features computed in all three paths have to be bit-identical. Any divergence means downstream models are making decisions on inputs they were not validated on. This is mostly discipline: same code path, same input format, no shortcuts. It is also why the November 2025 pivot happened: feature-parity drift went undetected in the live ensemble for weeks.
2. Broker reliability under load. Live feeds drop. Rate limits hit. Replacements arrive out of order. Reconciliation has to handle fill delays, network blips, and the occasional clock skew without losing track of position state. The reconciliation service polls broker state on a schedule, reconciles against the system’s internal position ledger, and flags divergences before they compound.
3. Observability that catches silent failures. Dashboards are necessary but not sufficient. The system has to call out its own degradations: a FallbackTracker with severity-based alerting (CRITICAL triggers after a single instance) surfaces all 110+ fallback paths across the codebase, so the system tells me when it has quietly stopped doing the thing I thought it was doing.
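The bit-identical requirement in the first item can be checked mechanically by fingerprinting each feature vector from its exact IEEE-754 bit patterns and comparing fingerprints across paths. The sketch below is one possible approach, not the system's actual parity check. The function names and feature names are hypothetical.

```python
import hashlib
import struct


def feature_fingerprint(features: dict[str, float]) -> str:
    """Hash feature names and exact float bit patterns, independent of dict order."""
    digest = hashlib.sha256()
    for name in sorted(features):
        digest.update(name.encode("utf-8"))
        digest.update(b"\x00")                             # separator between name and value
        digest.update(struct.pack("<d", features[name]))  # 8 raw bytes of the double
    return digest.hexdigest()


def paths_agree(paths: dict[str, dict[str, float]]) -> bool:
    """True when every path produced a bit-identical feature vector."""
    fingerprints = {feature_fingerprint(f) for f in paths.values()}
    return len(fingerprints) <= 1


# Usage with hypothetical feature names. The "live" vector differs from the
# others only in the last bit: 0.1 + 0.2 is not the same double as 0.3.
training = {"vwap_dist": 0.3, "rsi_14": 55.0}
replay = {"rsi_14": 55.0, "vwap_dist": 0.3}
live = {"vwap_dist": 0.1 + 0.2, "rsi_14": 55.0}

same = paths_agree({"training": training, "replay": replay})
drifted = not paths_agree({"training": training, "replay": replay, "live": live})
```

Comparing raw bytes rather than using a tolerance is deliberate here: a tolerance would hide exactly the kind of small, silent divergence the first item describes.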
504 test files, 745+ tests, run on every change. The level decision engine alone has 376 dedicated unit tests. The system has been refactored under test the entire time: the paper-trader module went from 14,509 lines to 5,200 via Protocol-based context delegation into seven specialized microservices, with the test suite catching regressions across the refactor.
The system has been running long enough that “it works” is no longer the interesting question. The interesting question is what to add next, and the answer is almost always a constraint, not a feature.