ADR-005: Dual State Store Architecture (etcd + MongoDB)¶
| Attribute | Value |
|---|---|
| Status | Accepted |
| Date | 2026-01-16 |
| Deciders | Architecture Team |
| Related ADRs | ADR-001, ADR-002 |
Context¶
The system requires:
- Reactive state propagation: Scheduler and Controller need real-time notification of state changes
- Document storage: Complex aggregate structures (LabletDefinition, Worker Templates)
- High availability: No single point of failure for state storage
Options considered:
- MongoDB only - Use MongoDB change streams for reactivity
- etcd only - Store all state in etcd (key-value)
- Dual store - etcd for state/coordination, MongoDB for documents
- PostgreSQL + LISTEN/NOTIFY - Relational DB with notification
Decision¶
Use dual store architecture: etcd for state coordination + MongoDB for document storage.
| Store | Purpose | Data |
|---|---|---|
| etcd | State coordination, watching | Instance state, Worker state, Port allocations |
| MongoDB | Document storage, specs | LabletDefinitions, Worker Templates, Audit logs |
Rationale¶
Why etcd?¶
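- Native watch: Built-in watch mechanism that delivers events in revision order and can resume from a known revision (as long as that revision has not been compacted)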
- Strong consistency: Linearizable reads/writes
- Leader election: Built-in primitives for Scheduler HA
- Kubernetes proven: Battle-tested at scale
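The leader election point can be illustrated without a running cluster. The sketch below is an in-memory stand-in, not an etcd client: `FakeLeaseStore`, `put_if_absent`, and `keep_alive` are hypothetical names chosen for illustration. It models the general pattern such primitives rely on: create the leader key only if it is absent, attach a TTL, and let the key vanish when the holder stops renewing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeLeaseStore:
    """In-memory stand-in for a store with create-if-absent and TTL leases.

    Hypothetical helper for illustration only; it is not an etcd client.
    """
    _data: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def _expire(self, now: float) -> None:
        # Drop keys whose lease has run out, as a lease expiry would.
        expired = [k for k, (_, deadline) in self._data.items() if deadline <= now]
        for key in expired:
            del self._data[key]

    def put_if_absent(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Create the key only if it does not exist yet."""
        self._expire(now)
        if key in self._data:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def keep_alive(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Extend the lease, but only for the current holder."""
        self._expire(now)
        holder = self._data.get(key)
        if holder is None or holder[0] != value:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def get(self, key: str, now: float) -> Optional[str]:
        self._expire(now)
        holder = self._data.get(key)
        return holder[0] if holder else None


LEADER_KEY = "/lcm/scheduler/leader"

store = FakeLeaseStore()
# Two scheduler replicas campaign at t=0; only the first write succeeds.
a_won = store.put_if_absent(LEADER_KEY, "scheduler-a", ttl=10.0, now=0.0)
b_won = store.put_if_absent(LEADER_KEY, "scheduler-b", ttl=10.0, now=0.0)
# scheduler-a stops renewing; after the TTL the key is gone and b can take over.
b_won_later = store.put_if_absent(LEADER_KEY, "scheduler-b", ttl=10.0, now=11.0)
print(a_won, b_won, b_won_later, store.get(LEADER_KEY, now=11.0))
```

Time is passed in explicitly (`now`) so the example is deterministic. A real deployment would rely on the store's own lease clock rather than caller-supplied timestamps.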
Why MongoDB?¶
- Document model: Natural fit for complex aggregates (LabletDefinition schema)
- Rich queries: Filtering, aggregation for analytics
- Existing integration: Neuroglia MotorRepository already implemented
- Schema flexibility: Evolving document structures
Why not MongoDB alone?¶
- Change streams have limitations (cursor timeout, resumption complexity)
- No built-in leader election primitives
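- Watch granularity is coarser: change streams are scoped to a collection, database, or deployment and narrowed with pipeline filters, whereas etcd watches a single key or key prefix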
Why not etcd alone?¶
- Key-value model awkward for complex documents
- No rich query capabilities
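- Storage limits (the backend quota defaults to 2 GB; it is configurable, with 8 GB the suggested maximum in the etcd documentation)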
Consequences¶
Positive¶
- Each store handles the workload it fits: etcd for watches and coordination, MongoDB for documents and queries
- Proven patterns from Kubernetes ecosystem
- Scheduler/Controller get reliable state watches
- LabletDefinitions stored in natural document format
Negative¶
- Operational complexity of two data stores
- Data synchronization between stores (if needed)
- Learning curve for etcd operations
Risks¶
- Inconsistency between etcd and MongoDB if the same data is held in both, since a write to one store is not atomic with a write to the other
- etcd cluster management overhead
Data Distribution¶
etcd Keys¶
```
/lcm/instances/{id}/state     # LabletInstance current state
/lcm/instances/{id}/worker    # Assigned worker ID
/lcm/workers/{id}/state       # Worker state (running, draining, stopped)
/lcm/workers/{id}/capacity    # Current available capacity
/lcm/workers/{id}/ports       # Port allocation bitmap
/lcm/scheduler/leader         # Leader election key
/lcm/controller/leader        # Leader election key
```
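The `/lcm/workers/{id}/ports` key is described as a port allocation bitmap. One way such a bitmap could be represented is sketched below. The port range (`PORT_BASE`, `PORT_COUNT`), the byte encoding, and all function names are assumptions made for illustration; the ADR does not specify them.

```python
from typing import Optional, Tuple

# Hypothetical port range; the ADR does not specify one.
PORT_BASE = 30000
PORT_COUNT = 64


def allocate_port(bitmap: int) -> Tuple[int, Optional[int]]:
    """Return (new_bitmap, port) for the lowest free port, or (bitmap, None) if full."""
    for offset in range(PORT_COUNT):
        mask = 1 << offset
        if not bitmap & mask:
            return bitmap | mask, PORT_BASE + offset
    return bitmap, None


def release_port(bitmap: int, port: int) -> int:
    """Clear the bit for a previously allocated port."""
    offset = port - PORT_BASE
    if not 0 <= offset < PORT_COUNT:
        raise ValueError(f"port {port} is outside the managed range")
    return bitmap & ~(1 << offset)


def encode(bitmap: int) -> bytes:
    """Serialize the bitmap to a fixed-width value for a key-value store."""
    return bitmap.to_bytes(PORT_COUNT // 8, "big")


def decode(raw: bytes) -> int:
    """Inverse of encode()."""
    return int.from_bytes(raw, "big")


bitmap = 0
bitmap, first = allocate_port(bitmap)
bitmap, second = allocate_port(bitmap)
bitmap = release_port(bitmap, first)
bitmap, reused = allocate_port(bitmap)
print(first, second, reused)  # 30000 30001 30000
```

Because allocation is a read-modify-write on a single value, two allocators could hand out the same port if they raced. Against etcd this would presumably be guarded by a transaction that commits only if the key is unchanged since it was read.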
MongoDB Collections¶
```
lablet_definitions    # Full LabletDefinition documents
worker_templates      # WorkerTemplate documents
audit_events          # CloudEvents for audit trail
```
Implementation Notes¶
Watch Pattern for Scheduler¶
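```python
async def watch_pending_instances():
    """Watch for new pending instances."""
    async for event in etcd.watch_prefix("/lcm/instances/"):
        # The prefix also matches .../worker keys, so react only to state keys.
        if not event.key.endswith("/state"):
            continue
        if event.type == "PUT" and event.value["state"] == "PENDING":
            # Key layout is /lcm/instances/{id}/state, so index 3 is the ID.
            await schedule_instance(event.key.split("/")[3])
```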
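To exercise the watch loop above without an etcd cluster, the following sketch replays a fixed list of events through an in-memory stand-in. `FakeEtcd`, `WatchEvent`, and the event payload shape are hypothetical and only mirror the fields the loop reads. A real client may expose keys and values as bytes and use different event type names.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Dict, List


@dataclass
class WatchEvent:
    """Hypothetical event shape mirroring the fields used by the watch loop."""
    type: str
    key: str
    value: Dict[str, str]


class FakeEtcd:
    """In-memory stand-in that replays a fixed list of events."""

    def __init__(self, events: List[WatchEvent]) -> None:
        self._events = events

    async def watch_prefix(self, prefix: str) -> AsyncIterator[WatchEvent]:
        for event in self._events:
            if event.key.startswith(prefix):
                yield event


scheduled: List[str] = []


async def schedule_instance(instance_id: str) -> None:
    # Placeholder for the real scheduling decision.
    scheduled.append(instance_id)


async def watch_pending_instances(etcd: FakeEtcd) -> None:
    async for event in etcd.watch_prefix("/lcm/instances/"):
        is_state_key = event.key.endswith("/state")
        is_pending = event.value.get("state") == "PENDING"
        if event.type == "PUT" and is_state_key and is_pending:
            await schedule_instance(event.key.split("/")[3])


events = [
    WatchEvent("PUT", "/lcm/instances/a1/state", {"state": "PENDING"}),
    WatchEvent("PUT", "/lcm/instances/a1/worker", {"worker": "w7"}),
    WatchEvent("PUT", "/lcm/instances/b2/state", {"state": "RUNNING"}),
    WatchEvent("DELETE", "/lcm/instances/c3/state", {}),
    WatchEvent("PUT", "/lcm/workers/w7/state", {"state": "PENDING"}),
    WatchEvent("PUT", "/lcm/instances/d4/state", {"state": "PENDING"}),
]
asyncio.run(watch_pending_instances(FakeEtcd(events)))
print(scheduled)  # ['a1', 'd4']
```

Only the two `PUT` events on instance state keys with `PENDING` reach `schedule_instance`. The worker-assignment key, the non-pending state, the delete, and the key outside the prefix are all ignored.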
State Update Flow¶
1. API receives mutation request
2. API validates and writes to etcd (state)
3. API writes to MongoDB (if document update)
4. etcd notifies watchers (Scheduler, Controller)
5. Scheduler/Controller process state change
6. Scheduler/Controller call API for mutations
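The flow above can be sketched with in-memory stand-ins for both stores. Everything here is illustrative: `FakeStateStore`, `FakeDocumentStore`, `handle_mutation`, and all state names other than `PENDING` are assumptions, and neither class reflects a real etcd or MongoDB client API.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str, str], None]


class FakeStateStore:
    """Stand-in for etcd: a key-value map that notifies watchers on every put."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.watchers: List[Watcher] = []

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        for notify in self.watchers:  # step 4: notify watchers
            notify(key, value)


class FakeDocumentStore:
    """Stand-in for MongoDB: documents keyed by their "_id" field."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def upsert(self, doc: dict) -> None:
        self.docs[doc["_id"]] = doc


# Hypothetical state names; only PENDING appears in the ADR.
VALID_STATES = {"PENDING", "SCHEDULED", "RUNNING", "STOPPED"}


def handle_mutation(state: FakeStateStore, documents: FakeDocumentStore,
                    instance_id: str, new_state: str,
                    document: Optional[dict] = None) -> None:
    # Steps 1-2: validate, then write the state to the coordination store.
    if new_state not in VALID_STATES:
        raise ValueError(f"unknown state: {new_state}")
    state.put(f"/lcm/instances/{instance_id}/state", new_state)
    # Step 3: write to the document store only when a document changed.
    if document is not None:
        documents.upsert(document)


seen: List[str] = []
state_store, doc_store = FakeStateStore(), FakeDocumentStore()
# Step 5: a watcher (Scheduler or Controller) records what it observed.
state_store.watchers.append(lambda key, value: seen.append(f"{key}={value}"))
handle_mutation(state_store, doc_store, "a1", "PENDING",
                {"_id": "def-1", "name": "demo"})
print(seen)
print(doc_store.docs)
```

In this sketch the watcher fires during step 2, before the document write in step 3. A real watcher could likewise observe a state change before the corresponding MongoDB write lands, so watchers should not assume the document is already up to date.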
Alternatives Considered¶
Redis + MongoDB¶
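- Redis pub/sub is fire-and-forget: messages published while a subscriber is disconnected are lost, whereas an etcd watch can resume from a known revision
- No strong consistency guarantees
- Would work, but etcd is more robust for coordination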
Single MongoDB with Change Streams¶
- Simpler operationally
- Change stream resumption complexity
- No built-in leader election
- Could reconsider if etcd overhead too high
Resolved Questions¶
- ~~Should Redis session store migrate to etcd for UI sessions?~~ → No, keep Redis for UI sessions (simpler TTL management, separation of concerns)
- ~~What is the etcd cluster sizing for expected load?~~ → TBD during implementation phase based on expected instance count
- ~~Should we prototype with MongoDB-only first and add etcd if needed?~~ → No, proceed with dual store architecture as designed