Architecture Decision Records (ADRs)
This directory contains Architecture Decision Records (ADRs) for the Lablet Cloud Manager's Lablet Resource Manager expansion.
ADR Index
| ADR | Title | Status | Date |
|---|---|---|---|
| ADR-001 | API-Centric State Management | Accepted | 2026-01-15 |
| ADR-002 | Separate Resource Scheduler Service | Accepted | 2026-01-15 |
| ADR-003 | CloudEvents for External Integration | Accepted | 2026-01-15 |
| ADR-004 | Port Allocation per Worker | Accepted | 2026-01-15 |
| ADR-005 | Dual State Store Architecture (etcd + MongoDB) | Accepted | 2026-01-16 |
| ADR-006 | Resource Scheduler High Availability Coordination | Accepted | 2026-01-16 |
| ADR-007 | Worker Template Seeding and Management | Accepted | 2026-01-15 |
| ADR-008 | Worker Draining State for Scale-Down | Accepted | 2026-01-16 |
| ADR-009 | Shared Core Package Architecture | Accepted | 2026-01-16 |
| ADR-010 | Service Unification on Neuroglia Framework | Accepted | 2026-01-17 |
| ADR-011 | APScheduler Removal and Controller Migration | Accepted | 2026-01-19 |
| ADR-012 | Dynamic Region Configuration | Accepted | 2026-01-19 |
| ADR-013 | SSE Protocol Improvements | Accepted | 2026-01-19 |
| ADR-014 | Worker Orphan Detection and Garbage Collection | Accepted | 2026-02-06 |
| ADR-015 | Control Plane API Must Not Call AWS EC2 | Accepted | 2026-02-06 |
| ADR-016 | License Operations via Worker-Controller | Accepted | 2026-02-06 |
| ADR-017 | Lab Operations via Lablet-Controller | Accepted | 2026-02-06 |
| ADR-018 | Lab Delivery System (LDS) Integration | Accepted | 2026-02-10 |
| ADR-019 | LabRecord as Independent AggregateRoot | Accepted (Partially Superseded) | 2026-02-10 |
| ADR-020 | Session Entity Model Redesign | Accepted | 2026-02-18 |
| ADR-021 | Child Entity Architecture for Session Tracking | Accepted | 2026-02-18 |
| ADR-022 | CloudEvent Ingestion via Lablet-Controller | Accepted | 2026-02-18 |
| ADR-023 | Content Sync Trigger via Reactive etcd Watch | Accepted | 2026-02-25 |
| ADR-024 | Content Package Storage in RustFS | Accepted | 2026-02-25 |
| ADR-025 | Content Metadata Storage in MongoDB | Accepted | 2026-02-25 |
| ADR-026 | Extensible Upstream Notifier Pattern (Deferred) | Accepted | 2026-02-25 |
| ADR-027 | Version Auto-Increment on Content Change | Accepted | 2026-02-25 |
| ADR-028 | LabletDefinition Initial Status (PENDING_SYNC) | Accepted | 2026-02-25 |
| ADR-029 | Port Template Extraction from CML YAML | Accepted | 2026-02-25 |
| ADR-030 | Resource & Port Observation - "Learn from Live" | Accepted | 2026-02-28 |
| ADR-031 | Checkpoint-Based Instantiation Pipeline | Accepted | 2026-03-02 |
| ADR-032 | Port Allocation as LabRecord Topology Concern | Accepted | 2026-03-02 |
| ADR-033 | CML Node Tag Sync with Allocated Ports | Accepted | 2026-03-02 |
| ADR-034 | Pipeline Executor & Lifecycle Phase Handlers | Proposed | 2026-03-02 |
| ADR-035 | Legacy SchedulerService Removal | Accepted | 2026-03-04 |
| ADR-036 | Resource Management Abstraction Layer | Accepted | 2026-03-10 |
| ADR-037 | Timeslot Management | Accepted | 2026-03-10 |
| ADR-038 | Step Handler Registry & Reconciler Decomposition | Accepted | 2026-03-18 |
| ADR-039 | SSE Race Condition Fix | Accepted | 2026-04-10 |
| ADR-040 | LDS CloudEvent Direct Ingestion via CPA | Accepted | 2026-04-10 |
Status Definitions
| Status | Meaning |
|---|---|
| Proposed | Under discussion, not yet approved |
| Accepted | Decision made and should be followed |
| Superseded | Replaced by another ADR |
| Deprecated | No longer relevant |
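The index rows and status values above lend themselves to a simple consistency check. The following is a minimal sketch, not part of the project's tooling: the `check_row` helper and its regular expression are hypothetical, and it assumes that a parenthesised qualifier such as `Accepted (Partially Superseded)` is acceptable as long as the base status is one of the four defined values.

```python
import re
from datetime import date

# The four statuses defined in the Status Definitions table.
KNOWN_STATUSES = {"Proposed", "Accepted", "Superseded", "Deprecated"}

# Matches one index row, e.g. "| ADR-001 | Title | Accepted | 2026-01-15 |".
ROW = re.compile(
    r"^\|\s*(ADR-\d{3})\s*\|\s*(.+?)\s*\|\s*(.+?)\s*\|\s*(\d{4}-\d{2}-\d{2})\s*\|$"
)


def check_row(line: str) -> list[str]:
    """Return a list of problems found in one ADR index row (empty if none)."""
    match = ROW.match(line.strip())
    if match is None:
        return ["row does not match '| ADR-NNN | Title | Status | Date |'"]
    adr_id, _title, status, day = match.groups()
    problems = []
    # Assumption: a qualifier in parentheses is allowed after a known status.
    base_status = status.split("(", 1)[0].strip()
    if base_status not in KNOWN_STATUSES:
        problems.append(f"{adr_id}: unknown status {status!r}")
    try:
        date.fromisoformat(day)  # rejects impossible dates such as 2026-02-30
    except ValueError:
        problems.append(f"{adr_id}: invalid date {day!r}")
    return problems
```

Running `check_row` over every row of the index would flag a misspelled status or an impossible calendar date before the page is published.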
ADR Template
When creating new ADRs, use this template:
# ADR-NNN: Title
| Attribute | Value |
|-----------|-------|
| **Status** | Proposed |
| **Date** | YYYY-MM-DD |
| **Deciders** | Team/Person |
| **Related ADRs** | Links to related ADRs |
## Context
What is the issue that we're seeing that is motivating this decision or change?
## Decision
What is the change that we're proposing and/or doing?
## Rationale
Why is this decision being made? What alternatives were considered?
## Consequences
### Positive
- What becomes easier or possible?
### Negative
- What becomes harder or impossible?
### Risks
- What could go wrong?
## Implementation Notes
Technical details, code examples, configuration.
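Filling in the template by hand is easy to get slightly wrong (numbering, date format), so a small helper can do it. This is only a sketch under stated assumptions: `next_adr_number` and `render_adr` are hypothetical names, the helper keeps just the headings of the template and drops the prompt questions, and how the result is saved (file name, directory) is left open because this page does not specify it.

```python
from datetime import date

# Headings and attribute table copied from the template above; only the
# three placeholders ({number}, {title}, {day}) are filled in.
TEMPLATE = """\
# ADR-{number:03d}: {title}

| Attribute | Value |
|-----------|-------|
| **Status** | Proposed |
| **Date** | {day} |
| **Deciders** | Team/Person |
| **Related ADRs** | Links to related ADRs |

## Context

## Decision

## Rationale

## Consequences

### Positive

### Negative

### Risks

## Implementation Notes
"""


def next_adr_number(existing_ids: list[str]) -> int:
    """Return the next free ADR number given IDs such as 'ADR-040'."""
    numbers = [int(adr_id.split("-")[1]) for adr_id in existing_ids]
    return max(numbers, default=0) + 1


def render_adr(number: int, title: str, day: date) -> str:
    """Render a new ADR skeleton with status 'Proposed'."""
    return TEMPLATE.format(number=number, title=title, day=day.isoformat())
```

For example, `render_adr(next_adr_number(["ADR-039", "ADR-040"]), "Example Decision", date.today())` produces a skeleton whose first line is `# ADR-041: Example Decision`.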
Dependency Graph
ADR-001 (API-Centric)
├── ADR-002 (Scheduler) ─────┐
│   └── ADR-006 (HA) ────────┤
│                            │
├── ADR-005 (State Store) ───┘
│   └── ADR-006 (HA)
│
└── ADR-013 (SSE Improvements)
    └── no controller-direct Redis
ADR-003 (CloudEvents)
└── ADR-004 (Ports)
ADR-007 (Templates) - standalone
ADR-008 (Draining)
└── ADR-002 (Scheduler)
ADR-009 (Shared Core)
└── ADR-010 (Neuroglia Unification)
ADR-010 (Neuroglia Unification)
├── ADR-011 (APScheduler Removal)
└── ADR-012 (Dynamic Region Config)
ADR-011 (APScheduler Removal)
└── controller-based execution replaces jobs
ADR-012 (Dynamic Region Config)
└── SystemSettings + WorkerReconciler._run_discovery_loop()
ADR-013 (SSE Improvements)
└── batching, filtering, extended events
ADR-018 (LDS Integration)
├── ADR-017 (Lab Operations via Lablet-Controller)
├── ADR-020 (Session Entity Model) - amends terminology
└── ADR-022 (CloudEvent Ingestion) - amends routing
ADR-019 (LabRecord)
└── ADR-020 (Session Entity Model) - supersedes binding model
ADR-020 (Session Entity Model)
├── ADR-018 (LDS Integration)
├── ADR-019 (LabRecord) - partially supersedes
└── ADR-021 (Child Entities)
ADR-021 (Child Entity Architecture)
├── ADR-020 (Session Entity Model)
└── ADR-022 (CloudEvent Ingestion)
ADR-022 (CloudEvent Ingestion)
├── ADR-003 (CloudEvents)
├── ADR-015 (Control Plane API No External Calls)
└── ADR-018 (LDS Integration) - amends §7
# Content Synchronization cluster (ADR-023–028)
ADR-023 (Content Sync Trigger)
├── ADR-005 (Dual State Store) - extends etcd key namespace
├── ADR-015 (CPA No External Calls)
├── ADR-017 (Lab Operations) - extends reconciliation pattern
├── ADR-024 (Package Storage)
├── ADR-025 (Content Metadata)
└── ADR-026 (Upstream Notifier)
ADR-024 (Package Storage in RustFS)
└── ADR-025 (Content Metadata) - complementary
ADR-025 (Content Metadata in MongoDB)
├── ADR-005 (Dual State Store)
└── ADR-024 (Package Storage) - complementary
ADR-026 (Upstream Notifier Pattern)
└── ADR-018 (LDS Integration)
ADR-027 (Version Auto-Increment)
├── ADR-023 (Content Sync Trigger)
└── ADR-028 (Definition Initial Status)
ADR-028 (Definition Initial Status)
├── ADR-023 (Content Sync Trigger)
└── ADR-027 (Version Auto-Increment)
ADR-029 (Port Template Extraction)
├── ADR-025 (Content Metadata Storage)
└── ADR-028 (Definition Initial Status)
ADR-030 (Resource & Port Observation - Learn from Live)
├── ADR-004 (Port Allocation per Worker)
├── ADR-017 (Lab Operations via Lablet-Controller)
├── ADR-020 (Session Entity Model)
└── ADR-029 (Port Template Extraction)
# Instantiation Pipeline cluster (ADR-031–033)
ADR-031 (Checkpoint Pipeline)
├── ADR-004 (Port Allocation per Worker)
├── ADR-017 (Lab Operations via Lablet-Controller)
├── ADR-020 (Session Entity Model)
├── ADR-029 (Port Template Extraction)
├── ADR-030 (Resource Observation)
├── ADR-032 (Port Allocation on LabRecord)
└── ADR-033 (CML Node Tag Sync)
ADR-032 (Port Allocation as LabRecord Topology)
├── ADR-004 (Port Allocation per Worker)
├── ADR-019 (LabRecord as AggregateRoot)
├── ADR-020 (Session Entity Model)
├── ADR-029 (Port Template Extraction)
├── ADR-031 (Checkpoint Pipeline)
└── ADR-033 (CML Node Tag Sync)
ADR-033 (CML Node Tag Sync)
├── ADR-004 (Port Allocation per Worker)
├── ADR-017 (Lab Operations via Lablet-Controller)
├── ADR-029 (Port Template Extraction)
├── ADR-031 (Checkpoint Pipeline)
└── ADR-032 (Port Allocation on LabRecord)
ADR-039 (SSE Race Condition Fix)
├── ADR-013 (SSE Protocol Improvements)
└── ADR-001 (API-Centric State Management)
ADR-040 (LDS CloudEvent Direct Ingestion via CPA)
├── ADR-003 (CloudEvents)
├── ADR-015 (CPA No External Calls)
├── ADR-018 (LDS Integration)
└── ADR-022 (CloudEvent Ingestion) - amends (dual routing)
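Several entries in the graph list each other (for example ADR-027 and ADR-028), so it reads more like a "related ADRs" map than a strict dependency order. The sketch below shows one way to query such a map. It is illustrative only: the `RELATED` dictionary is a hand-copied subset of the graph above, the helper names are hypothetical, and treating each indented entry as "the parent references this ADR" is an assumption about the graph's direction.

```python
# Subset of the graph above: each ADR maps to the ADRs listed beneath it.
RELATED = {
    "ADR-018": ["ADR-017", "ADR-020", "ADR-022"],
    "ADR-019": ["ADR-020"],
    "ADR-020": ["ADR-018", "ADR-019", "ADR-021"],
    "ADR-021": ["ADR-020", "ADR-022"],
    "ADR-022": ["ADR-003", "ADR-015", "ADR-018"],
    "ADR-027": ["ADR-023", "ADR-028"],
    "ADR-028": ["ADR-023", "ADR-027"],
}


def referenced_by(graph: dict[str, list[str]], target: str) -> list[str]:
    """Return the ADRs that list `target` beneath them, sorted by ID."""
    return sorted(adr for adr, related in graph.items() if target in related)


def mutual_references(graph: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return sorted pairs (a, b) where a lists b and b lists a."""
    pairs = set()
    for adr, related in graph.items():
        for other in related:
            if adr in graph.get(other, []):
                pairs.add(tuple(sorted((adr, other))))
    return sorted(pairs)


if __name__ == "__main__":
    print(referenced_by(RELATED, "ADR-020"))
    print(mutual_references(RELATED))
```

A reverse lookup like `referenced_by` is useful when amending an ADR, since it lists the records that may need a matching update; `mutual_references` highlights pairs that should be reviewed together.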