Architecture Decision Records (ADRs)

This directory contains Architecture Decision Records (ADRs) for the Lablet Cloud Manager's Lablet Resource Manager expansion.

ADR Index

Reading the supersession column. → ADR-NNN means superseded (fully or partially) by that ADR; ⊇ ADR-NNN means supersedes it. The current-model cluster (ADR-036, 044–054) is still Proposed; it is the north star that the Solution Design docs describe, pending ratification. See the supersession chain below.

| ADR | Title | Status | Date | Supersession |
|-----|-------|--------|------|--------------|
| ADR-001 | API-Centric State Management | Accepted | 2026-01-15 | — |
| ADR-002 | Separate Resource Scheduler Service | Accepted | 2026-01-15 | role re-scoped by → 054 |
| ADR-003 | CloudEvents for External Integration | Accepted | 2026-01-15 | — |
| ADR-004 | Port Allocation per Worker | Accepted | 2026-01-15 | — |
| ADR-005 | Dual State Store Architecture (etcd + MongoDB) | Accepted | 2026-01-16 | — |
| ADR-006 | Resource Scheduler High Availability Coordination | Accepted | 2026-01-16 | — |
| ADR-007 | Worker Template Seeding and Management | Accepted | 2026-01-15 | — |
| ADR-008 | Worker Draining State for Scale-Down | Accepted | 2026-01-16 | — |
| ADR-009 | Shared Core Package Architecture | Accepted | 2026-01-16 | — |
| ADR-010 | Service Unification on Neuroglia Framework | Accepted | 2026-01-17 | — |
| ADR-011 | APScheduler Removal and Controller Migration | Accepted | 2026-01-19 | — |
| ADR-012 | Dynamic Region Configuration | Accepted | 2026-01-19 | — |
| ADR-013 | SSE Protocol Improvements | Accepted | 2026-01-19 | — |
| ADR-014 | Worker Orphan Detection and Garbage Collection | Accepted | 2026-02-06 | — |
| ADR-015 | Control Plane API Must Not Call AWS EC2 | Accepted | 2026-02-06 | — |
| ADR-016 | License Operations via Worker-Controller | Accepted (Partially Superseded) | 2026-02-07 | topology → 054 |
| ADR-017 | Lab Operations via Lablet-Controller | Accepted (Partially Superseded) | 2026-02-07 | topology → 054 |
| ADR-018 | Lab Delivery System (LDS) Integration | Accepted | 2025-02-10 | — |
| ADR-019 | LabRecord as Independent AggregateRoot | Accepted (Partially Superseded) | 2026-02-10 | binding → 020 |
| ADR-020 | Session Entity Model Redesign | Accepted (Partially Superseded) | 2026-02-18 | ⊇ 019 §binding · state machine → 045 |
| ADR-021 | Child Entity Architecture for Session Tracking | Accepted (Partially Superseded) | 2026-02-18 | part model → 045 |
| ADR-022 | CloudEvent Ingestion via Lablet-Controller | Accepted | 2026-02-18 | — |
| ADR-023 | Content Sync Trigger via Reactive etcd Watch | Accepted | 2026-02-25 | — |
| ADR-024 | Content Package Storage in RustFS | Accepted | 2026-02-25 | — |
| ADR-025 | Content Metadata Storage in MongoDB | Accepted | 2026-02-25 | — |
| ADR-026 | Extensible Upstream Notifier Pattern (Deferred) | Accepted | 2026-02-25 | — |
| ADR-027 | Version Auto-Increment on Content Change | Accepted | 2026-02-25 | — |
| ADR-028 | LabletDefinition Initial Status (PENDING_SYNC) | Accepted | 2026-02-25 | generalised by → 059 |
| ADR-029 | Port Template Extraction from CML YAML | Accepted | 2026-02-25 | — |
| ADR-030 | Resource & Port Observation — "Learn from Live" | Accepted | 2026-02-28 | — |
| ADR-031 | Checkpoint-Based Instantiation Pipeline | Accepted | 2026-03-02 | — |
| ADR-032 | Port Allocation as LabRecord Topology Concern | Accepted | 2026-03-02 | — |
| ADR-033 | CML Node Tag Sync with Allocated Ports | Accepted | 2026-03-02 | — |
| ADR-034 | Pipeline Executor & Lifecycle Phase Handlers | Proposed | 2026-03-02 | — |
| ADR-035 | Legacy SchedulerService Removal | Accepted | 2026-03-04 | role re-scoped by → 054 |
| ADR-036 | Resource Management Abstraction Layer | Accepted | 2026-03-10 | extended by 050 |
| ADR-037 | Timeslot Management | Accepted | 2026-03-10 | — |
| ADR-038 | Step Handler Registry & Reconciler Decomposition | Accepted | 2026-03-18 | extended by 047 |
| ADR-039 | SSE Race Condition Fix | Accepted | 2026-04-10 | — |
| ADR-040 | LDS CloudEvent Direct Ingestion via CPA | Accepted | 2026-04-10 | — |
| ADR-041 | WebSocket-Based CML Worker Monitoring | Proposed | 2026-05-20 | — |
| ADR-042 | CommandHandlerBase Dependency Simplification | Proposed | 2026-06-01 | — |
| ADR-043 | Startup State Reconciliation and Discovery Separation | Accepted | 2026-06-04 | — |
| ADR-044 | ScenarioEngine — Pod Automation as a Separate Service | Proposed (Rev 2) | 2026-06-05 | ⊇ Rev 1 (in-process design) |
| ADR-045 | Multi-part Session / Part Model with Selector-Resolved Content | Proposed | 2026-06-12 | ⊇ 020, 021 (session state machine → part level) |
| ADR-046 | Host Abstraction and PodType / HostType Split | Proposed | 2026-06-12 | extends 036 |
| ADR-047 | Generic Reconciliation Framework with Per-Type Managers | Proposed | 2026-06-12 | extends 036, 038 |
| ADR-048 | Unified Resource Dashboard and Shared lcm-core UI Components | Proposed | 2026-06-12 | — |
| ADR-049 | Unified Workflow DSL for Lifecycle / Step / Task Definitions | Proposed | 2026-06-12 | inline tasks body & validation → 057; data-flow → 058 |
| ADR-050 | Definition/Instance Duality and Two-Tier Instance Layering | Proposed | 2026-06-12 | extends 036; Form row partially → 059 |
| ADR-051 | Provisioning Sources and Asymmetric Definition Lifecycle | Proposed | 2026-06-12 | extends 050 |
| ADR-052 | Content-Authoring Taxonomy Import and Form Delivery | Proposed | 2026-06-12 | extends 050, 051; Form-delivery stance → 059 |
| ADR-053 | Authorization Policy Model Port | Proposed | 2026-06-12 | extends 050 |
| ADR-054 | Controller Topology by Resource Kind | Proposed (Rev 2) | 2026-06-12 | ⊇ topology of 016, 017; re-scopes 002, 035; Rev 2 adds form-/host-controller (059) |
| ADR-055 | Per-Resource-Kind Lifecycle State Machines | Proposed | 2026-06-13 | extends 047, 050 |
| ADR-056 | ADR Lifecycle & Supersession Conventions | Proposed | 2026-06-13 | — |
| ADR-057 | Content-Driven Lifecycle DSL — Primitives, Phases & scenarioFunctions | Proposed | 2026-06-13 | ⊇ inline tasks body of 049 §2.1 & task-type list of 044 §2.8; extends 049, 044 |
| ADR-058 | Lifecycle Data-Flow & Variable Scopes | Proposed | 2026-06-13 | extends 057 |
| ADR-059 | Form as First-Class Synced Resource | Proposed | 2026-06-16 | ⊇ Form-delivery of 052 & Form row of 050; generalises 028; extends 051; related 046, 054 |

Title note: ADR-044's filename is ADR-044-content-driven-lifecycle-engine.md (and the mkdocs nav label still reads "Content-Driven Lifecycle Engine"), but its current H1 is "ScenarioEngine — Pod Automation as a Separate Service" (Rev 2). The table above uses the current H1; the filename is retained to avoid breaking links.
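
The supersession notation can also be read mechanically, for example by a docs lint script. The following is a minimal sketch, not part of the project's tooling: `parse_supersession` is a hypothetical helper, and it assumes that clauses in a cell are separated by `;` or `·` and that ADR references are written as three-digit numbers, as they are in the table above. Clauses such as "extends 036" carry neither marker and are ignored.

```python
import re

# Three-digit ADR numbers such as "054"; section refs like "§2.1" do not match.
ADR_NUMBER = re.compile(r"\b\d{3}\b")


def parse_supersession(cell: str) -> dict[str, list[str]]:
    """Split one Supersession cell into the ADR numbers on each side.

    "supersedes"    : numbers in a clause that starts with the ⊇ marker
    "superseded_by" : numbers that follow a → marker
    """
    result: dict[str, list[str]] = {"supersedes": [], "superseded_by": []}
    # Assumption: ';' and '·' separate independent clauses within a cell.
    for clause in re.split(r"[;·]", cell):
        left, arrow, right = clause.strip().partition("→")
        if left.startswith("⊇"):
            result["supersedes"] += ADR_NUMBER.findall(left)
        if arrow:
            result["superseded_by"] += ADR_NUMBER.findall(right)
    return result
```

Applied to the ADR-020 cell, `parse_supersession("⊇ 019 §binding · state machine → 045")` yields `019` under `supersedes` and `045` under `superseded_by`.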

Supersession chain

flowchart LR
    A019[ADR-019 LabRecord] -->|binding| A020[ADR-020 Session Model]
    A020 -->|state machine| A045[ADR-045 Multi-part]
    A021[ADR-021 Child Entities] -->|part model| A045
    A016[ADR-016 License ops] -->|topology| A054[ADR-054 Controller Topology]
    A017[ADR-017 Lab ops] -->|topology| A054
    A002[ADR-002 Scheduler] -.->|role re-scoped| A054
    A035[ADR-035 Scheduler removal] -.->|role re-scoped| A054
    A044R1[ADR-044 Rev 1 in-process] -->|⊇| A044[ADR-044 Rev 2 SE service]
    A049[ADR-049 Unified DSL] -->|tasks body & validation| A057[ADR-057 Lifecycle DSL]
    A044 -->|task-type list| A057
    A057 -->|extends| A058[ADR-058 Data-flow scopes]
    A052[ADR-052 Content taxonomy] -->|Form delivery| A059[ADR-059 Form as Resource]
    A050[ADR-050 Def/Instance duality] -.->|Form row| A059
    A028[ADR-028 Definition status] -.->|generalised| A059
    A059 -.->|form-/host-controller| A054

    classDef superseded fill:#fde68a,stroke:#b45309;
    classDef current fill:#a7f3d0,stroke:#047857;
    class A019,A020,A021,A016,A017,A002,A035,A044R1,A049,A052,A050,A028 superseded;
    class A045,A054,A044,A057,A058,A059 current;
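
When reading an older ADR it helps to know which later ADRs to check as well. The sketch below copies the flowchart's edges into a plain dictionary and walks them breadth-first. The `EDGES` mapping and the `later_adrs` helper are illustrative names only, and the walk follows every arrow in the chart regardless of its label (full supersession, partial supersession, re-scoping, or extension).

```python
from collections import deque

# Edges transcribed from the flowchart above: source ADR -> ADRs it points at.
EDGES = {
    "ADR-019": ["ADR-020"],
    "ADR-020": ["ADR-045"],
    "ADR-021": ["ADR-045"],
    "ADR-016": ["ADR-054"],
    "ADR-017": ["ADR-054"],
    "ADR-002": ["ADR-054"],
    "ADR-035": ["ADR-054"],
    "ADR-044 Rev 1": ["ADR-044"],
    "ADR-049": ["ADR-057"],
    "ADR-044": ["ADR-057"],
    "ADR-057": ["ADR-058"],
    "ADR-052": ["ADR-059"],
    "ADR-050": ["ADR-059"],
    "ADR-028": ["ADR-059"],
    "ADR-059": ["ADR-054"],
}


def later_adrs(start: str) -> list[str]:
    """Return every ADR reachable from `start`, nearest first (breadth-first)."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        for nxt in EDGES.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For instance, `later_adrs("ADR-019")` returns `["ADR-020", "ADR-045"]`, and an ADR with no outgoing arrow, such as ADR-045, returns an empty list.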

Status Definitions

| Status | Meaning |
|--------|---------|
| Proposed | Under discussion, not yet approved |
| Accepted | Decision made and should be followed |
| Superseded | Replaced by another ADR |
| Deprecated | No longer relevant |

ADR Template

When creating new ADRs, use this template:

# ADR-NNN: Title

| Attribute | Value |
|-----------|-------|
| **Status** | Proposed |
| **Date** | YYYY-MM-DD |
| **Deciders** | Team/Person |
| **Related ADRs** | Links to related ADRs |

## Context

What is the issue that we're seeing that is motivating this decision or change?

## Decision

What is the change that we're proposing and/or doing?

## Rationale

Why is this decision being made? What alternatives were considered?

## Consequences

### Positive
- What becomes easier or possible?

### Negative
- What becomes harder or impossible?

### Risks
- What could go wrong?

## Implementation Notes

Technical details, code examples, configuration.
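
Filling in the template by hand is error-prone, so a small scaffold script can help. This is a sketch under stated assumptions rather than an existing tool: `adr_filename` and `render_adr` are hypothetical helpers, the `ADR-NNN-slug.md` naming is inferred from ADR-044's filename, and the prompt text under each heading is omitted for brevity.

```python
import re
from datetime import date

# Headings follow the template above; only the brace placeholders are filled in.
TEMPLATE = """\
# ADR-{number:03d}: {title}

| Attribute | Value |
|-----------|-------|
| **Status** | Proposed |
| **Date** | {day} |
| **Deciders** | {deciders} |
| **Related ADRs** | {related} |

## Context

## Decision

## Rationale

## Consequences

### Positive

### Negative

### Risks

## Implementation Notes
"""


def adr_filename(number: int, title: str) -> str:
    """Build a filename such as ADR-060-example-decision.md (assumed convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"ADR-{number:03d}-{slug}.md"


def render_adr(number: int, title: str, deciders: str, day: date,
               related: str = "None") -> str:
    """Return the Markdown skeleton for a new ADR in Proposed status."""
    return TEMPLATE.format(number=number, title=title, day=day.isoformat(),
                           deciders=deciders, related=related)
```

The number and title are supplied by the caller; the sketch does not check the index for the next free ADR number.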

Dependency Graph

ADR-001 (API-Centric)
    ├── ADR-002 (Scheduler) ─────┐
    │       └── ADR-006 (HA) ◄───┤
    │                            │
    ├── ADR-005 (State Store) ◄──┘
    │       └── ADR-006 (HA)
    │
    └── ADR-013 (SSE Improvements)
            └── no controller-direct Redis

ADR-003 (CloudEvents)
    └── ADR-004 (Ports)

ADR-007 (Templates) ← standalone

ADR-008 (Draining)
    └── ADR-002 (Scheduler)

ADR-009 (Shared Core)
    └── ADR-010 (Neuroglia Unification)

ADR-010 (Neuroglia Unification)
    ├── ADR-011 (APScheduler Removal)
    └── ADR-012 (Dynamic Region Config)

ADR-011 (APScheduler Removal)
    └── controller-based execution replaces jobs

ADR-012 (Dynamic Region Config)
    └── SystemSettings + WorkerReconciler._run_discovery_loop()

ADR-013 (SSE Improvements)
    └── batching, filtering, extended events

ADR-018 (LDS Integration)
    ├── ADR-017 (Lab Operations via Lablet-Controller)
    ├── ADR-020 (Session Entity Model) ← amends terminology
    └── ADR-022 (CloudEvent Ingestion) ← amends routing

ADR-019 (LabRecord)
    └── ADR-020 (Session Entity Model) ← supersedes binding model

ADR-020 (Session Entity Model)
    ├── ADR-018 (LDS Integration)
    ├── ADR-019 (LabRecord) ← partially supersedes
    └── ADR-021 (Child Entities)

ADR-021 (Child Entity Architecture)
    ├── ADR-020 (Session Entity Model)
    └── ADR-022 (CloudEvent Ingestion)

ADR-022 (CloudEvent Ingestion)
    ├── ADR-003 (CloudEvents)
    ├── ADR-015 (Control Plane API No External Calls)
    └── ADR-018 (LDS Integration) ← amends §7

# Content Synchronization cluster (ADR-023–028)
ADR-023 (Content Sync Trigger)
    ├── ADR-005 (Dual State Store) ← extends etcd key namespace
    ├── ADR-015 (CPA No External Calls)
    ├── ADR-017 (Lab Operations) ← extends reconciliation pattern
    ├── ADR-024 (Package Storage)
    ├── ADR-025 (Content Metadata)
    └── ADR-026 (Upstream Notifier)

ADR-024 (Package Storage in RustFS)
    └── ADR-025 (Content Metadata) ← complementary

ADR-025 (Content Metadata in MongoDB)
    ├── ADR-005 (Dual State Store)
    └── ADR-024 (Package Storage) ← complementary

ADR-026 (Upstream Notifier Pattern)
    └── ADR-018 (LDS Integration)

ADR-027 (Version Auto-Increment)
    ├── ADR-023 (Content Sync Trigger)
    └── ADR-028 (Definition Initial Status)

ADR-028 (Definition Initial Status)
    ├── ADR-023 (Content Sync Trigger)
    └── ADR-027 (Version Auto-Increment)

ADR-029 (Port Template Extraction)
    ├── ADR-025 (Content Metadata Storage)
    └── ADR-028 (Definition Initial Status)

ADR-030 (Resource & Port Observation — Learn from Live)
    ├── ADR-004 (Port Allocation per Worker)
    ├── ADR-017 (Lab Operations via Lablet-Controller)
    ├── ADR-020 (Session Entity Model)
    └── ADR-029 (Port Template Extraction)

# Instantiation Pipeline cluster (ADR-031–033)
ADR-031 (Checkpoint Pipeline)
    ├── ADR-004 (Port Allocation per Worker)
    ├── ADR-017 (Lab Operations via Lablet-Controller)
    ├── ADR-020 (Session Entity Model)
    ├── ADR-029 (Port Template Extraction)
    ├── ADR-030 (Resource Observation)
    ├── ADR-032 (Port Allocation on LabRecord)
    └── ADR-033 (CML Node Tag Sync)

ADR-032 (Port Allocation as LabRecord Topology)
    ├── ADR-004 (Port Allocation per Worker)
    ├── ADR-019 (LabRecord as AggregateRoot)
    ├── ADR-020 (Session Entity Model)
    ├── ADR-029 (Port Template Extraction)
    ├── ADR-031 (Checkpoint Pipeline)
    └── ADR-033 (CML Node Tag Sync)

ADR-033 (CML Node Tag Sync)
    ├── ADR-004 (Port Allocation per Worker)
    ├── ADR-017 (Lab Operations via Lablet-Controller)
    ├── ADR-029 (Port Template Extraction)
    ├── ADR-031 (Checkpoint Pipeline)
    └── ADR-032 (Port Allocation on LabRecord)

ADR-039 (SSE Race Condition Fix)
    ├── ADR-013 (SSE Protocol Improvements)
    └── ADR-001 (API-Centric State Management)

ADR-040 (LDS CloudEvent Direct Ingestion via CPA)
    ├── ADR-003 (CloudEvents)
    ├── ADR-015 (CPA No External Calls)
    ├── ADR-018 (LDS Integration)
    └── ADR-022 (CloudEvent Ingestion) ← amends (dual routing)

# Worker / runtime hardening (ADR-041–043)
ADR-041 (WebSocket-Based CML Worker Monitoring)
    └── ADR-013 (SSE Protocol Improvements)

ADR-042 (CommandHandlerBase Dependency Simplification)
    └── ADR-010 (Service Unification on Neuroglia)

ADR-043 (Startup State Reconciliation and Discovery Separation)
    ├── ADR-012 (Dynamic Region Configuration)
    └── ADR-014 (Worker Orphan Detection)

# Generalized resource-plane cluster (current model — ADR-036 + 044–054, all Proposed)
ADR-036 (Resource Management Abstraction Layer) ← layered state base
    ├── ADR-037 (Timeslot Management)
    ├── ADR-046 (Host / PodType–HostType Split) ← extends
    ├── ADR-047 (Generic Reconciliation Framework) ← extends (with ADR-038)
    └── ADR-050 (Definition/Instance Duality) ← extends

ADR-044 (ScenarioEngine — Pod Automation as a Separate Service, Rev 2)
    ├── supersedes Rev 1 (in-process design)
    └── ADR-049 (Unified Workflow DSL) ← job/step description

ADR-045 (Multi-part Session / Part Model)
    ├── ADR-036 (Resource Abstraction)
    ├── ADR-020 (Session Entity Model) ← supersedes (state machine → part level)
    ├── ADR-021 (Child Entity Architecture) ← supersedes (part model)
    ├── ADR-046 (Host / Type split)
    └── ADR-047 (Generic Reconciliation)

ADR-047 (Generic Reconciliation Framework)
    ├── ADR-036 (Resource Abstraction)
    ├── ADR-038 (Step Handler Registry) ← extends
    └── ADR-054 (Controller Topology) ← maps managers → services

ADR-050 (Definition/Instance Duality)
    ├── ADR-036 (Resource Abstraction) ← extends
    ├── ADR-051 (Provisioning Sources) ← extends
    ├── ADR-052 (Content-Authoring Taxonomy) ← extends
    └── ADR-053 (Authorization Policy Model) ← extends

ADR-051 (Provisioning Sources)
    ├── ADR-050 (Definition/Instance Duality)
    └── ADR-023–028 (Content Sync cluster) ← reconciles content_package source

ADR-052 (Content-Authoring Taxonomy)
    ├── ADR-050 / ADR-051
    ├── ADR-044 (Content-driven lifecycle / SE)
    └── ADR-045 (Multi-part — supplies the Forms parts select)

ADR-053 (Authorization Policy Model)
    ├── ADR-050 (Definition/Instance Duality)
    └── ADR-001 (API-Centric State Management)

ADR-054 (Controller Topology by Resource Kind)
    ├── ADR-047 (Per-type managers) ← extends
    ├── ADR-016 (License ops) ← supersedes topology
    ├── ADR-017 (Lab ops) ← supersedes topology
    ├── ADR-002 (Resource Scheduler) ← re-scopes role
    ├── ADR-035 (Legacy Scheduler Removal) ← re-scopes role
    └── ADR-046 (Host adapters live inside pod-controller)
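
The graph lists each ADR with the ADRs it relates to, so the reverse question ("which ADRs mention ADR-020?") needs an inverted index. The sketch below builds one from the text form of the graph. `referenced_by` is a hypothetical helper, and it makes simplifying assumptions: every indented line is attributed to the nearest unindented ADR line above it (so the two-level nesting under ADR-001 is flattened), lines starting with `#` are treated as cluster comments, and only the first ADR number on a line is read.

```python
import re
from collections import defaultdict

# A short excerpt of the graph above, used as sample input.
EXCERPT = """\
ADR-018 (LDS Integration)
    ├── ADR-017 (Lab Operations via Lablet-Controller)
    ├── ADR-020 (Session Entity Model) ← amends terminology
    └── ADR-022 (CloudEvent Ingestion) ← amends routing

ADR-019 (LabRecord)
    └── ADR-020 (Session Entity Model) ← supersedes binding model
"""


def referenced_by(graph_text: str) -> dict[str, list[str]]:
    """Map each child ADR to the top-level ADRs that list it."""
    index: dict[str, list[str]] = defaultdict(list)
    parent = None
    for line in graph_text.splitlines():
        if line.startswith("#"):
            continue  # cluster comment, not a graph node
        match = re.search(r"ADR-\d{3}", line)
        if not match:
            continue  # blank line or a free-text leaf
        if not line[0].isspace():
            parent = match.group()  # unindented line opens a new stanza
        elif parent is not None:
            index[match.group()].append(parent)
    return dict(index)
```

On the excerpt, `referenced_by(EXCERPT)["ADR-020"]` is `["ADR-018", "ADR-019"]`.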