Lablet Controller Guide¶
Documentation In Progress
This service guide is a placeholder. Full documentation is being developed.
Overview¶
The Lablet Controller is a Kubernetes-style controller that manages the lifecycle of LabletInstances on CML Workers. It handles lab import, startup, collection, and teardown operations.
Key Concept: A LabletInstance is a composite resource consisting of:
- CML Lab: Infrastructure running on a CML worker (managed via CML API)
- LabSession: User-facing session in LDS (managed via LabDeliverySPI)
Both components must be provisioned for a LabletInstance to be considered fully operational.
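The composite rule above can be sketched as a small data model. This is only an illustration: the class layout and the field names `instance_id`, `cml_lab_id`, and `lds_session_id` are assumptions, not the controller's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabletInstance:
    """Illustrative composite resource; field names are assumptions."""

    instance_id: str
    cml_lab_id: Optional[str] = None      # set once the CML lab exists on a worker
    lds_session_id: Optional[str] = None  # set once the LabSession exists in LDS

    @property
    def is_fully_operational(self) -> bool:
        # Both halves must be provisioned before the instance counts as operational.
        return self.cml_lab_id is not None and self.lds_session_id is not None


instance = LabletInstance(instance_id="inst-1", cml_lab_id="lab-42")
lab_only = instance.is_fully_operational   # False: the LabSession is still missing
instance.lds_session_id = "sess-7"
both = instance.is_fully_operational       # True: both components are present
```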
Architecture¶
See Lablet Controller Architecture for detailed design.
Core Responsibilities¶
| Responsibility | Description |
|---|---|
| Lab Lifecycle | Import, start, stop, wipe, delete labs on workers |
| LDS Integration | Provision LabSession in Lab Delivery System |
| Port Allocation | Assign unique ports for console access |
| State Reconciliation | Align actual lab state with desired state |
| Collection Trigger | Initiate lab artifact collection |
| HA Coordination | Leader election for single active controller |
API Boundaries
- CML Labs API: Labs, nodes, links, interfaces (✅ lablet-controller)
- CML System API: System info, stats, licensing (❌ worker-controller only)
- LDS API: Session provisioning, device access (✅ lablet-controller)
Key Flows¶
Lab Provisioning Flow¶
sequenceDiagram
participant RS as Resource Scheduler
participant etcd as etcd (State Store)
participant LC as Lablet Controller
participant CML as CML Worker
RS->>etcd: Write: Instance assignment
LC->>etcd: Watch: Instance assignments
etcd-->>LC: Notify: New assignment
LC->>CML: POST /api/v0/labs: Import lab
CML-->>LC: 201: lab_id
LC->>CML: PUT /api/v0/labs/{id}/state: START
CML-->>LC: 200: Started
LC->>etcd: Write: Instance status=READY
READY State
The lablet-controller transitions the instance to READY (not RUNNING) after provisioning.
RUNNING is triggered by an LDS CloudEvent when the user logs in.
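The provisioning sequence can be sketched roughly as follows. The `CmlClient` protocol, its method names, and the `FakeCmlClient` stand-in are hypothetical and exist only so the example runs without a CML worker. Only the call order and the final READY status come from the flow above.

```python
from typing import Dict, Protocol, Set


class CmlClient(Protocol):
    """Hypothetical client interface; method names are illustrative."""

    def import_lab(self, topology_yaml: str) -> str: ...
    def start_lab(self, lab_id: str) -> None: ...


class FakeCmlClient:
    """In-memory stand-in so the sketch runs without a CML worker."""

    def __init__(self) -> None:
        self.started: Set[str] = set()
        self._next_id = 1

    def import_lab(self, topology_yaml: str) -> str:
        lab_id = f"lab-{self._next_id}"
        self._next_id += 1
        return lab_id

    def start_lab(self, lab_id: str) -> None:
        self.started.add(lab_id)


def provision(cml: CmlClient, topology_yaml: str) -> Dict[str, str]:
    lab_id = cml.import_lab(topology_yaml)  # corresponds to POST /api/v0/labs
    cml.start_lab(lab_id)                   # corresponds to PUT /api/v0/labs/{id}/state
    # Provisioning ends in READY; RUNNING is set later by the LDS CloudEvent.
    return {"lab_id": lab_id, "status": "READY"}
```

In a real controller the returned status would be written back to etcd rather than returned to the caller.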
LDS Session Provisioning Flow¶
sequenceDiagram
participant LC as Lablet Controller
participant CML as CML Worker
participant LDS as Lab Delivery System
participant S3 as S3/MinIO
Note over LC: LabletInstance in INSTANTIATING
LC->>CML: Start lab (CML API)
CML-->>LC: Lab running
LC->>S3: Fetch content.xml
S3-->>LC: Device definitions
LC->>LDS: create_session_with_part()
LDS-->>LC: session_id, login_url
LC->>LDS: set_devices(session_id, devices)
LDS-->>LC: OK
Note over LC: Transition to READY (awaiting user login)
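A minimal sketch of the two LDS calls in this flow, using an in-memory fake. The argument and return shapes of `create_session_with_part()` and `set_devices()` are assumptions, as are the `FakeLds` class and the `lds.example` URL. The guide names only the methods and the values they yield.

```python
from typing import Dict, List, Tuple


class FakeLds:
    """In-memory stand-in for LabDeliverySPI; signatures are assumed."""

    def __init__(self) -> None:
        self.devices: Dict[str, List[dict]] = {}

    def create_session_with_part(self, form_qualified_name: str) -> Tuple[str, str]:
        session_id = f"session-for-{form_qualified_name}"
        return session_id, f"https://lds.example/login/{session_id}"

    def set_devices(self, session_id: str, devices: List[dict]) -> None:
        self.devices[session_id] = devices


def provision_session(lds: FakeLds, form_qualified_name: str, devices: List[dict]) -> dict:
    # Create the session first, because set_devices() needs the session_id.
    session_id, login_url = lds.create_session_with_part(form_qualified_name)
    lds.set_devices(session_id, devices)
    # The instance then waits in READY until the user logs in.
    return {"lds_session_id": session_id, "login_url": login_url, "status": "READY"}
```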
User Login Flow (CloudEvent)¶
sequenceDiagram
participant User
participant LDS as Lab Delivery System
participant CPA as Control-Plane-API
participant DB as MongoDB
User->>LDS: Login via login_url
LDS->>LDS: Validate session token
LDS->>CPA: POST /api/cloudevents<br/>type: session.started
CPA->>DB: Lookup by lds_session_id
CPA->>CPA: Validate state == READY
CPA->>DB: Update: READY → RUNNING
CPA-->>LDS: 202 Accepted
Event-Driven Transition
The READY → RUNNING transition is event-driven via CloudEvents from LDS.
This is handled by control-plane-api, not lablet-controller.
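The handler logic on the control-plane-api side might look like the sketch below. The event payload shape (`data.lds_session_id`), the in-memory `instances` dict standing in for MongoDB, and the 400/404/409 status codes are assumptions. The guide confirms only the READY check, the READY → RUNNING update, and the 202 response.

```python
from typing import Dict


def handle_session_started(event: dict, instances: Dict[str, dict]) -> int:
    """Return an HTTP-style status code for a session.started CloudEvent."""
    if event.get("type") != "session.started":
        return 400  # assumed: unknown event types are rejected
    session_id = event.get("data", {}).get("lds_session_id")
    # Look up the instance by its LDS session id.
    instance = next(
        (i for i in instances.values() if i["lds_session_id"] == session_id),
        None,
    )
    if instance is None:
        return 404  # assumed: no instance matches this session
    if instance["state"] != "READY":
        return 409  # assumed: the transition is only valid from READY
    instance["state"] = "RUNNING"
    return 202
```

Replaying the same event returns 409 in this sketch because the instance has already left READY, which keeps the transition from being applied twice.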
Port Allocation¶
flowchart LR
A[Instance Scheduled] --> B{Ports Needed?}
B -->|Yes| C[Reserve from Worker Pool]
C --> D[Assign to Instance]
D --> E[Configure Console Access]
B -->|No| E
E --> F[Lab Started]
Port Range
Each worker has a dedicated port range (2000-9999). Lablet Controller manages allocation within this range per worker.
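Per-worker allocation within that range could be sketched as below. The `WorkerPortPool` class and its lowest-free-port strategy are illustrative assumptions, not the controller's actual allocator.

```python
from typing import Dict, List


class WorkerPortPool:
    """Tracks which ports in one worker's range are in use (illustrative only)."""

    def __init__(self, low: int = 2000, high: int = 9999) -> None:
        self._low, self._high = low, high
        self._in_use: Dict[int, str] = {}  # port -> owning instance id

    def reserve(self, instance_id: str, count: int) -> List[int]:
        free = [p for p in range(self._low, self._high + 1) if p not in self._in_use]
        if len(free) < count:
            raise RuntimeError("worker port range exhausted")
        ports = free[:count]  # lowest free ports first
        for port in ports:
            self._in_use[port] = instance_id
        return ports

    def release(self, instance_id: str) -> None:
        # Free every port owned by the instance, e.g. on teardown.
        self._in_use = {p: i for p, i in self._in_use.items() if i != instance_id}
```

One pool per worker keeps ports unique on that worker while allowing the same port number to be reused on a different worker.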
API Endpoints¶
Internal Service
The Lablet Controller operates primarily via etcd watches. It exposes only a limited REST API for health and status.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check |
| `GET` | `/ready` | Readiness check |
| `GET` | `/metrics` | Prometheus metrics |
Configuration¶
Key environment variables:
| Variable | Description | Default |
|---|---|---|
| `CONTROLLER_ENABLED` | Enable controller | `true` |
| `ETCD_ENDPOINTS` | etcd cluster endpoints | `http://etcd:2379` |
| `CML_API_TIMEOUT` | CML API timeout (seconds) | `60` |
| `LAB_IMPORT_TIMEOUT` | Lab import timeout (seconds) | `300` |
| `LDS_API_URL` | Lab Delivery System API URL | `http://localhost:8081` |
| `LDS_API_TIMEOUT` | LDS API timeout (seconds) | `30` |
| `S3_ENDPOINT` | S3/MinIO endpoint for content | `http://localhost:9000` |
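Reading these variables with their documented defaults might look like the sketch below. The `ControllerConfig` class, the comma-separated parsing of `ETCD_ENDPOINTS`, and the case-insensitive handling of `CONTROLLER_ENABLED` are assumptions. Only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class ControllerConfig:
    controller_enabled: bool
    etcd_endpoints: List[str]
    cml_api_timeout: float
    lab_import_timeout: float
    lds_api_url: str
    lds_api_timeout: float
    s3_endpoint: str


def load_config(env: Mapping[str, str] = os.environ) -> ControllerConfig:
    return ControllerConfig(
        # Assumption: any casing of "true" enables the controller.
        controller_enabled=env.get("CONTROLLER_ENABLED", "true").strip().lower() == "true",
        # Assumption: multiple endpoints are comma-separated.
        etcd_endpoints=[e.strip() for e in env.get("ETCD_ENDPOINTS", "http://etcd:2379").split(",")],
        cml_api_timeout=float(env.get("CML_API_TIMEOUT", "60")),
        lab_import_timeout=float(env.get("LAB_IMPORT_TIMEOUT", "300")),
        lds_api_url=env.get("LDS_API_URL", "http://localhost:8081"),
        lds_api_timeout=float(env.get("LDS_API_TIMEOUT", "30")),
        s3_endpoint=env.get("S3_ENDPOINT", "http://localhost:9000"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.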
CML Labs API Integration¶
The Lablet Controller uses these CML endpoints:
| Endpoint | Purpose | Auth Required |
|---|---|---|
| `/api/v0/labs` | List/create labs | Yes |
| `/api/v0/labs/{id}` | Get/delete lab | Yes |
| `/api/v0/labs/{id}/state` | Start/stop lab | Yes |
| `/api/v0/labs/{id}/nodes` | Node management | Yes |
| `/api/v0/labs/{id}/nodes/{node}/console_key` | Console access | Yes |
| `/api/v0/labs/{id}/download` | Export lab YAML | Yes |
API Boundary
Do NOT call /api/v0/system_* from Lablet Controller.
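One way to enforce this boundary in code is a small guard applied before every outgoing CML request. The helper name `assert_allowed` and the choice of `PermissionError` are hypothetical. The guide states only the rule itself.

```python
FORBIDDEN_PREFIX = "/api/v0/system_"  # CML System API, reserved for worker-controller


def assert_allowed(path: str) -> str:
    """Return the path unchanged, or raise if it targets the CML System API."""
    if path.startswith(FORBIDDEN_PREFIX):
        raise PermissionError(f"lablet-controller must not call {path}")
    return path


labs_path = assert_allowed("/api/v0/labs")  # allowed: CML Labs API
```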
LDS Integration¶
The Lablet Controller integrates with the Lab Delivery System (LDS) to provision user-facing sessions. See ADR-018 LDS Integration for the architectural decision.
LabDeliverySPI¶
The LabDeliverySPI abstraction provides:
| Method | Purpose |
|---|---|
| `create_session_with_part()` | Create LDS session with content |
| `set_devices()` | Provision device access credentials |
| `get_session_info()` | Get session state and login URL |
| `get_login_url()` | Get user login URL |
| `archive_session()` | Archive session on termination |
| `refresh_content()` | Refresh content from S3 bucket |
Device Mapping¶
Device access info is derived from:
- content.xml - Device labels and definitions
- cml.yaml - Node topology with device_label annotations
- Port Allocation - Assigned external ports
- DeviceAccessInfo - Final payload sent to LDS
from dataclasses import dataclass

@dataclass
class DeviceAccessInfo:
    name: str          # Device label
    protocol: str      # ssh, telnet, https
    host: str          # Worker hostname
    port: int          # Allocated port
    uri: str | None    # Optional URI
    username: str      # Device credentials
    password: str      # Device credentials
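To show how the three inputs might combine into the final payload, here is a hedged sketch. The input shapes are assumptions: `content_devices` stands in for entries parsed from content.xml, `node_labels` for the `device_label` annotations in cml.yaml, and `allocated_ports` for the port allocator's output. The helper `build_device_access` is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DeviceAccessInfo:
    name: str
    protocol: str
    host: str
    port: int
    uri: Optional[str]
    username: str
    password: str


def build_device_access(
    content_devices: List[dict],      # assumed shape: label, protocol, username, password
    node_labels: Dict[str, str],      # node id -> device_label (from cml.yaml)
    allocated_ports: Dict[str, int],  # node id -> external port
    worker_host: str,
) -> List[DeviceAccessInfo]:
    label_to_node = {label: node for node, label in node_labels.items()}
    result = []
    for device in content_devices:
        node_id = label_to_node[device["label"]]  # KeyError if the label is unmapped
        result.append(
            DeviceAccessInfo(
                name=device["label"],
                protocol=device["protocol"],
                host=worker_host,
                port=allocated_ports[node_id],
                uri=None,
                username=device["username"],
                password=device["password"],
            )
        )
    return result
```

The device label is the join key in this sketch: it ties a content.xml entry to a topology node, and the node to its allocated port.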
Content Refresh¶
When a LabletDefinition is versioned:
- Content updated in S3 bucket
- LCM API triggers refresh
- Lablet Controller calls refresh_content(form_qualified_name)
- LDS refreshes content from S3
See FR-2.1.6 for requirements.
Collect and Grade¶
When the user completes the lab, external systems trigger the CollectAndGradeCommand:
sequenceDiagram
participant Exam as Exam System
participant CPA as Control-Plane-API
participant LC as Lablet Controller
participant CML as CML Worker
participant GE as Grading Engine
Exam->>CPA: POST /api/instances/{id}/collect-and-grade
CPA->>CPA: Validate state == RUNNING
CPA->>CPA: Transition: RUNNING → COLLECTING
CPA-->>Exam: 202 Accepted
LC->>LC: Observe COLLECTING state
LC->>CML: Extract node configs
CML-->>LC: Config artifacts
LC->>LC: Transition: COLLECTING → GRADING
LC->>GE: Submit artifacts + rubric
GE-->>LC: grading_score
LC->>CPA: Store grading_score
LC->>LC: Transition: GRADING → STOPPING
See FR-2.2.7 for requirements.
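Taken together, the states named in this guide form a linear lifecycle. The sketch below encodes only the transitions that appear in the flows above. The `State` enum and `transition` helper are illustrative, and the real system may define further states or failure paths not covered here.

```python
from enum import Enum


class State(str, Enum):
    INSTANTIATING = "INSTANTIATING"
    READY = "READY"
    RUNNING = "RUNNING"
    COLLECTING = "COLLECTING"
    GRADING = "GRADING"
    STOPPING = "STOPPING"


# Only the transitions described in this guide; others may exist in practice.
ALLOWED = {
    State.INSTANTIATING: {State.READY},  # lablet-controller, after provisioning
    State.READY: {State.RUNNING},        # control-plane-api, on LDS CloudEvent
    State.RUNNING: {State.COLLECTING},   # control-plane-api, on collect-and-grade
    State.COLLECTING: {State.GRADING},   # lablet-controller, after collection
    State.GRADING: {State.STOPPING},     # lablet-controller, after grading
}


def transition(current: State, target: State) -> State:
    """Return the target state, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```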