Worker Controller Guide
Documentation In Progress
This service guide is a placeholder. Full documentation is being developed.
Overview
The Worker Controller is a Kubernetes-style controller that observes CML Worker instances and reconciles their state. It is responsible for metrics collection, system health monitoring, and license verification.
Architecture
See Worker Controller Architecture for the detailed design.
Core Responsibilities
| Responsibility | Description |
|---|---|
| Worker Observation | Monitor worker health and status |
| Metrics Collection | Gather CPU, memory, and storage utilization |
| License Verification | Check CML license status |
| Idle Detection | Identify workers with no active labs |
| HA Coordination | Leader election so that only one controller is active |
CML API Scope
The Worker Controller calls only system-level CML endpoints (system_information, system_stats, and licensing for license verification). Lab-level operations are handled by the Lablet Controller.
Key Flows
Worker Reconciliation Loop
stateDiagram-v2
[*] --> Observe
Observe --> Analyze: Worker list from etcd
Analyze --> Act: Delta detected
Act --> Observe: Reconciliation complete
Analyze --> Observe: No changes
state Act {
[*] --> CollectMetrics
CollectMetrics --> CheckLicense
CheckLicense --> UpdateStatus
UpdateStatus --> PublishEvent
PublishEvent --> [*]
}
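The loop in the diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's implementation: `reconcile_once`, the `handlers` mapping, and the use of an integer revision per worker (standing in for whatever change marker the etcd watch provides) are all hypothetical.

```python
from typing import Callable, Dict, List

# Sub-steps of the Act state, in the order shown in the diagram.
ACT_STEPS = ("CollectMetrics", "CheckLicense", "UpdateStatus", "PublishEvent")


def reconcile_once(
    workers: Dict[str, int],
    last_seen: Dict[str, int],
    handlers: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Run one Observe -> Analyze -> Act pass and return the worker IDs acted on.

    workers:   worker ID -> revision, as observed from the store (assumed shape).
    last_seen: revisions already reconciled; updated in place.
    handlers:  one callable per Act step (hypothetical hooks).
    """
    # Analyze: a delta exists when a worker is new or its revision changed.
    changed = [wid for wid, rev in workers.items() if last_seen.get(wid) != rev]

    # Act: run every sub-step in order for each changed worker.
    for wid in changed:
        for step in ACT_STEPS:
            handlers[step](wid)
        last_seen[wid] = workers[wid]

    # An empty list corresponds to the "No changes" edge back to Observe.
    return changed
```

A caller would invoke `reconcile_once` each time the watch delivers a new worker list; when no revision has changed, the function returns an empty list and no handler runs.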
API Endpoints
Internal Service
The Worker Controller operates primarily via etcd watches. It exposes only a limited REST API for health, readiness, and metrics.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check |
| `GET` | `/ready` | Readiness check |
| `GET` | `/metrics` | Prometheus metrics |
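As an illustration of how small this surface is, the three endpoints could be served with Python's standard library alone. The sketch below is an assumption-laden example: the status codes, response bodies, the `worker_controller_up` metric name, the `ready` flag, and port 8080 are all hypothetical and not taken from the service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def route(path: str, ready: bool) -> Tuple[int, str]:
    """Map a GET path to (status code, body). Bodies are illustrative only."""
    if path == "/health":
        return 200, "ok"
    if path == "/ready":
        # Assumed convention: 503 while the controller is not ready.
        return (200, "ready") if ready else (503, "not ready")
    if path == "/metrics":
        # Prometheus text format; the metric name is hypothetical.
        return 200, "worker_controller_up 1\n"
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    ready = True  # hypothetical flag a real controller would toggle

    def do_GET(self) -> None:
        status, body = route(self.path, self.ready)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Block forever serving the probe endpoints (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Keeping the routing in a plain function makes the behavior checkable without opening a socket; `serve()` is only needed when the endpoints should actually listen.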
Configuration
Key environment variables:
| Variable | Description | Default |
|---|---|---|
| `CONTROLLER_ENABLED` | Enable controller | true |
| `METRICS_POLL_INTERVAL` | Metrics collection interval (seconds) | 300 |
| `ETCD_ENDPOINTS` | etcd cluster endpoints | http://etcd:2379 |
| `CML_API_TIMEOUT` | CML API timeout (seconds) | 30 |
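A minimal sketch of reading these variables with their documented defaults is shown below. The `ControllerConfig` class and `load_config` function are hypothetical names, and two parsing rules are assumptions rather than documented behavior: the boolean accepts `1`, `true`, or `yes` (case-insensitive), and multiple etcd endpoints are comma-separated.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class ControllerConfig:
    enabled: bool
    metrics_poll_interval: int  # seconds
    etcd_endpoints: List[str]
    cml_api_timeout: int  # seconds


def load_config(env: Mapping[str, str] = os.environ) -> ControllerConfig:
    """Build a config from environment variables, using the table's defaults."""
    raw_enabled = env.get("CONTROLLER_ENABLED", "true").strip().lower()
    raw_endpoints = env.get("ETCD_ENDPOINTS", "http://etcd:2379")
    return ControllerConfig(
        # Assumed truthy spellings; anything else disables the controller.
        enabled=raw_enabled in ("1", "true", "yes"),
        metrics_poll_interval=int(env.get("METRICS_POLL_INTERVAL", "300")),
        # Assumed comma-separated list; blank entries are dropped.
        etcd_endpoints=[e.strip() for e in raw_endpoints.split(",") if e.strip()],
        cml_api_timeout=int(env.get("CML_API_TIMEOUT", "30")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation, for example `load_config({})`.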
CML System API Integration
The Worker Controller uses these CML endpoints:
| Endpoint | Purpose | Auth Required |
|---|---|---|
| `/api/v0/system_information` | System info, version | No |
| `/api/v0/system_stats` | Resource utilization | Yes |
| `/api/v0/licensing` | License status | Yes |
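The table can be encoded as data so that a client attaches credentials only where they are required and refuses any path outside the list. The following is a hedged sketch: `CML_ENDPOINTS`, `build_request`, the example host, and the use of a bearer token in the `Authorization` header are assumptions for illustration, not confirmed details of the service.

```python
import urllib.request
from typing import Dict, Optional

# Endpoint path -> whether authentication is required (from the table above).
CML_ENDPOINTS: Dict[str, bool] = {
    "/api/v0/system_information": False,
    "/api/v0/system_stats": True,
    "/api/v0/licensing": True,
}


def build_request(
    base_url: str, path: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request for one of the listed endpoints.

    Raises ValueError for paths outside the table or when a required
    token is missing. The bearer scheme is an assumption.
    """
    if path not in CML_ENDPOINTS:
        raise ValueError(f"endpoint not used by the Worker Controller: {path}")
    request = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if CML_ENDPOINTS[path]:
        if not token:
            raise ValueError(f"{path} requires authentication")
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=30)`, where 30 mirrors the `CML_API_TIMEOUT` default. Because the function only accepts paths from the table, a lab-level path such as `/api/v0/labs/...` is rejected before any network call is made.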
API Boundary
Do NOT call /api/v0/labs/* from the Worker Controller.
Use the Lablet Controller for lab operations.