
Auto-Scaling Architecture

Version: 1.0.0 (February 2026)
Scope: Cross-cutting — spans Resource Scheduler, Control Plane API, and Worker Controller
Phase: Phase 3 - Auto-Scaling

1. Overview

LCM auto-scaling automatically adjusts the CML worker fleet based on workload demand. It operates in two directions:

| Direction | Trigger | Service Chain | Goal |
|---|---|---|---|
| Scale Up | Insufficient capacity for pending lablet instances | Resource Scheduler → Control Plane API → Worker Controller | Add workers |
| Scale Down | Workers idle beyond threshold | Worker Controller → Control Plane API | Remove workers |
flowchart LR
    subgraph up["Scale Up"]
        direction TB
        RS["Resource Scheduler<br/>PlacementEngine"]
        CP1["Control Plane API<br/>RequestScaleUpCommand"]
        WC1["Worker Controller<br/>_handle_pending"]
        RS -->|"No capacity"| CP1 -->|"Create PENDING worker"| WC1
    end

    subgraph down["Scale Down"]
        direction TB
        WC2["Worker Controller<br/>_evaluate_scale_down"]
        CP2["Control Plane API<br/>DrainWorkerCommand"]
        WC3["Worker Controller<br/>_handle_stopping"]
        WC2 -->|"Idle + eligible"| CP2 -->|"DRAINING"| WC3
    end

    style up fill:#E8F5E9,stroke:#2E7D32
    style down fill:#FFEBEE,stroke:#C62828

2. Scale-Up Flow

When the Resource Scheduler cannot find a suitable worker for a pending lablet instance, it triggers a scale-up:

sequenceDiagram
    autonumber
    participant Inst as Lablet Instance<br/>(PENDING_SCHEDULING)
    participant RS as Resource Scheduler
    participant PE as Placement Engine
    participant CP as Control Plane API
    participant etcd as etcd
    participant WC as Worker Controller
    participant EC2 as AWS EC2

    Inst->>RS: Observed via watch/polling

    RS->>PE: place_instance(instance, workers, templates)

    alt Workers available
        PE->>PE: Filter → Score → Select
        PE-->>RS: PlacementDecision(action=assign, worker_id)
        RS->>CP: assign_instance(instance_id, worker_id)
    else No workers or no capacity
        PE->>PE: _select_template(requirements)
        PE-->>RS: PlacementDecision(action=scale_up, template)
        RS->>CP: request_scale_up(template, reason)

        rect rgb(232, 245, 233)
            Note over CP: RequestScaleUpCommand
            CP->>CP: Resolve template
            CP->>CP: Check scaling constraints
            CP->>CP: Create CMLWorker(status=PENDING)
            CP->>etcd: PUT /workers/new-id
        end

        etcd-->>WC: Watch event

        rect rgb(227, 242, 253)
            Note over WC: _handle_pending
            WC->>EC2: launch_instance()
            EC2-->>WC: instance_id
            WC->>CP: update_status(PROVISIONING)
        end

        Note over WC: Subsequent cycles:<br/>PROVISIONING → RUNNING
    end

Scaling Constraints

Before creating a new worker, RequestScaleUpCommand validates:

| Constraint | Check | Default | Behavior on Violation |
|---|---|---|---|
| Max workers per region | active_count < max_workers_per_region | 10 | Reject with 409 Conflict |
| Pending workers | Check for existing PENDING workers | n/a | Log warning (does not block) |
| Constraint check failure | Exception during validation | n/a | Fail open — allows scale-up |

Fail-Open Design

If the constraint check itself fails (e.g., database error), the scale-up is allowed to proceed. This prevents a cascading failure where infrastructure issues also prevent scaling.
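The fail-open pattern can be sketched as follows. This is a minimal illustration, not the actual LCM code: the names `check_scaling_constraints`, `ConstraintViolation`, and the callable counters are assumptions.

```python
MAX_WORKERS_PER_REGION = 10  # default from the table above

class ConstraintViolation(Exception):
    """Hard constraint violated; maps to 409 Conflict."""

def check_scaling_constraints(count_active_workers, count_pending_workers):
    """Return True if scale-up may proceed.

    The count_* arguments are callables so that a datastore failure
    surfaces inside this function. Fails open: if the check itself
    errors, allow the scale-up rather than let an infrastructure
    outage also block scaling.
    """
    try:
        if count_active_workers() >= MAX_WORKERS_PER_REGION:
            raise ConstraintViolation("max_workers_per_region reached")
        if count_pending_workers() > 0:
            print("warning: PENDING workers already exist")  # soft check only
        return True
    except ConstraintViolation:
        raise  # hard violations still reject the request
    except Exception:
        return True  # fail open on infrastructure errors
```

Note the distinction: a *successful* check that finds a violation still rejects; only a *failed* check (an unexpected exception) is treated as permission to proceed.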


3. Placement Algorithm

The PlacementEngine implements a Filter → Score → Select pattern inspired by the Kubernetes scheduler:

flowchart TD
    Start["place_instance(instance)"] --> HasWorkers{"Any workers<br/>registered?"}

    HasWorkers -->|No| ScaleUp1["PlacementDecision<br/>action=scale_up<br/>reason: no workers available"]

    HasWorkers -->|Yes| Filter["Filter Phase<br/>(5 predicates)"]

    Filter --> HasCandidates{"Any candidates<br/>passed filter?"}

    HasCandidates -->|No| ScaleUp2["PlacementDecision<br/>action=scale_up<br/>reason: + rejection_reasons"]

    HasCandidates -->|Yes| Score["Score Phase<br/>(utilization + locality)"]

    Score --> Select["Select highest score<br/>(bin-packing: most utilized)"]

    Select --> Assign["PlacementDecision<br/>action=assign<br/>worker_id=selected"]

    style Start fill:#1565C0,color:white
    style ScaleUp1 fill:#FF9800,color:white
    style ScaleUp2 fill:#FF9800,color:white
    style Assign fill:#4CAF50,color:white

Filter Predicates (5 checks)

Each worker must pass all predicates to be eligible:

flowchart TD
    W["Worker candidate"] --> F1{"Status check<br/>status = RUNNING?"}
    F1 -->|Fail| R1["Rejected:<br/>status_not_eligible"]
    F1 -->|Pass| F2{"License affinity<br/>matches requirement?"}
    F2 -->|Fail| R2["Rejected:<br/>license_affinity"]
    F2 -->|Pass| F3{"Resource capacity<br/>CPU, memory, storage?"}
    F3 -->|Fail| R3["Rejected:<br/>insufficient_capacity"]
    F3 -->|Pass| F4{"AMI requirements<br/>CML version, node defs?"}
    F4 -->|Fail| R4["Rejected:<br/>ami"]
    F4 -->|Pass| F5{"Port availability<br/>enough free ports?"}
    F5 -->|Fail| R5["Rejected:<br/>port_availability"]
    F5 -->|Pass| OK["Passes filter"]

    style OK fill:#4CAF50,color:white
    style R1 fill:#f44336,color:white
    style R2 fill:#f44336,color:white
    style R3 fill:#f44336,color:white
    style R4 fill:#f44336,color:white
    style R5 fill:#f44336,color:white
| # | Predicate | Data Source | Logic |
|---|---|---|---|
| 1 | Status | Worker state | Must be RUNNING (not DRAINING, STOPPING, etc.) |
| 2 | License | Instance requirements | If the instance specifies license types, the worker's license must match |
| 3 | Capacity | etcd (preferred) or worker state | Available CPU ≥ required, memory ≥ required, storage ≥ required |
| 4 | AMI | Instance requirements | CML version in min/max range, required node definitions present |
| 5 | Ports | Worker state + allocated ports | max_ports - allocated_ports ≥ required_ports |
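The predicate chain above can be sketched as a single function that returns the first rejection reason, or `None` on success. The dataclass fields and function name here are assumptions for illustration, not the real `PlacementEngine` types.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkerView:
    status: str
    license_type: str
    free_cpu: int
    free_memory: int
    free_storage: int
    cml_version: tuple
    max_ports: int
    allocated_ports: int

@dataclass
class InstanceRequirements:
    license_types: tuple
    cpu: int
    memory: int
    storage: int
    min_cml: tuple
    max_cml: tuple
    ports: int

def filter_worker(w: WorkerView, req: InstanceRequirements) -> Optional[str]:
    """Apply the five predicates in order; return the rejection reason
    (matching the labels in the diagram) or None if the worker passes."""
    if w.status != "RUNNING":
        return "status_not_eligible"
    if req.license_types and w.license_type not in req.license_types:
        return "license_affinity"
    if w.free_cpu < req.cpu or w.free_memory < req.memory or w.free_storage < req.storage:
        return "insufficient_capacity"
    if not (req.min_cml <= w.cml_version <= req.max_cml):
        return "ami"
    if w.max_ports - w.allocated_ports < req.ports:
        return "port_availability"
    return None  # passes all five predicates
```

Returning the reason string (rather than a bare boolean) is what lets the scale-up path aggregate `rejection_reasons` into the `PlacementDecision`.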

Real-Time Capacity via etcd

The placement engine prefers etcd for capacity data (updated every 30s by workers) over MongoDB state, which may be stale. Capacity keys: /lcm/workers/{id}/capacity.
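A hedged sketch of that etcd-first lookup; `etcd_get` stands in for whatever etcd client accessor the service actually uses.

```python
import json

def capacity_for(worker_id, etcd_get, stale_state):
    """Prefer the worker-reported etcd value (refreshed every 30s);
    fall back to the possibly stale stored state."""
    raw = etcd_get(f"/lcm/workers/{worker_id}/capacity")
    if raw is not None:
        return json.loads(raw)
    return stale_state
```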

Scoring Formula

Workers that pass filtering are scored to select the most utilized (bin-packing strategy):

$$ \text{score} = \frac{\text{cpu\_utilization} + \text{memory\_utilization}}{2} + \text{locality\_bonus} $$

Where:

  • $\text{cpu\_utilization} = \frac{\text{allocated\_cpu}}{\text{declared\_cpu}}$
  • $\text{memory\_utilization} = \frac{\text{allocated\_memory}}{\text{declared\_memory}}$
  • $\text{locality\_bonus} = \min(0.05, \text{instance\_count} \times 0.01)$ — small bonus for co-locating instances

Higher score = preferred — the algorithm packs workloads onto the most utilized workers first, keeping other workers available for larger workloads or scale-down.
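The formula transcribes directly to code; the argument names below are assumptions chosen to match the symbols above.

```python
def score_worker(allocated_cpu, declared_cpu,
                 allocated_memory, declared_memory, instance_count):
    """Higher score = more utilized = preferred (bin-packing)."""
    cpu_utilization = allocated_cpu / declared_cpu
    memory_utilization = allocated_memory / declared_memory
    locality_bonus = min(0.05, instance_count * 0.01)
    return (cpu_utilization + memory_utilization) / 2 + locality_bonus
```

Selection is then simply `max(candidates, key=score)` over the workers that passed filtering.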


4. Template Selection

When the placement engine decides to scale up, it selects the optimal worker template:

flowchart TD
    Start["_select_template(requirements)"] --> HasTemplates{"Templates<br/>available?"}

    HasTemplates -->|No| Hardcoded["Tier 3: Hardcoded fallback"]

    HasTemplates -->|Yes| Tier1["Tier 1: Capacity-based<br/>cheapest viable"]

    Tier1 --> T1Filter["Filter: enabled templates<br/>where CPU, memory, storage<br/>all >= requirements"]
    T1Filter --> T1Sort["Sort by cost_per_hour ASC"]
    T1Sort --> T1Check{"Any matches?"}

    T1Check -->|Yes| T1Select["Select cheapest viable"]
    T1Check -->|No| Tier2["Tier 2: Largest available"]

    Tier2 --> T2Filter["Filter: enabled templates"]
    T2Filter --> T2Sort["Sort by CPU DESC"]
    T2Sort --> T2Check{"Any templates?"}

    T2Check -->|Yes| T2Select["Select largest + warning"]
    T2Check -->|No| Hardcoded

    Hardcoded --> HC{"Required CPU?"}
    HC -->|">= 32"| Metal["metal"]
    HC -->|">= 16"| Large["large"]
    HC -->|">= 4"| Medium["medium"]
    HC -->|"< 4"| Small["small"]

    style Start fill:#1565C0,color:white
    style T1Select fill:#4CAF50,color:white
    style T2Select fill:#FF9800,color:white
    style Metal fill:#9E9E9E,color:white
    style Large fill:#9E9E9E,color:white
    style Medium fill:#9E9E9E,color:white
    style Small fill:#9E9E9E,color:white

See Worker Templates for template definitions and capacity model.
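The three tiers can be sketched as a single selection function. The `Template` fields and the hardcoded size ladder below follow the flowchart; everything else (names, signature) is illustrative.

```python
from dataclasses import dataclass

@dataclass
class Template:
    name: str
    cpu: int
    memory_gb: int
    storage_gb: int
    cost_per_hour: float
    enabled: bool = True

# Tier-3 fallback ladder keyed on required CPU, from the flowchart above.
HARDCODED = [(32, "metal"), (16, "large"), (4, "medium"), (0, "small")]

def select_template(templates, cpu, memory_gb, storage_gb):
    enabled = [t for t in templates if t.enabled]
    # Tier 1: cheapest enabled template that satisfies every requirement
    viable = [t for t in enabled
              if t.cpu >= cpu and t.memory_gb >= memory_gb
              and t.storage_gb >= storage_gb]
    if viable:
        return min(viable, key=lambda t: t.cost_per_hour).name
    # Tier 2: largest enabled template (logged with a warning in practice)
    if enabled:
        return max(enabled, key=lambda t: t.cpu).name
    # Tier 3: hardcoded fallback by required CPU
    return next(name for threshold, name in HARDCODED if cpu >= threshold)
```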


5. Scale-Down Flow

Scale-down is evaluated during the Worker Controller's _handle_running reconciliation (Step 6), after idle detection has run:

sequenceDiagram
    autonumber
    participant WC as Worker Controller
    participant CP as Control Plane API
    participant etcd as etcd

    Note over WC: _handle_running Step 6

    WC->>WC: _evaluate_scale_down(worker, idle_result)

    alt Guard 1: auto_pause_triggered
        Note over WC: Skip — auto-pause already acted
    else Guard 2: not idle
        Note over WC: Skip — worker is active
    else Guard 3: not eligible
        Note over WC: Skip — not eligible for pause
    else Guard 4: running <= min_workers
        Note over WC: Skip — at minimum fleet size
    else Guard 5: cooldown active
        Note over WC: Skip — too soon since last drain
    else All guards pass
        WC->>CP: drain_worker(worker_id, reason=scale_down)

        rect rgb(255, 235, 238)
            Note over CP: DrainWorkerCommand
            CP->>CP: Validate status = RUNNING
            CP->>CP: Set status = DRAINING
            CP->>CP: Set desired_status = STOPPED
            CP->>etcd: PUT /workers/id (status=DRAINING)
        end

        WC->>WC: Update last_scale_down_at
        WC->>WC: Decrement running_worker_count
        WC->>WC: Increment scale_down_count
    end

Safety Guards (5 checks, in order)

The _evaluate_scale_down method applies five sequential guards before draining:

| # | Guard | Condition to Skip | Audit Label | Purpose |
|---|---|---|---|---|
| 1 | Auto-pause | auto_pause_triggered is true | skipped_auto_pause | Avoid double-action with auto-pause |
| 2 | Idle check | is_idle is false | skipped_not_idle | Only drain idle workers |
| 3 | Eligibility | eligible_for_pause is false | skipped_not_eligible | Respects snooze and per-worker flags |
| 4 | Min workers | running_count <= min_workers | skipped_min_workers | Maintain minimum fleet |
| 5 | Cooldown | elapsed < cooldown_seconds | skipped_cooldown | Prevent drains in rapid succession |
flowchart TD
    Start["_evaluate_scale_down"] --> G1{"auto_pause<br/>triggered?"}
    G1 -->|Yes| Skip1["Skip: auto-pause<br/>already handled"]
    G1 -->|No| G2{"is_idle?"}
    G2 -->|No| Skip2["Skip: not idle"]
    G2 -->|Yes| G3{"eligible_for<br/>_pause?"}
    G3 -->|No| Skip3["Skip: not eligible"]
    G3 -->|Yes| G4{"running_count<br/><= min_workers?"}
    G4 -->|Yes| Skip4["Skip: minimum<br/>fleet size"]
    G4 -->|No| G5{"cooldown<br/>active?"}
    G5 -->|Yes| Skip5["Skip: too soon"]
    G5 -->|No| Drain["DRAIN WORKER"]

    style Start fill:#1565C0,color:white
    style Drain fill:#f44336,color:white
    style Skip1 fill:#9E9E9E,color:white
    style Skip2 fill:#9E9E9E,color:white
    style Skip3 fill:#9E9E9E,color:white
    style Skip4 fill:#9E9E9E,color:white
    style Skip5 fill:#9E9E9E,color:white

Running Worker Count Tracking

After a successful drain, the reconciler decrements _running_worker_count locally. This ensures that if multiple idle workers are evaluated in the same reconciliation cycle, the min_workers guard remains accurate without waiting for the next API refresh.
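The guard chain and the local counter update can be sketched together. The dict-based inputs and the function signature are assumptions; the audit labels match the table above.

```python
def evaluate_scale_down(idle, fleet, now):
    """Apply the five guards in order; return (should_drain, audit_label)."""
    if idle["auto_pause_triggered"]:
        return False, "skipped_auto_pause"
    if not idle["is_idle"]:
        return False, "skipped_not_idle"
    if not idle["eligible_for_pause"]:
        return False, "skipped_not_eligible"
    if fleet["running_count"] <= fleet["min_workers"]:
        return False, "skipped_min_workers"
    if now - fleet["last_scale_down_at"] < fleet["cooldown_seconds"]:
        return False, "skipped_cooldown"
    # All guards passed: drain, then update the local counters so the
    # min_workers and cooldown guards stay accurate for other idle
    # workers evaluated in this same reconciliation cycle.
    fleet["running_count"] -= 1
    fleet["last_scale_down_at"] = now
    return True, "scale_down_initiated"
```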


6. DRAINING State (ADR-008)

The DRAINING status provides a graceful transition between RUNNING and STOPPING:

stateDiagram-v2
    direction LR
    RUNNING --> DRAINING: DrainWorkerCommand
    DRAINING --> STOPPING: Worker Controller reconcile
    STOPPING --> STOPPED: EC2 instance stopped

    note right of DRAINING
        Worker accepts no new<br/>lablet assignments.<br/>Active workloads continue<br/>until completion.
    end note

The DrainWorkerCommand handler:

  1. Validates the worker is in RUNNING status (returns 409 Conflict otherwise)
  2. Sets status = DRAINING
  3. Sets desired_status = STOPPED
  4. Records scaling audit event

The Worker Controller's _handle_stopping method handles both STOPPING and DRAINING statuses with the same EC2 stop logic.
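The four handler steps can be sketched as follows; `put_etcd` and `record_event` stand in for the real persistence and audit hooks, and the dict-based worker model is an assumption.

```python
class ConflictError(Exception):
    """Maps to 409 Conflict."""

def drain_worker(worker, put_etcd, record_event):
    if worker["status"] != "RUNNING":                   # step 1: validate
        raise ConflictError(f"cannot drain worker in {worker['status']}")
    worker["status"] = "DRAINING"                       # step 2
    worker["desired_status"] = "STOPPED"                # step 3
    put_etcd(f"/workers/{worker['id']}", worker)
    record_event("scale_down_initiated", worker["id"])  # step 4: audit
    return worker
```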


7. End-to-End Scale-Up Timeline

sequenceDiagram
    autonumber
    participant User as User / System
    participant RS as Resource Scheduler
    participant CP as Control Plane API
    participant WC as Worker Controller
    participant EC2 as AWS EC2
    participant CML as CML Instance

    User->>CP: Create LabletInstance (PENDING)
    CP->>CP: Set status = PENDING_SCHEDULING

    Note over RS: Watch event or polling cycle

    RS->>RS: PlacementEngine: no capacity
    RS->>CP: RequestScaleUp(template=large)
    CP->>CP: Create worker (PENDING)

    Note over WC: Watch event (~0.5s)

    WC->>EC2: launch_instance(m5zn.metal)
    WC->>CP: Update status (PROVISIONING)

    Note over EC2: EC2 instance booting (~2-5 min)

    WC->>EC2: get_instance_state()
    EC2-->>WC: running + IP
    WC->>CP: Update status (RUNNING, ip_address)

    Note over RS: Next reconciliation cycle

    RS->>RS: PlacementEngine: worker available
    RS->>CP: assign_instance(instance, worker)

    Note over CML: Lablet Controller provisions lab

8. Observability

Scaling Audit Events

All scaling decisions are recorded via record_scaling_event() for observability:

| Event | Service | Description |
|---|---|---|
| scale_up_accepted | Control Plane API | New worker created from template |
| scale_up_rejected | Control Plane API | Constraint violation or error |
| provisioned | Worker Controller | EC2 instance launched |
| scale_down_initiated | Worker Controller | Drain command sent |
| scale_down_failed | Worker Controller | Drain command failed |
| skipped_auto_pause | Worker Controller | Scale-down skipped (auto-pause acted) |
| skipped_not_idle | Worker Controller | Scale-down skipped (worker active) |
| skipped_not_eligible | Worker Controller | Scale-down skipped (not eligible) |
| skipped_min_workers | Worker Controller | Scale-down skipped (min fleet) |
| skipped_cooldown | Worker Controller | Scale-down skipped (cooldown) |
| drained | Worker Controller | Worker successfully drained |
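A minimal sketch of a recorder for these events. The document only names the events themselves, so the schema fields and the signature below are assumptions.

```python
import time

def record_scaling_event(log, event, service, worker_id=None, detail=None):
    """Append one structured audit record to the given log."""
    log.append({
        "event": event,
        "service": service,
        "worker_id": worker_id,
        "detail": detail,
        "recorded_at": time.time(),
    })
```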

9. Configuration Reference

Scale-Up Settings (Control Plane API)

| Setting | Env Variable | Default | Description |
|---|---|---|---|
| max_workers_per_region | MAX_WORKERS_PER_REGION | 10 | Maximum active workers per AWS region |

Scale-Down Settings (Worker Controller)

| Setting | Env Variable | Default | Description |
|---|---|---|---|
| scale_down_enabled | SCALE_DOWN_ENABLED | False | Enable automatic scale-down |
| min_workers | MIN_WORKERS | 0 | Minimum running workers to maintain |
| scale_down_cooldown_seconds | SCALE_DOWN_COOLDOWN_SECONDS | 600 | Cooldown between drains (10 min) |

Scheduler Settings (Resource Scheduler)

| Setting | Env Variable | Default | Description |
|---|---|---|---|
| scheduling_interval | RECONCILE_INTERVAL | 30 | Seconds between scheduling cycles |
| max_retries | MAX_RETRIES | 35 | Max retries for failed placements |
| scheduling_polling_enabled | SCHEDULING_POLLING_ENABLED | True | Enable polling-based scheduling |

Scale-Down Disabled by Default

scale_down_enabled defaults to False. Enable it only after validating idle detection thresholds and min_workers settings for your environment.
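A sketch of loading the scale-down settings from the environment. The variable names and defaults come from the tables above; the parsing helpers (accepted truthy strings, function name) are assumptions.

```python
import os

def load_scale_down_settings(env=None):
    """Read scale-down settings, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "scale_down_enabled": env.get("SCALE_DOWN_ENABLED", "false").lower()
                              in ("1", "true", "yes"),
        "min_workers": int(env.get("MIN_WORKERS", "0")),
        "scale_down_cooldown_seconds":
            int(env.get("SCALE_DOWN_COOLDOWN_SECONDS", "600")),
    }
```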