Resource Model
The platform is a timed-resource management plane. Every operable thing (a session, a part of a session, a pod, a host) is a
`TimedResource` with a declared desired state and an observed actual state, reconciled continuously (Kubernetes-style). This page defines the abstraction, the runtime tree, and the reconciliation framework. See the glossary for terms and ADR-036 for the layered state design.
Why one pattern
Sessions, pods, and hosts used to be modelled as unrelated aggregates with bespoke state machines. They actually share the same shape: a scheduled window, a lifecycle of phases, a desired vs actual status, and an audit trail of transitions. Collapsing them onto one abstraction means a single reconciliation engine, a single dashboard, and uniform behaviour for provisioning, gating, failure handling, and teardown at every level.
Definition vs instance
Every runtime resource is created from a definition. The platform is therefore two parallel
families: a catalogue of definitions and a runtime of instances. The same duality
that already exists in the legacy .NET domains (SessionType → Session, PodDefinition → Pod,
Track → … → Form) is generalised onto one base each. See
ADR-050.
- A `ResourceDefinition` is type metadata, the stable spec of a kind of thing: its identity (`type_key`), `version`, the lifecycle template its instances run, the selectors / requirements for its children, and an optional authorization policy. Definitions are the catalogue; they do not reconcile against infrastructure.
- A `ResourceInstance` is a running thing: it carries `status` + `desired_status`, a `state_history`, an instantiated `lifecycle`, an `owner_id`, and children. Its behaviour is to reconcile its actual state toward its desired state.
classDiagram
class ResourceDefinition {
+str type_key
+str version
+str provisioning_source
+str~None authorization_policy_id
+dict lifecycle_template
+list child_requirements
+str definition_status
+str~None sync_status
}
class ResourceInstance {
+str id
+str resource_type
+str definition_ref
+str status
+str~None desired_status
+str owner_id
+list state_history
+reconcile()
}
class TimedResource {
+dict timeslot
+dict lifecycle
}
ResourceDefinition ..> ResourceInstance : instantiates
ResourceInstance <|-- TimedResource
ResourceInstance <|.. Job : untimed
ResourceInstance <|.. Report : untimed
TimedResource <|.. Session
TimedResource <|.. SessionPart
TimedResource <|.. PodInstance
TimedResource <|.. Host
Two provisioning sources
A definition's provisioning_source declares where it comes from, and that determines whether
it has a lifecycle of its own (ADR-051):
- `seed`: static catalogue/config loaded by a seeder from YAML assets (session types, pod definitions, locations, delivery environments, authorization policies). No lifecycle, no `sync_status`; immutable until re-seeded.
- `content_package`: authored content synced from a PAv1 package (the content-authoring taxonomy + per-part jobs/reports). Carries a `definition_status` and a reconciled `sync_status`.
This is the only asymmetry in the model: seed definitions are inert reference data; only content_package definitions reconcile (they are synced, not provisioned). The full table, state diagram and ownership live in definition-catalog-model.md (canonical).
Timed vs untimed instances
Not every instance owns a window. The instance base splits into two tiers (ADR-050):
| Tier | Class | Adds | Examples |
|---|---|---|---|
| L1 | `ResourceInstance` | lifecycle + `desired_status` reconciliation, no `Timeslot` of its own (inherits its parent's window). | `Job`, `Report` (SE automation outputs). |
| L2 | `TimedResource` | a `Timeslot` (scheduled window, `lead_time`, teardown buffer). | `Session`, `SessionPart`, `PodInstance`, `Host` / `Worker`. |
Catalogue → runtime map
| Definition | → Instance | Tier | Source |
|---|---|---|---|
| `SessionDefinition` (session type) | `Session` | Timed | seed |
| `PartDefinition` (requirement + form selector) | `SessionPart` | Timed | seed |
| `PodDefinition` | `PodInstance` | Timed | seed / content_package |
| `HostDefinition` (hosting-site / rack) | `Host` / `Worker` | Timed | seed |
| `Form` (the synced leaf of Track → Exam → Module → Formset → Form → FormItem) | delivered by `SessionPart`; no timed instance; the `Form` itself is a synced definition-plane resource | - | content_package |
| `JobDefinition` | `Job` | Untimed | content_package |
| `ReportDefinition` | `Report` | Untimed | content_package |
| `DeliveryEnvironment` / `LabLocation` / `HostingSiteLocation` / `AuthorizationPolicy` | config; no instance | - | seed |
| `Device` / `DeviceDefinition` | deferred; out of scope this round | - | - |
`Form` is the one synced unit, not an inert leaf (ADR-059). Across the `content_package` taxonomy, only the `Form` reconciles: it owns a `sync_status`, the synced content bytes (RustFS), and an optional `PodDefinition` ref. It generalises the legacy `LabletDefinition`. It lives on the catalogue / sync plane, not the timed runtime tree: the `SessionPart` is still the timed delivery, so there is no separate timed `Form` instance. The `form-controller` owns its sync loop (see ADR-054).
The layered state abstraction
Three layers, defined once in lcm_core and reused everywhere (ADR-036):
| Layer | Class | Adds |
|---|---|---|
| 1 | `ResourceState` | spec/status (`status`, `desired_status`), `owner_id`, `state_history`, `pipeline_progress`, timestamps, `_record_transition()`. |
| 2 | `TimedResourceState` | `timeslot`, `lifecycle`, `started_at`/`ended_at`/duration, `terminated_at` + VO accessors. |
| 3 | Concrete | `SessionState`, `SessionPartState`, `PodInstanceState`, `HostState` (and profiles `LabletSessionState`, `CMLWorkerState`, `LabRecordState`). |
State vs aggregate. These are the state classes (Neuroglia
`AggregateState`). Their matching aggregates are `ResourceInstance` (holds `ResourceState`) and `TimedResource` (holds `TimedResourceState`). Untimed instances (`Job`, `Report`) stop at layer 1: they hold a `ResourceState` only, with no `Timeslot`.
classDiagram
class AggregateState~str~ {
<<Neuroglia>>
}
class ResourceState {
+str id
+str resource_type
+str status
+str~None desired_status
+str owner_id
+list~StateTransition state_history
+dict~None pipeline_progress
+datetime created_at
+datetime updated_at
+_record_transition(from, to, by, reason)
}
class TimedResourceState {
+dict~None timeslot
+dict~None lifecycle
+datetime~None started_at
+datetime~None ended_at
+float~None duration_seconds
+datetime~None terminated_at
+get_timeslot() Timeslot
+set_timeslot(t)
+get_lifecycle() ManagedLifecycle
+set_lifecycle(l)
+_compute_duration()
}
class SessionState {
+str session_definition_ref
+list~str part_ids
}
class SessionPartState {
+str session_id
+str part_definition_ref
+list~str pod_ids
+int order
}
class PodInstanceState {
+str part_id
+str pod_definition_ref
+str pod_type
+str host_id
}
class HostState {
+str host_type
+list~str pod_ids
+int capacity
}
AggregateState~str~ <|-- ResourceState
ResourceState <|-- TimedResourceState
TimedResourceState <|-- SessionState
TimedResourceState <|-- SessionPartState
TimedResourceState <|-- PodInstanceState
TimedResourceState <|-- HostState
SessionState "1" o-- "0..N" SessionPartState : owns
SessionPartState "1" o-- "0..N" PodInstanceState : owns
PodInstanceState "N" --> "1" HostState : binds
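To make layer 1 concrete, here is a minimal sketch of a `ResourceState`-like class whose `_record_transition()` appends an audit entry and then moves the observed status. It uses plain dataclasses instead of the Neuroglia `AggregateState` base, and the `SketchResourceState` name is hypothetical. The diagram lists a `from` parameter; since `from` is a Python keyword, this sketch assumes the current `status` is the source state. None of this is taken from `lcm_core`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class StateTransition:
    """One audit entry; field names follow the value-object diagram."""
    from_state: str
    to_state: str
    transitioned_at: datetime
    triggered_by: str
    reason: str


@dataclass
class SketchResourceState:
    """Hypothetical stand-in for ResourceState (no Neuroglia base class)."""
    id: str
    resource_type: str
    status: str
    owner_id: str
    desired_status: Optional[str] = None
    state_history: List[StateTransition] = field(default_factory=list)

    def _record_transition(self, to: str, by: str, reason: str) -> None:
        # Append the audit entry first, then move the observed status.
        self.state_history.append(
            StateTransition(self.status, to, datetime.now(timezone.utc), by, reason)
        )
        self.status = to


pod = SketchResourceState(id="pod-1", resource_type="pod", status="Scheduled", owner_id="user-1")
pod._record_transition("Provisioning", by="scheduler", reason="provision_at reached")
```

A timed layer would extend this with `timeslot` and `lifecycle` fields, mirroring the `TimedResourceState` box above.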
Profiles, not new types
LabletSession is simply a Session with one part and one cml_on_aws pod; LabRecord
is the cml_on_aws PodInstance; CmlWorker is the cml_on_aws Host. They are
specialisations, not parallel hierarchies. See session-model.md.
Value objects
timeslot and lifecycle are stored as dicts on the state (Neuroglia serialization) and
accessed as value objects from lcm_core.
classDiagram
class TimedResourceState {
+dict timeslot
+dict lifecycle
+list state_history
}
class Timeslot {
+datetime start
+datetime end
+timedelta lead_time
+timedelta teardown_buffer
+provision_at() datetime
+cleanup_deadline() datetime
+duration() timedelta
+is_active() bool
+is_expired() bool
+extend(delta)
}
class ManagedLifecycle {
+tuple~LifecyclePhase phases
+str current_phase
+get_phase(name) LifecyclePhase
+get_active_phases() list
+phase_names() list
}
class LifecyclePhase {
+str name
+str engine
+str trigger_on_status
+dict pipeline_def
+str workflow_ref
+bool is_required
}
class StateTransition {
+str from_state
+str to_state
+datetime transitioned_at
+str triggered_by
+str reason
+dict metadata
}
TimedResourceState "1" *-- "0..1" Timeslot : timeslot
TimedResourceState "1" *-- "0..1" ManagedLifecycle : lifecycle
TimedResourceState "1" *-- "0..N" StateTransition : state_history
ManagedLifecycle "1" *-- "1..N" LifecyclePhase : phases
- `Timeslot`: the scheduled window. `lead_time` is the knob that distinguishes JIT provisioning (short lead, e.g. a CML pod) from eager provisioning (long lead / pre-booked, e.g. CCIE hardware appliances). `provision_at = start - lead_time`.
- `ManagedLifecycle`: the ordered phases for this resource type. Each `LifecyclePhase` binds to an engine: `pipeline` (native LCM steps) or `workflow` (an SE job). This is the one seam where a resource's lifecycle hands work to content-driven automation.
- `StateTransition`: every status change is appended to `state_history` for audit and for the dashboard's timeline tab.
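The `Timeslot` arithmetic can be sketched as a small frozen dataclass. Only `provision_at = start - lead_time` is stated on this page; the `cleanup_deadline` formula, the half-open active window, and the explicit `now` argument are assumptions made for illustration and may differ from the `lcm_core` value object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Timeslot:
    start: datetime
    end: datetime
    lead_time: timedelta = timedelta(0)
    teardown_buffer: timedelta = timedelta(0)

    def provision_at(self) -> datetime:
        # Stated in the text: provision_at = start - lead_time.
        return self.start - self.lead_time

    def cleanup_deadline(self) -> datetime:
        # Assumption: the deadline is end + teardown_buffer.
        return self.end + self.teardown_buffer

    def duration(self) -> timedelta:
        return self.end - self.start

    def is_active(self, now: datetime) -> bool:
        # Assumption: half-open window [start, end).
        return self.start <= now < self.end


start = datetime(2025, 1, 10, 9, 0, tzinfo=timezone.utc)
slot = Timeslot(
    start=start,
    end=start + timedelta(hours=2),
    lead_time=timedelta(minutes=15),
    teardown_buffer=timedelta(minutes=30),
)
print(slot.provision_at().isoformat())  # 2025-01-10T08:45:00+00:00
```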
The runtime tree
Session ─┬─ SessionPart ─┬─ PodInstance ── (binds) ── Host / Worker
         │               └─ PodInstance
         └─ SessionPart ─── PodInstance
| Resource | Reconciled by | Notes |
|---|---|---|
| `Session` | session-controller | Runs an orchestration lifecycle (admit → run_parts → aggregate → finalize) driven by a `part_execution` policy; thin pending → active → inactive is the degenerate single-part/no-gate case. |
| `SessionPart` | session-controller | First-class resource; own timeslot + lifecycle; 0..N pods. |
| `PodInstance` | pod-controller | Any `PodType`; instantiated from a `PodDefinition`. |
| `Host` / `Worker` | host-controller (+ adapters) | Any `HostType`; pods bind to it. |
Intent (`desired_status`) cascades down the tree; observed status bubbles up. The
Reconciled by column names the target resource-kind controllers of
ADR-054; see the as-built mapping under the
manager registry below. CPA remains the sole writer of the resource store
(ADR-001): controllers observe via etcd watch
and persist their reconcile results through CPA, never writing the store directly.
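The "intent down, status up" flow can be sketched over an in-memory tree. The `Node` class and both helper functions are hypothetical, and the roll-up rule (any failed child fails the parent; all children `Ready` makes the parent `Ready`) is an assumption chosen for illustration. Persistence through CPA is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    status: str = "Scheduled"
    desired_status: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def cascade_intent(node: Node, desired: str) -> None:
    """Push desired_status from a resource down to every descendant."""
    node.desired_status = desired
    for child in node.children:
        cascade_intent(child, desired)


def bubble_status(node: Node) -> str:
    """Recompute a parent's observed status from its children (assumed rule)."""
    if not node.children:
        return node.status
    child_statuses = [bubble_status(child) for child in node.children]
    if "Failed" in child_statuses:
        node.status = "Failed"
    elif all(s == "Ready" for s in child_statuses):
        node.status = "Ready"
    return node.status


pods = [Node("pod-a"), Node("pod-b")]
session = Node("session", children=[Node("part-1", children=pods)])
cascade_intent(session, "Ready")   # intent flows down
for pod in pods:
    pod.status = "Ready"           # controllers observe the pods as Ready
bubble_status(session)             # status flows up
```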
Generic reconciliation framework
One control loop, specialised per resource kind (see ADR-047). The loop is observe → diff → act → record; a per-type manager supplies the kind-specific logic.
sequenceDiagram
autonumber
participant Op as Operator / Scheduler
participant RL as Generic Reconcile Loop
participant Mgr as Per-Type Manager
participant Store as Resource Store
participant Infra as Adapter / SE
Op->>Store: set desired_status (intent)
loop every reconcile tick
RL->>Store: observe (load resource + children)
RL->>Mgr: reconcile(resource)
Mgr->>Mgr: diff(desired_status, status)
alt converged
Mgr-->>RL: no-op
else action needed
Mgr->>Infra: act (provision / phase step / teardown)
Infra-->>Mgr: result
Mgr->>Store: record StateTransition + update status
Mgr->>Store: cascade desired_status to children
end
end
Store-->>Op: status bubbles up (SSE)
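A single tick of the loop in the diagram above can be sketched as follows. The `Resource` class, the `register` decorator, and the `MANAGERS` dictionary are hypothetical names, and the pod manager is a placeholder that simply reports the desired status as reached. A real manager would call an adapter or SE and persist through CPA.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Resource:
    kind: str
    status: str
    desired_status: Optional[str] = None
    history: List[Tuple[str, str]] = field(default_factory=list)


# A manager performs the kind-specific action and returns the status reached.
Manager = Callable[[Resource], str]
MANAGERS: Dict[str, Manager] = {}


def register(kind: str) -> Callable[[Manager], Manager]:
    """Register a per-type manager against the generic loop."""
    def decorator(manager: Manager) -> Manager:
        MANAGERS[kind] = manager
        return manager
    return decorator


@register("PodInstance")
def pod_manager(resource: Resource) -> str:
    # Placeholder for "act": a real manager would call an adapter here.
    return resource.desired_status or resource.status


def reconcile_once(resource: Resource) -> bool:
    """One tick: observe, diff, act, record. Returns True if it acted."""
    if resource.desired_status is None or resource.status == resource.desired_status:
        return False  # converged: no-op
    new_status = MANAGERS[resource.kind](resource)          # act
    resource.history.append((resource.status, new_status))  # record
    resource.status = new_status
    return True
```

Calling `reconcile_once` repeatedly is idempotent once the resource has converged, which is the property the "converged: no-op" branch of the diagram relies on.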
Manager registry: each resource kind registers a manager against the loop:
| Manager | Resource kind | Home |
|---|---|---|
| Session manager | `Session` | session-controller |
| Part manager | `SessionPart` | session-controller |
| Pod manager | `PodInstance` | pod-controller |
| Host manager | `Host` / `Worker` | host-controller |
| Job manager | `Job` (untimed) | scenario-engine |
| Report manager | `Report` (untimed) | scenario-engine |
Home column β target topology vs as-built
The Home column names the target resource-kind controllers of ADR-054. As-built today the same reconcile logic runs in fewer services:
| Resource kind | Target controller | As-built home |
|---|---|---|
| `Session` + `SessionPart` | session-controller | CPA / the session half of lablet-controller |
| `PodInstance` | pod-controller | lablet-controller |
| `Host` / `Worker` | host-controller | worker-controller |
| `Job` + `Report` | scenario-engine | scenario-engine (unchanged) |
Across both topologies CPA is the only writer of the resource store
(ADR-001): controllers reconcile from an
etcd watch and persist status / cascaded desired_status through CPA (intent down,
status up).
The Session and Part managers also drive per-part content automation by transitioning a
workflow phase, which submits an SE job and waits for the CloudEvent result.
Resource lifecycle (states)
Each resource type defines its own phase names, but they share the same shape: a scheduled
window, provisioning gated on provision_at, an active window, optional grading, then teardown.
Failure handling is bounded retry, then escalation to the parent.
stateDiagram-v2
[*] --> Scheduled : created (timeslot set)
Scheduled --> Provisioning : provision_at reached (lead_time)
Provisioning --> Ready : init phases complete
Ready --> Active : timeslot.start
Active --> Grading : grade trigger (event / operator)
Grading --> Active : grade complete
Active --> TearingDown : timeslot.end or desired=TornDown
Grading --> TearingDown : final grade
TearingDown --> Terminated : cleanup complete
Terminated --> [*]
Provisioning --> Failed : bounded retries exhausted
Active --> Failed : reconcile error
Failed --> TearingDown : escalate to parent / cleanup
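The edges of the state diagram above can be transcribed into a transition table, which makes it easy to reject an illegal status change before recording it. The `ALLOWED` table copies the diagram edge for edge; the helper names are hypothetical.

```python
from typing import Dict, List, Set

# One entry per source state; values are the targets drawn in the diagram.
ALLOWED: Dict[str, Set[str]] = {
    "Scheduled": {"Provisioning"},
    "Provisioning": {"Ready", "Failed"},
    "Ready": {"Active"},
    "Active": {"Grading", "TearingDown", "Failed"},
    "Grading": {"Active", "TearingDown"},
    "Failed": {"TearingDown"},
    "TearingDown": {"Terminated"},
    "Terminated": set(),
}


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an edge current -> target."""
    return target in ALLOWED.get(current, set())


def is_valid_path(path: List[str]) -> bool:
    """True if every consecutive pair in the path is an allowed edge."""
    return all(can_transition(a, b) for a, b in zip(path, path[1:]))


happy_path = ["Scheduled", "Provisioning", "Ready", "Active", "TearingDown", "Terminated"]
print(is_valid_path(happy_path))            # True
print(can_transition("Ready", "Grading"))   # False: grading starts from Active
```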
Session vs part. The state shape above applies to the resources LCM provisions:
`SessionPart`, `PodInstance`, `Host`. The `Session` runs a different, orchestration lifecycle (admit → run_parts → aggregate → finalize) driven by its `part_execution` policy: it sequences and gates parts and rolls their status up rather than provisioning infrastructure. The thin pending → active → inactive is just the degenerate single-part, no-gate case (see session-model.md).
Triggers
Reconciliation is driven by four trigger sources (all resource kinds):
| Trigger | Example |
|---|---|
| Schedule / timeslot | `provision_at` reached → start provisioning; `end` reached → teardown. |
| Operator (UI / API) | Operator sets `desired_status`, forces a reconcile, or retries a phase. |
| External event | Student submits → grade trigger; webhook from an external system. |
| Inter-resource | A child becomes Ready, or a sibling part completes → next part may start. |
Failure handling
When a reconcile step fails, the responsible manager retries with a bounded policy. On
exhaustion the resource is marked Failed and the failure is escalated to its parent, which
decides whether to abort the subtree or continue. There is no silent infinite retry; a resource
cannot outlive its Timeslot.cleanup_deadline.
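A minimal sketch of "bounded retry, then escalate" follows. The function name, the `max_attempts` parameter, and the `escalate` callback are hypothetical; the actual retry policy (attempt count, backoff, which errors are retryable) is not specified on this page, so backoff is omitted here.

```python
from typing import Callable, Optional


def run_with_bounded_retry(
    step: Callable[[], None],
    max_attempts: int,
    escalate: Callable[[Exception], None],
) -> str:
    """Run a reconcile step at most max_attempts times, then escalate."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            step()
            return "ok"
        except Exception as exc:  # sketch only: a real policy would be narrower
            last_error = exc
    # Retries exhausted: hand the failure to the parent and mark Failed.
    assert last_error is not None
    escalate(last_error)
    return "Failed"
```

The parent-side `escalate` callback is where the "abort the subtree or continue" decision would live.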
Provisioning strategy
JIT vs eager is not a separate field; it falls out of `Timeslot.lead_time`:
- JIT (short lead): CML pods are provisioned shortly before `start`.
- Eager (long lead / pre-booked): hardware-backed pods (e.g. CCIE) carry a large `lead_time`, so `provision_at` may fall inside an earlier part's active window. The tree allows a later part's pod to provision early while parts remain sequential and gated.
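A worked example of the two cases, using made-up times and lead values: part 1 runs 09:00 to 11:00 with a JIT pod (10-minute lead), and part 2 starts at 11:00 with an eager pod (90-minute lead). The eager pod's `provision_at` then lands inside part 1's active window.

```python
from datetime import datetime, timedelta


def provision_at(start: datetime, lead_time: timedelta) -> datetime:
    # Same rule as Timeslot: provision_at = start - lead_time.
    return start - lead_time


part1_start = datetime(2025, 3, 1, 9, 0)
part1_end = datetime(2025, 3, 1, 11, 0)
part2_start = part1_end  # parts stay sequential

jit_pod = provision_at(part1_start, timedelta(minutes=10))    # 08:50
eager_pod = provision_at(part2_start, timedelta(minutes=90))  # 09:30

# The eager pod starts provisioning while part 1 is still active.
provisions_during_part1 = part1_start <= eager_pod < part1_end
print(jit_pod.time(), eager_pod.time(), provisions_during_part1)
# 08:50:00 09:30:00 True
```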
Related
- session-model.md: definitions, selectors, and session profiles.
- unified-resource-management.md: resource lifecycle (LCM) vs job automation (SE).
- ui-resource-dashboard.md: the K8s-style dashboard.
- ADRs: 036 (layers), 046 (host / type split), 047 (reconciliation).