CML Worker Domain Model¶
Overview¶
The CML Worker is a domain aggregate that represents an AWS EC2 instance running Cisco Modeling Lab (CML). It encapsulates the complete lifecycle, state management, and operational characteristics of a CML instance.
Domain Structure¶
Entity: CMLWorker¶
Location: src/domain/entities/cml_worker.py
The CMLWorker aggregate follows the Neuroglia AggregateState pattern with event sourcing.
Key Attributes¶
Identity & Infrastructure
id: Unique worker identifier (UUID)name: Human-readable nameaws_region: AWS region (e.g., "us-east-1")aws_instance_id: EC2 instance IDinstance_type: EC2 instance type (e.g., "t3.xlarge")ami_id: AWS AMI ID to create instance from (e.g., "ami-0123456789abcdef0")ami_name: Human-readable AMI name (e.g., "CML-2.7.0-Ubuntu-22.04")
State Management
status: EC2 instance status (CMLWorkerStatus enum)service_status: CML HTTPS service availability (CMLServiceStatus enum)
CML Integration
cml_version: Installed CML versionlicense_status: License registration status (LicenseStatus enum)license_token: License registration keyhttps_endpoint: HTTPS URL for CML access
Networking
public_ip: Public IP addressprivate_ip: Private IP address
Telemetry & Monitoring
last_activity_at: Last detected activity timestampactive_labs_count: Number of active labscpu_utilization: CPU usage percentage (0-100)memory_utilization: Memory usage percentage (0-100)
Audit Trail
created_at: Creation timestampupdated_at: Last update timestampterminated_at: Termination timestampcreated_by: User ID who created the workerterminated_by: User ID who terminated the worker
Enums¶
Location: src/domain/enums.py
CMLWorkerStatus¶
Represents EC2 instance states:
PENDING: Instance being launchedRUNNING: Instance is runningSTOPPING: Instance being stoppedSTOPPED: Instance is stoppedSHUTTING_DOWN: Instance being terminatedTERMINATED: Instance is terminatedUNKNOWN: Status cannot be determined
CMLServiceStatus¶
Represents CML HTTPS service availability:
UNAVAILABLE: Service not accessibleSTARTING: Service starting upAVAILABLE: Service accessible via HTTPSERROR: Service encountered an error
LicenseStatus¶
Represents CML license state:
UNREGISTERED: No license registeredREGISTERED: Valid license activeEVALUATION: Evaluation license activeEXPIRED: License has expiredINVALID: License is invalid
Domain Events¶
Location: src/domain/events/cml_worker.py
All events are decorated with @cloudevent for CloudEvents compliance.
CMLWorkerCreatedDomainEvent¶
Raised when a new worker is created.
Fields: aggregate_id, name, aws_region, aws_instance_id, instance_type, ami_id, ami_name, status, cml_version, created_at, created_by
CMLWorkerStatusUpdatedDomainEvent¶
Raised when EC2 instance status changes.
Fields: aggregate_id, old_status, new_status, updated_at
CMLServiceStatusUpdatedDomainEvent¶
Raised when CML HTTPS service status changes.
Fields: aggregate_id, old_service_status, new_service_status, https_endpoint, updated_at
CMLWorkerInstanceAssignedDomainEvent¶
Raised when AWS EC2 instance ID is assigned.
Fields: aggregate_id, aws_instance_id, public_ip, private_ip, assigned_at
CMLWorkerLicenseUpdatedDomainEvent¶
Raised when license status is updated.
Fields: aggregate_id, license_status, license_token, updated_at
CMLWorkerTelemetryUpdatedDomainEvent¶
Raised when telemetry data is collected.
Fields: aggregate_id, last_activity_at, active_labs_count, cpu_utilization, memory_utilization, updated_at
CMLWorkerEndpointUpdatedDomainEvent¶
Raised when HTTPS endpoint is updated.
Fields: aggregate_id, https_endpoint, public_ip, updated_at
CMLWorkerTerminatedDomainEvent¶
Raised when worker is terminated.
Fields: aggregate_id, name, terminated_at, terminated_by
Repository Interface¶
Location: src/domain/repositories/cml_worker_repository.py
Query Methods¶
async def get_all_async() -> list[CMLWorker]
async def get_by_id_async(worker_id: str) -> CMLWorker | None
async def get_by_aws_instance_id_async(aws_instance_id: str) -> CMLWorker | None
async def get_by_status_async(status: CMLWorkerStatus) -> list[CMLWorker]
async def get_active_workers_async() -> list[CMLWorker]
async def get_idle_workers_async(idle_threshold_minutes: int) -> list[CMLWorker]
Command Methods¶
async def add_async(entity: CMLWorker) -> CMLWorker
async def update_async(entity: CMLWorker) -> CMLWorker
async def delete_async(worker_id: str, worker: Optional[CMLWorker] = None) -> bool
Aggregate Behavior¶
Lifecycle Operations¶
update_status(new_status: CMLWorkerStatus) -> bool¶
Updates EC2 instance status. Returns False if status unchanged.
update_service_status(new_service_status: CMLServiceStatus, https_endpoint: Optional[str]) -> bool¶
Updates CML service availability and endpoint. Returns False if unchanged.
assign_instance(aws_instance_id: str, public_ip: Optional[str], private_ip: Optional[str]) -> None¶
Assigns AWS EC2 instance details to the worker.
terminate(terminated_by: Optional[str]) -> None¶
Marks worker as terminated.
License Management¶
update_license(license_status: LicenseStatus, license_token: Optional[str]) -> bool¶
Updates CML license status. Returns False if unchanged.
Telemetry & Monitoring¶
update_telemetry(last_activity_at: datetime, active_labs_count: int, cpu_utilization: Optional[float], memory_utilization: Optional[float]) -> None¶
Updates worker telemetry data.
is_idle(idle_threshold_minutes: int) -> bool¶
Checks if worker has been idle beyond threshold. Returns True if idle.
Idle Criteria:
- Has
last_activity_attimestamp active_labs_count == 0- Time since last activity exceeds threshold
Connectivity¶
update_endpoint(https_endpoint: Optional[str], public_ip: Optional[str]) -> bool¶
Updates HTTPS endpoint and optionally public IP. Returns False if unchanged.
can_connect() -> bool¶
Checks if worker is ready for connections.
Connection Ready Criteria:
- Status is
RUNNING - Service status is
AVAILABLE - HTTPS endpoint is set
Usage Patterns¶
Creating a Worker¶
worker = CMLWorker(
name="Dev Lab Worker 1",
aws_region="us-east-1",
instance_type="c5.2xlarge",
ami_id="ami-0123456789abcdef0",
ami_name="CML-2.7.0-Ubuntu-22.04",
cml_version="2.7.0",
created_by="user_123"
)
Updating Worker State¶
# Update EC2 status
worker.update_status(CMLWorkerStatus.RUNNING)
# Assign instance details after provisioning
worker.assign_instance(
aws_instance_id="i-1234567890abcdef0",
public_ip="54.123.45.67",
private_ip="10.0.1.100"
)
# Update service availability
worker.update_service_status(
CMLServiceStatus.AVAILABLE,
https_endpoint="https://54.123.45.67"
)
Monitoring & Automation¶
# Update telemetry
worker.update_telemetry(
last_activity_at=datetime.now(timezone.utc),
active_labs_count=2,
cpu_utilization=45.5,
memory_utilization=62.3
)
# Check if idle
if worker.is_idle(idle_threshold_minutes=30):
# Trigger auto-shutdown
worker.update_status(CMLWorkerStatus.STOPPING)
User Connection Flow¶
# Check if ready for connections
if worker.can_connect():
endpoint = worker.state.https_endpoint
# Direct user to endpoint
else:
# Show status and wait
pass
Integration Points¶
Application Layer¶
- Commands: Create, update, start, stop, terminate workers
- Queries: Get worker status, list available workers, find idle workers
- Event Handlers: React to status changes, trigger notifications
Infrastructure Layer¶
- AWS Integration: EC2 instance management via boto3
- CML API Client: License management, telemetry collection
- Monitoring: CloudWatch metrics, health checks
Repository Implementations¶
- In-Memory: For testing and development
- MongoDB: For production persistence
- Event Store: For event sourcing (if required)
Design Decisions¶
Why AggregateState Pattern?¶
- Event Sourcing: Complete audit trail of all state changes
- Domain Events: Clean integration with event-driven architecture
- CQRS: Natural separation of commands and queries
Why Separate Status Enums?¶
- EC2 Status: Represents infrastructure state (AWS concern)
- Service Status: Represents application availability (CML concern)
- Clear Semantics: Different lifecycle management per concern
Telemetry as Aggregate State¶
- Decision Point: Telemetry stored in aggregate, not separate entity
- Rationale: Telemetry drives idle detection and lifecycle decisions
- Trade-off: More frequent updates, but simpler model
Future Enhancements¶
Potential Extensions¶
- Lab Assignments
- Add
LabAssignmentvalue object - Track which users have access to which labs
-
Enforce RBAC at domain level
-
Cost Tracking
- Add
cost_per_hourattribute - Calculate running costs
-
Cost-based shutdown policies
-
Maintenance Windows
- Add
maintenance_schedulevalue object - Prevent auto-shutdown during maintenance
-
Schedule patches and updates
-
Multi-Region Support
- Add region-specific configuration
- Cross-region failover
-
Region affinity for users
-
Snapshot Management
- Track EBS snapshots
- Backup policies
- Restore capabilities