Resource Oriented Architecture (ROA)
Resource Oriented Architecture is a powerful pattern for building systems that manage resources through their lifecycle, similar to how Kubernetes manages cluster resources. Neuroglia provides comprehensive support for ROA patterns including watchers, controllers, and reconciliation loops.
Overview
ROA provides:
- Resource Management: Declarative resource definitions with desired vs actual state
- Watchers: Continuous monitoring of resource changes through polling or event streams
- Controllers: Business logic that responds to resource changes and implements state transitions
- Reconciliation: Periodic loops that ensure system consistency and handle drift detection
- Safety Mechanisms: Timeout handling, error recovery, and corrective actions
Architecture Overview
graph TB
    subgraph "Resource Layer"
        A[Resource Definition]
        B[Resource Storage]
        C[Resource Events]
    end
    subgraph "Observation Layer"
        D[Watcher] --> E[Event Stream]
        F[Poller] --> G[Change Detection]
    end
    subgraph "Control Layer"
        H[Controller] --> I[Business Logic]
        I --> J[State Transitions]
        I --> K[Action Execution]
    end
    subgraph "Reconciliation Layer"
        L[Reconciliation Loop] --> M[Drift Detection]
        M --> N[Corrective Actions]
        N --> O[State Restoration]
    end
    subgraph "Safety Layer"
        P[Error Handling] --> Q[Retry Logic]
        Q --> R[Circuit Breaker]
        R --> S[Timeout Management]
    end
    A --> B
    B --> C
    C --> D
    C --> F
    E --> H
    G --> H
    H --> L
    L --> P
    style A fill:#e3f2fd
    style H fill:#f3e5f5
    style L fill:#e8f5e8
    style P fill:#fff3e0
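Core Components
Resource Definition
Resources are declarative objects that define desired state:
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class LabInstanceResource:
    api_version: str = "lab.neuroglia.com/v1"
    kind: str = "LabInstance"
    # default_factory gives every instance its own dict instead of None
    metadata: Dict[str, Any] = field(default_factory=dict)
    spec: Dict[str, Any] = field(default_factory=dict)    # Desired state
    status: Dict[str, Any] = field(default_factory=dict)  # Current state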
Watcher Pattern
Watchers continuously monitor resources for changes:
class LabInstanceWatcher:
    async def start_watching(self):
        while self.is_running:
            # Poll for changes since last known version
            changes = self.storage.list_resources(since_version=self.last_resource_version)
            for resource in changes:
                await self._handle_resource_change(resource)
            await asyncio.sleep(self.poll_interval)
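The polling loop above depends on a storage that can list the resources changed since a given version. A minimal in-memory sketch of that idea follows. `InMemoryStorage`, `poll_once`, and the integer `resourceVersion` scheme are assumptions made for illustration, not Neuroglia APIs.

```python
from typing import Any, Dict, List, Tuple


class InMemoryStorage:
    """Hypothetical storage that stamps every write with an increasing version."""

    def __init__(self) -> None:
        self._resources: Dict[str, Dict[str, Any]] = {}
        self._version = 0

    def save(self, name: str, status: Dict[str, Any]) -> int:
        self._version += 1
        self._resources[name] = {
            "name": name,
            "status": status,
            "resourceVersion": self._version,
        }
        return self._version

    def list_resources(self, since_version: int = 0) -> List[Dict[str, Any]]:
        # Return only resources written after the caller's last known version.
        changed = [r for r in self._resources.values() if r["resourceVersion"] > since_version]
        return sorted(changed, key=lambda r: r["resourceVersion"])


def poll_once(storage: InMemoryStorage, last_version: int) -> Tuple[List[Dict[str, Any]], int]:
    """One polling step: return the changes and the new high-water mark."""
    changes = storage.list_resources(since_version=last_version)
    if changes:
        last_version = changes[-1]["resourceVersion"]
    return changes, last_version
```

Advancing the high-water mark only after a poll returns changes means a second poll with no intervening writes yields an empty list.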
Controller Pattern
Controllers respond to resource changes with business logic:
class LabInstanceController:
    async def handle_resource_event(self, resource: LabInstanceResource):
        current_state = resource.status.get('state')
        if current_state == ResourceState.PENDING.value:
            await self._start_provisioning(resource)
        elif current_state == ResourceState.PROVISIONING.value:
            await self._check_provisioning_status(resource)
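The controller snippet refers to a `ResourceState` enum that is not shown here. The sketch below assumes one possible definition and replaces the `if`/`elif` chain with a dispatch table. `SketchController`, `run_demo`, the lowercase enum values, and the always-successful provisioning check are all hypothetical.

```python
import asyncio
from enum import Enum
from typing import Any, Dict, List


class ResourceState(Enum):
    # Member names follow the states named on this page;
    # the lowercase string values are an assumption.
    PENDING = "pending"
    PROVISIONING = "provisioning"
    READY = "ready"
    FAILED = "failed"


class SketchController:
    """Hypothetical controller that maps each state to one handler."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._handlers = {
            ResourceState.PENDING.value: self._start_provisioning,
            ResourceState.PROVISIONING.value: self._check_provisioning_status,
        }

    async def handle_resource_event(self, resource: Dict[str, Any]) -> None:
        handler = self._handlers.get(resource["status"].get("state"))
        if handler is not None:  # States without a handler are ignored.
            await handler(resource)

    async def _start_provisioning(self, resource: Dict[str, Any]) -> None:
        resource["status"]["state"] = ResourceState.PROVISIONING.value
        self.log.append("provisioning started")

    async def _check_provisioning_status(self, resource: Dict[str, Any]) -> None:
        # A real check would query the infrastructure; here it always succeeds.
        resource["status"]["state"] = ResourceState.READY.value
        self.log.append("provisioning finished")


def run_demo() -> str:
    """Deliver one event for a pending resource and return its new state."""
    resource = {"status": {"state": ResourceState.PENDING.value}}
    asyncio.run(SketchController().handle_resource_event(resource))
    return resource["status"]["state"]
```

A dispatch table keeps each state's logic in its own method, so adding a state means adding one entry rather than another branch.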
Reconciliation Loop
Reconcilers ensure eventual consistency:
class LabInstanceScheduler:
    async def start_reconciliation(self):
        while self.is_running:
            await self._reconcile_all_resources()
            await asyncio.sleep(self.reconcile_interval)

    async def _reconcile_resource(self, resource):
        # Check for stuck states, timeouts, and drift
        # Take corrective actions as needed
        ...
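One pass of the reconciliation loop can be sketched as a function that visits every resource and isolates failures. `reconcile_all`, `mark_seen`, and `run_pass` are hypothetical names, and the resources are plain dictionaries for brevity.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Resource = Dict[str, Any]


async def reconcile_all(
    resources: List[Resource],
    reconcile_one: Callable[[Resource], Awaitable[None]],
) -> List[str]:
    """Reconcile every resource and return the names of those that raised.

    A failure on one resource is recorded and does not stop the pass, so a
    single bad resource cannot starve the others.
    """
    failed: List[str] = []
    for resource in resources:
        try:
            await reconcile_one(resource)
        except Exception:
            failed.append(resource["name"])
    return failed


async def mark_seen(resource: Resource) -> None:
    # Hypothetical per-resource step used only to exercise reconcile_all.
    if resource.get("broken"):
        raise RuntimeError("cannot reconcile")
    resource["reconciled"] = True


def run_pass(resources: List[Resource]) -> List[str]:
    """Synchronous entry point for trying a single pass."""
    return asyncio.run(reconcile_all(resources, mark_seen))
```

Collecting failures instead of raising lets the caller decide whether to log them, retry them on the next pass, or mark the resources as failed.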
Key Patterns
1. Declarative State Management
Resources define what should exist, not how to create it:
# Desired state (spec)
spec = {
    'template': 'python-basics',
    'duration': '60m',
    'studentEmail': 'student@example.com'
}
# Current state (status)
status = {
    'state': 'ready',
    'endpoint': 'https://lab-instance.example.com',
    'readyAt': '2025-09-09T21:34:19Z'
}
2. Event-Driven Processing
Watchers detect changes and notify controllers immediately:
Resource Change → Watcher Detection → Controller Response → State Update
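The flow above can be traced with a small handler registry. This is a sketch under stated assumptions: `EventDispatcher` and `run_pipeline` are hypothetical, and only the `add_event_handler` name mirrors a call used elsewhere on this page.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Resource = Dict[str, Any]
Handler = Callable[[Resource], Awaitable[None]]


class EventDispatcher:
    """Hypothetical stand-in for the handler list a watcher might keep."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def add_event_handler(self, handler: Handler) -> None:
        self._handlers.append(handler)

    async def notify(self, resource: Resource) -> None:
        # Handlers run in registration order, one change at a time.
        for handler in self._handlers:
            await handler(resource)


def run_pipeline(resource: Resource) -> List[str]:
    """Push one change through detection, controller response, and state update."""
    trace: List[str] = ["resource change"]

    async def controller(res: Resource) -> None:
        trace.append("controller response")
        res["status"]["state"] = "provisioning"
        trace.append("state update")

    async def run() -> None:
        dispatcher = EventDispatcher()
        dispatcher.add_event_handler(controller)
        trace.append("watcher detection")
        await dispatcher.notify(resource)

    asyncio.run(run())
    return trace
```

With a polling watcher, the delay between a change and its detection is bounded by the poll interval, so the response is immediate only relative to detection.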
3. Asynchronous Reconciliation
Controllers handle immediate responses while reconcilers provide safety:
# Controller: Immediate response to events
async def handle_resource_event(self, resource):
    if resource.state == PENDING:
        await self.start_provisioning(resource)

# Reconciler: Periodic safety checks
async def reconcile_resource(self, resource):
    if self.is_stuck_provisioning(resource):
        await self.mark_as_failed(resource)
4. State Machine Implementation
Resources progress through well-defined states:
PENDING → PROVISIONING → READY → (cleanup) → DELETING → DELETED
   ↓           ↓
 FAILED  ←   FAILED
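One way to enforce such a state machine is a transition table that rejects moves the diagram does not allow. The table below is an assumption based on reading the diagram as a happy path plus failure from PENDING or PROVISIONING. `ALLOWED_TRANSITIONS`, `can_transition`, and `transition` are hypothetical names.

```python
from typing import Dict, Set

# Allowed transitions as read from the diagram above (an assumption of this
# sketch): the happy path plus failure from PENDING or PROVISIONING.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "PENDING": {"PROVISIONING", "FAILED"},
    "PROVISIONING": {"READY", "FAILED"},
    "READY": {"DELETING"},
    "DELETING": {"DELETED"},
    "DELETED": set(),
    "FAILED": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True when the table lists `target` as a successor of `current`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def transition(status: Dict[str, str], target: str) -> None:
    """Move `status` to `target`, or raise if the move is not allowed."""
    current = status["state"]
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current} -> {target}")
    status["state"] = target
```

Keeping the table as data makes it easy to review against the diagram and to extend if a recovery path out of FAILED is ever needed.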
Execution Model
Timing and Coordination
- Watchers: Poll every 2-5 seconds for near-real-time responsiveness
- Controllers: Respond immediately to detected changes
- Reconcilers: Run every 10-30 seconds for consistency checks
Concurrent Processing
All components run concurrently:
async def main():
    # Controllers are event-driven (no separate task needed)
    watcher.add_event_handler(controller.handle_resource_event)
    # Start the watcher and the reconciler concurrently
    watcher_task = asyncio.create_task(watcher.start_watching())
    scheduler_task = asyncio.create_task(scheduler.start_reconciliation())
    # Keep main() alive until both loops exit
    await asyncio.gather(watcher_task, scheduler_task)
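The concurrent start-up can be tried end to end with two stand-in loops and a shared stop signal. This is a sketch, not the sample's implementation: `run_loop` and `run_for` are hypothetical, and the intervals are shortened so the demo finishes quickly.

```python
import asyncio
from typing import List


async def run_loop(name: str, interval: float, stop: asyncio.Event, ticks: List[str]) -> None:
    """Hypothetical periodic loop standing in for a watcher or reconciler."""
    while not stop.is_set():
        ticks.append(name)
        try:
            # Sleep for one interval, but wake early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def run_for(duration: float) -> List[str]:
    """Run both loops concurrently for `duration` seconds, then stop them."""
    stop = asyncio.Event()
    ticks: List[str] = []
    tasks = [
        asyncio.create_task(run_loop("watcher", 0.01, stop, ticks)),
        asyncio.create_task(run_loop("reconciler", 0.05, stop, ticks)),
    ]
    await asyncio.sleep(duration)
    stop.set()                    # Ask both loops to finish.
    await asyncio.gather(*tasks)  # Wait until they have actually exited.
    return ticks
```

Waiting on the event instead of calling `asyncio.sleep` directly lets a loop with a long interval shut down promptly rather than finishing its current sleep.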
Safety and Reliability
Timeout Handling
Reconcilers detect and handle stuck states:
if resource.state == PROVISIONING and age > timeout_threshold:
    await self.mark_as_failed(resource, "Provisioning timeout")
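The `age` in the check above has to come from somewhere. One option is a timestamp recorded when provisioning starts. The sketch below assumes a hypothetical `provisioningStartedAt` status field holding an ISO 8601 timestamp with an explicit UTC offset, and `is_stuck_provisioning` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def is_stuck_provisioning(
    status: Dict[str, Any],
    timeout: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when a resource has been provisioning longer than `timeout`.

    Assumes the status carries a hypothetical `provisioningStartedAt` field
    such as "2025-09-09T21:34:19+00:00". On Python versions before 3.11,
    `datetime.fromisoformat` does not accept a trailing "Z".
    """
    if status.get("state") != "provisioning":
        return False
    started_at = datetime.fromisoformat(status["provisioningStartedAt"])
    now = now or datetime.now(timezone.utc)
    return now - started_at > timeout
```

Passing `now` explicitly keeps the check deterministic in tests, while production callers can omit it and use the current UTC time.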
Error Recovery
Controllers and reconcilers implement retry logic:
try:
    await self.provision_lab_instance(resource)
except Exception as e:
    resource.status['retries'] = resource.status.get('retries', 0) + 1
    if resource.status['retries'] < max_retries:
        await self.schedule_retry(resource)
    else:
        await self.mark_as_failed(resource, str(e))
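The snippet above does not say how `schedule_retry` spaces its attempts. A common choice, assumed here and not taken from the source, is capped exponential backoff. `backoff_delay`, `record_failure`, and the `retryInSeconds` field are hypothetical.

```python
from typing import Any, Dict


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (1-based), doubling each time."""
    return min(cap, base * 2 ** (attempt - 1))


def record_failure(status: Dict[str, Any], error: str, max_retries: int = 3) -> str:
    """Count a failed attempt and decide between 'retry' and 'failed'."""
    status["retries"] = status.get("retries", 0) + 1
    if status["retries"] < max_retries:
        # Still under the limit: tell the caller how long to wait.
        status["retryInSeconds"] = backoff_delay(status["retries"])
        return "retry"
    # Limit reached: record the terminal state and the last error.
    status["state"] = "failed"
    status["error"] = error
    return "failed"
```

Storing the retry count in `status` means the count survives a controller restart, provided the status is persisted with the resource.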
Drift Detection
Reconcilers verify that actual state matches desired state:
async def check_drift(self, resource):
    actual_state = await self.get_actual_infrastructure_state(resource)
    desired_state = resource.spec
    if actual_state != desired_state:
        await self.correct_drift(resource, actual_state, desired_state)
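A plain inequality test reports that drift exists but not where. A field-level comparison, sketched below, gives a corrective action something concrete to work from. `diff_state` is a hypothetical helper, and comparing only the keys present in the spec is a design assumption.

```python
from typing import Any, Dict, Tuple

Diff = Dict[str, Tuple[Any, Any]]


def diff_state(desired: Dict[str, Any], actual: Dict[str, Any]) -> Diff:
    """Map each drifted key to a (desired, actual) pair.

    Only keys present in `desired` are compared, so extra fields reported by
    the infrastructure do not count as drift. That choice is an assumption of
    this sketch.
    """
    missing = object()  # Sentinel that distinguishes an absent key from a stored None.
    drift: Diff = {}
    for key, wanted in desired.items():
        found = actual.get(key, missing)
        if found != wanted:
            drift[key] = (wanted, None if found is missing else found)
    return drift
```

An empty result means the observed infrastructure satisfies the spec, even if it reports additional fields such as an endpoint.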
Observability
Metrics and Logging
ROA components provide rich observability:
logger.info(f"Watcher detected change: {resource_id} -> {state}")
logger.info(f"Controller processing: {resource_id} (state: {state})")
logger.info(f"Reconciling {len(resources)} resources")
logger.warning(f"Reconciler: Resource stuck: {resource_id}")
Resource Versioning
Track changes with resource versions:
resource.metadata['resourceVersion'] = str(self.next_version())
resource.metadata['lastModified'] = datetime.now(timezone.utc).isoformat()
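The `next_version()` call above is not defined on this page. One simple way to provide it, assumed here, is a monotonically increasing counter. `VersionStamper` and `is_newer` are hypothetical names.

```python
import itertools
from datetime import datetime, timezone
from typing import Any, Dict


class VersionStamper:
    """Hypothetical helper that hands out strictly increasing resource versions."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def stamp(self, metadata: Dict[str, Any]) -> Dict[str, Any]:
        # Versions are stored as strings, matching the snippet above.
        metadata["resourceVersion"] = str(next(self._counter))
        metadata["lastModified"] = datetime.now(timezone.utc).isoformat()
        return metadata


def is_newer(version: str, than: str) -> bool:
    """Compare versions numerically; as strings, "10" would sort before "9"."""
    return int(version) > int(than)
```

Because the versions are stored as strings, any "changed since" filter should convert them to integers before comparing.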
Configuration
Tuning Parameters
Adjust timing for your use case:
# Development: Fast feedback
watcher = LabInstanceWatcher(storage, poll_interval=1.0)
scheduler = LabInstanceScheduler(storage, reconcile_interval=5.0)
# Production: Balanced performance
watcher = LabInstanceWatcher(storage, poll_interval=5.0)
scheduler = LabInstanceScheduler(storage, reconcile_interval=30.0)
Scaling Considerations
- Multiple Watchers: Use resource sharding for scale
- Controller Parallelism: Process multiple resources concurrently
- Reconciler Batching: Group operations for efficiency
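Two of these ideas can be sketched briefly: assigning resources to watcher shards with a stable hash, and bounding controller parallelism with a semaphore. `shard_for` and `process_concurrently` are hypothetical helpers, not part of the framework.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Iterable


def shard_for(resource_id: str, shard_count: int) -> int:
    """Map a resource id to a shard with a hash that is stable across processes.

    The built-in hash() is avoided because string hashes are randomized per
    process by default.
    """
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


async def process_concurrently(
    resource_ids: Iterable[str],
    handler: Callable[[str], Awaitable[None]],
    max_parallel: int = 4,
) -> None:
    """Run `handler` for every id with at most `max_parallel` in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(resource_id: str) -> None:
        async with semaphore:
            await handler(resource_id)

    await asyncio.gather(*(guarded(rid) for rid in resource_ids))
```

Each watcher would then handle only the resources whose shard number matches its own, and the semaphore keeps a burst of changes from overwhelming downstream systems.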
Use Cases
ROA is ideal for:
- Infrastructure Management: Cloud resources, containers, services
- Workflow Orchestration: Multi-step processes with dependencies
- Resource Lifecycle: Provisioning, monitoring, cleanup
- System Integration: Managing external system state
- DevOps Automation: CI/CD pipelines, deployment management
Related Documentation
- Watcher & Reconciliation Patterns - Detailed pattern explanations
- Execution Flow - How components coordinate
- Lab Resource Manager Sample - Complete ROA implementation
- CQRS & Mediation - Command/Query patterns used in ROA
- Data Access - Repository patterns for resource storage