
🎯 Resource Oriented Architecture (ROA)

Resource Oriented Architecture is a powerful pattern for building systems that manage resources through their lifecycle, similar to how Kubernetes manages cluster resources. Neuroglia provides comprehensive support for ROA patterns including watchers, controllers, and reconciliation loops.

🎯 Overview

ROA provides:

  • 📊 Resource Management: Declarative resource definitions with desired vs actual state
  • 👀 Watchers: Continuous monitoring of resource changes through polling or event streams
  • 🎮 Controllers: Business logic that responds to resource changes and implements state transitions
  • 🔄 Reconciliation: Periodic loops that ensure system consistency and handle drift detection
  • 🛡️ Safety Mechanisms: Timeout handling, error recovery, and corrective actions

🏗️ Architecture Overview

graph TB
    subgraph "📊 Resource Layer"
        A[Resource Definition]
        B[Resource Storage]
        C[Resource Events]
    end

    subgraph "👀 Observation Layer"
        D[Watcher] --> E[Event Stream]
        F[Poller] --> G[Change Detection]
    end

    subgraph "🎮 Control Layer"
        H[Controller] --> I[Business Logic]
        I --> J[State Transitions]
        I --> K[Action Execution]
    end

    subgraph "🔄 Reconciliation Layer"
        L[Reconciliation Loop] --> M[Drift Detection]
        M --> N[Corrective Actions]
        N --> O[State Restoration]
    end

    subgraph "🛡️ Safety Layer"
        P[Error Handling] --> Q[Retry Logic]
        Q --> R[Circuit Breaker]
        R --> S[Timeout Management]
    end

    A --> B
    B --> C
    C --> D
    C --> F
    E --> H
    G --> H
    H --> L
    L --> P

    style A fill:#e3f2fd
    style H fill:#f3e5f5
    style L fill:#e8f5e8
    style P fill:#fff3e0

🏗️ Core Components

Resource Definition

Resources are declarative objects that define desired state:

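from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class LabInstanceResource:
    api_version: str = "lab.neuroglia.com/v1"
    kind: str = "LabInstance"
    # default_factory gives every instance its own dict instead of None
    metadata: Dict[str, Any] = field(default_factory=dict)
    spec: Dict[str, Any] = field(default_factory=dict)      # Desired state
    status: Dict[str, Any] = field(default_factory=dict)    # Current state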

Watcher Pattern

Watchers continuously monitor resources for changes:

import asyncio

class LabInstanceWatcher:
    async def start_watching(self):
        while self.is_running:
            # Poll for changes since last known version
            changes = self.storage.list_resources(since_version=self.last_resource_version)

            for resource in changes:
                await self._handle_resource_change(resource)
                # Advance the cursor so the next poll does not return this change again
                self.last_resource_version = resource.metadata['resourceVersion']

            await asyncio.sleep(self.poll_interval)
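
To make the polling loop above concrete, here is a minimal, self-contained sketch. `InMemoryStorage` and `PollingWatcher` are hypothetical stand-ins invented for this illustration, not Neuroglia classes, and resources are plain dicts to keep the example short. The sketch splits one poll cycle into `poll_once` so it can be exercised without running the infinite loop.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Resource = Dict[str, Any]
Handler = Callable[[Resource], Awaitable[None]]


class InMemoryStorage:
    """Hypothetical store that stamps each write with an increasing version."""

    def __init__(self) -> None:
        self._items: Dict[str, Resource] = {}
        self._version = 0

    def save(self, name: str, state: str) -> None:
        self._version += 1
        self._items[name] = {"name": name, "state": state, "version": self._version}

    def list_resources(self, since_version: int = 0) -> List[Resource]:
        changed = [r for r in self._items.values() if r["version"] > since_version]
        return sorted(changed, key=lambda r: r["version"])


class PollingWatcher:
    def __init__(self, storage: InMemoryStorage, poll_interval: float = 0.01) -> None:
        self.storage = storage
        self.poll_interval = poll_interval
        self.last_resource_version = 0
        self.is_running = False
        self.handlers: List[Handler] = []

    def add_event_handler(self, handler: Handler) -> None:
        self.handlers.append(handler)

    async def poll_once(self) -> int:
        """Dispatch every change newer than the last seen version."""
        changes = self.storage.list_resources(since_version=self.last_resource_version)
        for resource in changes:
            for handler in self.handlers:
                await handler(resource)
            # Advance the cursor so the same change is not delivered twice.
            self.last_resource_version = resource["version"]
        return len(changes)

    async def start_watching(self) -> None:
        self.is_running = True
        while self.is_running:
            await self.poll_once()
            await asyncio.sleep(self.poll_interval)

    def stop(self) -> None:
        self.is_running = False
```

Calling `poll_once` twice without an intervening write dispatches nothing the second time, which is the property the version cursor is there to provide.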

Controller Pattern

Controllers respond to resource changes with business logic:

class LabInstanceController:
    async def handle_resource_event(self, resource: LabInstanceResource):
        current_state = resource.status.get('state')

        if current_state == ResourceState.PENDING.value:
            await self._start_provisioning(resource)
        elif current_state == ResourceState.PROVISIONING.value:
            await self._check_provisioning_status(resource)
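
The controller above refers to a `ResourceState` enum that this page does not show. The sketch below assumes one possible definition (the lowercase values mirror the `'ready'` status used elsewhere on this page) and stubs out the provisioning calls so the dispatch logic can run on its own. `SketchController` and its stub bodies are illustrative only.

```python
import asyncio
from enum import Enum
from typing import Any, Dict


class ResourceState(Enum):
    # Assumed values; the real enum may define more states.
    PENDING = "pending"
    PROVISIONING = "provisioning"
    READY = "ready"
    FAILED = "failed"


class SketchController:
    """Dispatches on status['state']; provisioning itself is stubbed out."""

    async def handle_resource_event(self, resource: Dict[str, Any]) -> None:
        # A resource without a state yet is treated as pending.
        state = resource["status"].get("state", ResourceState.PENDING.value)
        if state == ResourceState.PENDING.value:
            await self._start_provisioning(resource)
        elif state == ResourceState.PROVISIONING.value:
            await self._check_provisioning_status(resource)
        # READY and FAILED need no action from this controller.

    async def _start_provisioning(self, resource: Dict[str, Any]) -> None:
        resource["status"]["state"] = ResourceState.PROVISIONING.value

    async def _check_provisioning_status(self, resource: Dict[str, Any]) -> None:
        # Stub: pretend the backing infrastructure reported success.
        resource["status"]["state"] = ResourceState.READY.value
```

Each call moves the resource at most one step, so repeated events for a resource that is already ready are harmless.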

Reconciliation Loop

Reconcilers ensure eventual consistency:

import asyncio

class LabInstanceScheduler:
    async def start_reconciliation(self):
        while self.is_running:
            await self._reconcile_all_resources()
            await asyncio.sleep(self.reconcile_interval)

    async def _reconcile_resource(self, resource):
        # Check for stuck states, timeouts, and drift
        # Take corrective actions as needed
        ...  # body omitted in this excerpt
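
The loop calls `_reconcile_all_resources`, which is not shown here. One plausible shape, sketched below as a free function, reconciles each resource in turn and isolates failures so that a single bad resource cannot stop the pass. The name `reconcile_all` and the dict-shaped resources are assumptions made for this example.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, List

logger = logging.getLogger("reconciler")
Resource = Dict[str, Any]


async def reconcile_all(
    resources: List[Resource],
    reconcile_one: Callable[[Resource], Awaitable[None]],
) -> List[str]:
    """Reconcile every resource; return the names of those that raised."""
    failed: List[str] = []
    for resource in resources:
        try:
            await reconcile_one(resource)
        except Exception:
            # Isolate the failure so the remaining resources are still checked.
            logger.exception("Reconcile failed for %s", resource["name"])
            failed.append(resource["name"])
    return failed
```

Returning the failed names lets the caller log a summary or feed them into retry handling on the next pass.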

🚀 Key Patterns

1. Declarative State Management

Resources define what should exist, not how to create it:

# Desired state (spec)
spec = {
    'template': 'python-basics',
    'duration': '60m',
    'studentEmail': 'student@example.com'
}

# Current state (status)
status = {
    'state': 'ready',
    'endpoint': 'https://lab-instance.example.com',
    'readyAt': '2025-09-09T21:34:19Z'
}

2. Event-Driven Processing

Watchers detect changes and notify controllers immediately:

Resource Change → Watcher Detection → Controller Response → State Update

3. Asynchronous Reconciliation

Controllers react to each event as it arrives, while reconcilers run periodically as a safety net for anything the event path missed:

# Controller: Immediate response to events
async def handle_resource_event(self, resource):
    if resource.status.get('state') == ResourceState.PENDING.value:
        await self.start_provisioning(resource)

# Reconciler: Periodic safety checks
async def reconcile_resource(self, resource):
    if self.is_stuck_provisioning(resource):
        await self.mark_as_failed(resource, "Provisioning timeout")

4. State Machine Implementation

Resources progress through well-defined states:

PENDING → PROVISIONING → READY → (cleanup) → DELETING → DELETED
    ↓                      ↓
  FAILED              ← FAILED
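
A transition table is one way to enforce the lifecycle above in code. The sketch below is illustrative: the `State` enum and `TRANSITIONS` map are assumptions, and in particular the set of states allowed to move to FAILED is a guess, since the diagram does not pin down every failure edge.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    PROVISIONING = "provisioning"
    READY = "ready"
    DELETING = "deleting"
    DELETED = "deleted"
    FAILED = "failed"


# Allowed transitions. The failure edges are an assumption for illustration.
TRANSITIONS = {
    State.PENDING: {State.PROVISIONING, State.FAILED},
    State.PROVISIONING: {State.READY, State.FAILED},
    State.READY: {State.DELETING, State.FAILED},
    State.DELETING: {State.DELETED},
    State.DELETED: set(),
    State.FAILED: set(),
}


def transition(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Routing every state change through `transition` means an out-of-order event, such as a late "ready" for a resource that is already deleted, fails loudly instead of corrupting the status.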

⚡ Execution Model

Timing and Coordination

  • Watchers: Poll every 2-5 seconds for near-real-time responsiveness
  • Controllers: Respond immediately to detected changes
  • Reconcilers: Run every 10-30 seconds for consistency checks

Concurrent Processing

All components run concurrently:

async def main():
    # Controllers are event-driven (no separate task needed);
    # register the handler before the watcher starts polling
    watcher.add_event_handler(controller.handle_resource_event)

    # Run the watcher and reconciler concurrently until both stop;
    # without awaiting them, main() would return and cancel both loops
    await asyncio.gather(
        watcher.start_watching(),
        scheduler.start_reconciliation(),
    )
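
Long-running loops also need a way to stop. The following sketch shows one common approach with an `asyncio.Event` as a shared stop signal. `periodic` is a hypothetical stand-in for the watcher and reconciler loops, and the intervals are shortened so the example finishes quickly.

```python
import asyncio


async def periodic(name: str, interval: float, log: list, stop: asyncio.Event) -> None:
    """Stand-in for a watcher or reconciler loop: tick until stop is set."""
    while not stop.is_set():
        log.append(name)
        try:
            # Sleep for the interval, but wake early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def run_for(duration: float) -> list:
    log: list = []
    stop = asyncio.Event()
    tasks = [
        asyncio.create_task(periodic("watcher", 0.01, log, stop)),
        asyncio.create_task(periodic("reconciler", 0.03, log, stop)),
    ]
    await asyncio.sleep(duration)
    stop.set()                    # ask both loops to finish
    await asyncio.gather(*tasks)  # wait until they have actually exited
    return log
```

Waiting on the event instead of calling `asyncio.sleep` directly lets a loop with a 30 second interval shut down promptly instead of finishing its sleep first.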

🛡️ Safety and Reliability

Timeout Handling

Reconcilers detect and handle stuck states:

# age = time elapsed since the resource entered PROVISIONING
if resource.status.get('state') == ResourceState.PROVISIONING.value and age > timeout_threshold:
    await self.mark_as_failed(resource, "Provisioning timeout")
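
The check above depends on knowing how long a resource has been provisioning. A minimal sketch of that calculation follows. It assumes the status carries an ISO 8601 timestamp in a field called `provisioningStartedAt`, which is a hypothetical name chosen for this example, in the same format as the `readyAt` value shown earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def is_stuck_provisioning(
    resource: Dict[str, Any],
    timeout: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """True when a resource has been provisioning for longer than timeout."""
    status = resource["status"]
    if status.get("state") != "provisioning":
        return False
    # 'provisioningStartedAt' is a hypothetical field name. A trailing 'Z' is
    # rewritten because fromisoformat() only accepts it from Python 3.11 on.
    raw = status["provisioningStartedAt"].replace("Z", "+00:00")
    started = datetime.fromisoformat(raw)
    now = now or datetime.now(timezone.utc)
    return now - started > timeout
```

Passing `now` explicitly keeps the function deterministic in tests; production callers can omit it.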

Error Recovery

Controllers and reconcilers implement retry logic:

try:
    await self.provision_lab_instance(resource)
except Exception as e:
    resource.status['retries'] = resource.status.get('retries', 0) + 1
    if resource.status['retries'] < max_retries:
        await self.schedule_retry(resource)
    else:
        await self.mark_as_failed(resource, str(e))
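
The snippet above leaves `schedule_retry` unspecified. A frequent choice is exponential backoff with jitter, sketched below. `backoff_delay` and `retry` are hypothetical helpers written for this page, and the base delay and cap are arbitrary example values. Note that the sketch counts total attempts, which differs slightly from the retry counter stored in `status` above.

```python
import asyncio
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on the wait after failed attempt N (1-based): base * 2^(N-1), capped."""
    return min(cap, base * (2 ** (attempt - 1)))


async def retry(operation, max_attempts: int = 3, base: float = 1.0):
    """Call operation() until it succeeds or max_attempts attempts have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller mark the resource failed
            # Full jitter: sleep a random time up to the backoff bound so that
            # many failing resources do not all retry at the same instant.
            await asyncio.sleep(random.uniform(0, backoff_delay(attempt, base)))
```

With the defaults, the wait bounds grow as 1 s, 2 s, 4 s and so on up to the 60 s cap.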

Drift Detection

Reconcilers verify that actual state matches desired state:

async def check_drift(self, resource):
    actual_state = await self.get_actual_infrastructure_state(resource)
    desired_state = resource.spec

    if actual_state != desired_state:
        await self.correct_drift(resource, actual_state, desired_state)
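
A plain `!=` comparison only says that something drifted, not what. The sketch below reports drift per field, which is usually what a corrective action needs. `diff_state` is a hypothetical helper, and it assumes both states are flat dicts like the `spec` example earlier on this page.

```python
from typing import Any, Dict, Tuple

_MISSING = "<missing>"


def diff_state(
    desired: Dict[str, Any], actual: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each drifted key in desired to (desired_value, actual_value)."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key, want in desired.items():
        have = actual.get(key, _MISSING)
        if have != want:
            drift[key] = (want, have)
    return drift
```

Only keys present in the desired state are compared, so runtime-only fields reported by the infrastructure (an endpoint, for instance) do not count as drift.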

📊 Observability

Metrics and Logging

ROA components provide rich observability:

logger.info(f"🔍 Watcher detected change: {resource_id} -> {state}")
logger.info(f"🎮 Controller processing: {resource_id} (state: {state})")
logger.info(f"🔄 Reconciling {len(resources)} resources")
logger.warning(f"⚠️ Reconciler: Resource stuck: {resource_id}")

Resource Versioning

Track changes with resource versions:

from datetime import datetime, timezone

resource.metadata['resourceVersion'] = str(self.next_version())
resource.metadata['lastModified'] = datetime.now(timezone.utc).isoformat()
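
Because `resourceVersion` is stored as a string, a store that filters by version has to compare numerically; as strings, `"10"` sorts before `"9"`. The sketch below shows a monotonic counter and a numeric filter. `VersionedStore` is a hypothetical in-memory example, not the storage backend used by Neuroglia.

```python
import itertools
from datetime import datetime, timezone
from typing import Any, Dict, List


class VersionedStore:
    """Hypothetical store that stamps each write with a monotonic version."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._items: Dict[str, Dict[str, Any]] = {}

    def next_version(self) -> int:
        return next(self._counter)

    def save(self, name: str, resource: Dict[str, Any]) -> None:
        metadata = resource.setdefault("metadata", {})
        metadata["resourceVersion"] = str(self.next_version())
        metadata["lastModified"] = datetime.now(timezone.utc).isoformat()
        self._items[name] = resource

    def list_resources(self, since_version: int = 0) -> List[Dict[str, Any]]:
        # Versions are stored as strings, so convert before comparing.
        return [
            r for r in self._items.values()
            if int(r["metadata"]["resourceVersion"]) > since_version
        ]
```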

🔧 Configuration

Tuning Parameters

Adjust timing for your use case:

# Development: Fast feedback
watcher = LabInstanceWatcher(storage, poll_interval=1.0)
scheduler = LabInstanceScheduler(storage, reconcile_interval=5.0)

# Production: Balanced performance
watcher = LabInstanceWatcher(storage, poll_interval=5.0)
scheduler = LabInstanceScheduler(storage, reconcile_interval=30.0)
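
Rather than editing code per environment, the two profiles above can be selected at startup. The sketch below reads them from environment variables. The variable names `ROA_PROFILE`, `ROA_POLL_INTERVAL`, and `ROA_RECONCILE_INTERVAL`, along with the `RoaTimings` class, are invented for this example and are not part of any framework.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RoaTimings:
    poll_interval: float
    reconcile_interval: float


# The two profiles mirror the development and production values shown above.
PROFILES = {
    "development": RoaTimings(poll_interval=1.0, reconcile_interval=5.0),
    "production": RoaTimings(poll_interval=5.0, reconcile_interval=30.0),
}


def load_timings(env: Optional[Mapping[str, str]] = None) -> RoaTimings:
    """Pick a profile, then apply per-value overrides (hypothetical variable names)."""
    env = os.environ if env is None else env
    profile = PROFILES[env.get("ROA_PROFILE", "production")]
    return RoaTimings(
        poll_interval=float(env.get("ROA_POLL_INTERVAL", profile.poll_interval)),
        reconcile_interval=float(
            env.get("ROA_RECONCILE_INTERVAL", profile.reconcile_interval)
        ),
    )
```

The result can then be passed on, for example `LabInstanceWatcher(storage, poll_interval=timings.poll_interval)`.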

Scaling Considerations

  • Multiple Watchers: Use resource sharding for scale
  • Controller Parallelism: Process multiple resources concurrently
  • Reconciler Batching: Group operations for efficiency
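
Two of the points above can be sketched briefly: assigning resources to watcher shards, and bounding controller parallelism. Both helpers below are hypothetical illustrations. The shard function uses `hashlib` rather than the built-in `hash()`, because string hashes are salted per process and would otherwise give different shards on different workers.

```python
import asyncio
import hashlib
from typing import Any, Awaitable, Callable, Iterable, List


def shard_for(resource_name: str, shard_count: int) -> int:
    """Stable shard index in [0, shard_count) derived from the resource name."""
    digest = hashlib.sha256(resource_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


async def process_concurrently(
    names: Iterable[str],
    handler: Callable[[str], Awaitable[Any]],
    limit: int = 4,
) -> List[Any]:
    """Run handler for every name, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(name: str) -> Any:
        async with semaphore:
            return await handler(name)

    # gather() returns results in the same order as the input names.
    return await asyncio.gather(*(guarded(n) for n in names))
```

A watcher instance responsible for shard `k` would then skip any resource where `shard_for(name, shard_count) != k`.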

🎯 Use Cases

ROA is ideal for:

  • Infrastructure Management: Cloud resources, containers, services
  • Workflow Orchestration: Multi-step processes with dependencies
  • Resource Lifecycle: Provisioning, monitoring, cleanup
  • System Integration: Managing external system state
  • DevOps Automation: CI/CD pipelines, deployment management