Testing Strategy
| Attribute | Value |
|---|---|
| Document Version | 0.1.0 |
| Status | Draft |
| Created | 2026-01-16 |
| Parent | Implementation Plan |
1. Overview
This document defines the testing strategy for the Lablet Resource Manager implementation across all five implementation phases.
Testing Principles
- Test-Driven Development: Write tests before implementation
- Layered Testing: Unit → Integration → E2E pyramid
- Continuous Integration: All tests run on every PR
- Coverage Targets: Minimum 80% line coverage
2. Test Categories
2.1 Unit Tests
Scope: Individual functions, classes, and methods in isolation
Framework: pytest + pytest-asyncio
Markers: @pytest.mark.unit
Target Coverage:
| Layer | Target |
|---|---|
| Domain entities | 90% |
| Domain value objects | 95% |
| Application commands/queries | 85% |
| Application services | 85% |
| Utility functions | 90% |
Example:
```python
# tests/unit/domain/test_lablet_definition.py
import pytest

# LabletDefinition, LabletDefinitionStatus, TopologySpec, TopologyFormat, and
# ValidationError come from the project's own modules (import paths omitted).


@pytest.mark.unit
class TestLabletDefinition:
    def test_create_with_valid_topology(self):
        definition = LabletDefinition.create(
            name="test-definition",
            topology=TopologySpec(format=TopologyFormat.YAML, content="..."),
        )
        assert definition.state.name == "test-definition"
        assert definition.state.status == LabletDefinitionStatus.DRAFT

    def test_create_rejects_empty_name(self):
        with pytest.raises(ValidationError):
            LabletDefinition.create(name="", topology=...)
```
2.2 Integration Tests
Scope: Component interactions, database operations, external services
Framework: pytest + testcontainers (MongoDB, etcd, MinIO)
Markers: @pytest.mark.integration
Target Coverage:
| Component | Target |
|---|---|
| Repositories | 80% |
| etcd state store | 85% |
| AWS client (mocked) | 80% |
| CML API client (mocked) | 80% |
Example:
```python
# tests/integration/repositories/test_lablet_definition_repository.py
import pytest

# Async tests assume pytest-asyncio runs in auto mode (asyncio_mode = auto).


@pytest.mark.integration
class TestLabletDefinitionRepository:
    # Nothing is awaited here, so a plain (sync) fixture is enough.
    @pytest.fixture
    def repository(self, mongodb_container):
        db = get_test_database(mongodb_container)
        return MongoLabletDefinitionRepository(db)

    async def test_add_and_retrieve(self, repository):
        definition = LabletDefinition.create(name="test", topology=...)
        await repository.add_async(definition)

        retrieved = await repository.get_by_id_async(definition.id())
        assert retrieved.state.name == "test"
```
2.3 API Tests
Scope: REST API endpoints, authentication, authorization
Framework: pytest + httpx (AsyncClient)
Markers: @pytest.mark.api
Target Coverage:
| Endpoint Group | Target |
|---|---|
| Definition CRUD | 85% |
| Instance CRUD | 85% |
| CloudEvents | 80% |
| Internal APIs | 80% |
Example:
```python
# tests/api/test_definitions_controller.py
import pytest


@pytest.mark.api
class TestDefinitionsController:
    # {...} marks an elided request body.
    async def test_create_definition(self, client, auth_headers):
        response = await client.post(
            "/api/v1/definitions",
            json={"name": "test", "topology": {...}},
            headers=auth_headers,
        )
        assert response.status_code == 201
        assert response.json()["name"] == "test"

    async def test_create_definition_unauthorized(self, client):
        response = await client.post("/api/v1/definitions", json={...})
        assert response.status_code == 401
```
2.4 End-to-End Tests
Scope: Full workflow scenarios, user journeys
Framework: pytest with Docker Compose test environment
Markers: @pytest.mark.e2e
Target Coverage:
| Workflow | Target |
|---|---|
| Lablet instantiation | 100% |
| Worker provisioning | 100% |
| Auto-scaling | 100% |
| Assessment integration | 100% |
Example:
```python
# tests/e2e/test_instantiation_workflow.py
from datetime import datetime, timedelta, timezone

import pytest


@pytest.mark.e2e
class TestLabletInstantiationWorkflow:
    async def test_full_instantiation_lifecycle(self, e2e_environment):
        # Create definition
        definition = await create_definition(...)

        # Create scheduled request
        instance = await create_instance(
            definition_id=definition.id,
            timeslot_start=datetime.now(timezone.utc) + timedelta(hours=1),
        )
        assert instance.state == "PENDING"

        # Wait for scheduling
        await wait_for_state(instance.id, "SCHEDULED")

        # Wait for instantiation
        await wait_for_state(instance.id, "RUNNING")

        # Verify lab created on worker.
        # Re-fetch the instance first if worker_id and lab_id are only set after scheduling.
        worker = await get_worker(instance.worker_id)
        assert instance.lab_id in worker.labs
```
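The workflow test relies on a `wait_for_state` helper that this document does not define. A minimal polling sketch is shown here under stated assumptions: it takes a hypothetical `fetch_state` coroutine instead of an instance ID, and the `timeout` and `interval` defaults are illustrative values, not project settings.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_state(
    fetch_state: Callable[[], Awaitable[str]],
    expected: str,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> None:
    """Poll fetch_state() until it returns `expected` or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        state = await fetch_state()
        if state == expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"state is {state!r}, expected {expected!r} after {timeout}s"
            )
        await asyncio.sleep(interval)
```

Raising `TimeoutError` with the last observed state keeps a failing E2E run diagnosable instead of letting it hang until the suite-level timeout.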
3. Test Infrastructure
3.1 Test Fixtures
Shared fixtures in conftest.py:
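```python
# tests/conftest.py
from uuid import uuid4

import etcd3
import pytest
from httpx import ASGITransport, AsyncClient
from motor.motor_asyncio import AsyncIOMotorClient
from testcontainers.mongodb import MongoDbContainer

# EtcdContainer and create_test_app are assumed to be project-provided helpers.
# Async fixtures assume pytest-asyncio runs in auto mode.


# Container fixtures are synchronous: they await nothing, and session-scoped
# async fixtures would need a session-scoped event loop.
@pytest.fixture(scope="session")
def mongodb_container():
    """Spin up MongoDB container for integration tests."""
    with MongoDbContainer() as container:
        yield container


@pytest.fixture(scope="session")
def etcd_container():
    """Spin up etcd container for integration tests."""
    with EtcdContainer() as container:
        yield container


@pytest.fixture
async def test_database(mongodb_container):
    """Get fresh database for each test."""
    client = AsyncIOMotorClient(mongodb_container.get_connection_url())
    db = client[f"test_{uuid4().hex[:8]}"]
    yield db
    await client.drop_database(db.name)
    client.close()


@pytest.fixture
def etcd_client(etcd_container):
    """Get etcd client for tests."""
    return etcd3.client(
        host=etcd_container.get_container_host_ip(),
        port=int(etcd_container.get_exposed_port(2379)),
    )


@pytest.fixture
async def test_app(test_database, etcd_client):
    """Create test application instance."""
    app = create_test_app(database=test_database, etcd_client=etcd_client)
    yield app


@pytest.fixture
async def client(test_app):
    """HTTP client for API tests."""
    # Recent httpx releases take the ASGI app through a transport, not app=.
    transport = ASGITransport(app=test_app)
    async with AsyncClient(transport=transport, base_url="http://test") as http_client:
        yield http_client
```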
3.2 Mock Services
AWS EC2 Client Mock:
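```python
# tests/mocks/aws_mock.py
from uuid import uuid4


class MockAwsEc2Client:
    """In-memory stand-in for the AWS EC2 client."""

    def __init__(self):
        self.instances = {}

    async def create_instance_async(self, config):
        instance_id = f"i-{uuid4().hex[:8]}"
        # Keys in config override the defaults, so a test can preset "state".
        self.instances[instance_id] = {
            "id": instance_id,
            "state": "running",
            **config,
        }
        return instance_id

    async def get_instance_async(self, instance_id):
        # Returns None for unknown instance IDs.
        return self.instances.get(instance_id)
```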
CML API Client Mock:
```python
# tests/mocks/cml_mock.py
from uuid import uuid4


class MockCMLApiClient:
    """In-memory stand-in for the CML API client."""

    def __init__(self):
        self.labs = {}

    async def create_lab_async(self, worker_url, lab_config):
        lab_id = str(uuid4())
        self.labs[lab_id] = {
            "id": lab_id,
            "state": "STOPPED",
            **lab_config,
        }
        return lab_id

    async def start_lab_async(self, worker_url, lab_id):
        # Raises KeyError if the lab was never created.
        self.labs[lab_id]["state"] = "STARTED"
```
3.3 Test Data Factories
```python
# tests/factories.py
from uuid import uuid4

from factory import Factory, LazyAttribute, SubFactory

# TopologySpec, TopologyFormat, and generate_sample_topology come from the
# project's own modules (import paths omitted).


class TopologySpecFactory(Factory):
    class Meta:
        model = TopologySpec

    format = TopologyFormat.YAML
    content = LazyAttribute(lambda _: generate_sample_topology())


class LabletDefinitionFactory(Factory):
    class Meta:
        model = dict  # For creating via API

    name = LazyAttribute(lambda _: f"definition-{uuid4().hex[:8]}")
    topology = SubFactory(TopologySpecFactory)
    resource_requirements = {
        "cpu_cores": 4,
        "memory_gb": 8,
        "estimated_nodes": 5,
    }
```
4. Test Organization
4.1 Directory Structure
```
tests/
├── conftest.py                    # Global fixtures
├── factories.py                   # Test data factories
├── mocks/                         # Mock services
│   ├── __init__.py
│   ├── aws_mock.py
│   └── cml_mock.py
├── unit/                          # Unit tests
│   ├── domain/
│   │   ├── test_lablet_definition.py
│   │   ├── test_lablet_instance.py
│   │   └── test_value_objects.py
│   ├── application/
│   │   ├── commands/
│   │   │   ├── test_create_definition_command.py
│   │   │   └── test_create_instance_command.py
│   │   └── services/
│   │       ├── test_scheduler_service.py
│   │       └── test_port_allocation_service.py
│   └── infrastructure/
│       └── test_etcd_state_store.py
├── integration/                   # Integration tests
│   ├── repositories/
│   │   ├── test_lablet_definition_repository.py
│   │   └── test_lablet_instance_repository.py
│   ├── services/
│   │   └── test_etcd_integration.py
│   └── migrations/
│       └── test_database_migrations.py
├── api/                           # API tests
│   ├── test_definitions_controller.py
│   ├── test_instances_controller.py
│   ├── test_cloudevents_receiver.py
│   └── test_internal_apis.py
├── e2e/                           # End-to-end tests
│   ├── test_instantiation_workflow.py
│   ├── test_scheduling_workflow.py
│   ├── test_autoscaling_workflow.py
│   └── test_assessment_workflow.py
├── security/                      # Security tests (Section 8)
│   ├── test_authentication.py
│   ├── test_authorization.py
│   └── test_input_validation.py
└── performance/                   # Performance tests
    ├── locustfile.py
    ├── test_scheduler_performance.py
    └── test_api_load.py
```
4.2 Naming Conventions
- Test files: test_<module>.py
- Test classes: Test<ComponentName>
- Test methods: test_<scenario>_<expected_result>
Examples:
test_create_definition_with_valid_topology_succeeds
test_create_definition_with_empty_name_raises_validation_error
test_scheduler_reconcile_assigns_pending_instances
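The naming rules above can be enforced mechanically, for example in a lint step. The sketch below is one possible check. The regular expressions are an interpretation of the conventions (lowercase snake case for files and methods, at least two segments after `test_` for methods), and `follows_convention` is a hypothetical helper name.

```python
import re

# Interpretation of the conventions in this section; adjust if the project
# allows other characters in test names.
TEST_FILE = re.compile(r"test_[a-z0-9_]+\.py")
TEST_CLASS = re.compile(r"Test[A-Z][A-Za-z0-9]*")
# Scenario plus expected result implies at least two segments after "test_".
TEST_METHOD = re.compile(r"test_[a-z0-9]+(_[a-z0-9]+)+")

_PATTERNS = {"file": TEST_FILE, "class": TEST_CLASS, "method": TEST_METHOD}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if `name` matches the convention for `kind`."""
    return _PATTERNS[kind].fullmatch(name) is not None
```

A collection hook or a pre-commit script could call `follows_convention` for each collected item and report the names that fail.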
5. Phase-Specific Testing
5.1 Phase 1: Foundation
Focus Areas:
- Domain entity creation and validation
- Repository CRUD operations
- API endpoint functionality
- Port allocation correctness
Test Counts:
| Category | Tests |
|---|---|
| Unit | ~100 |
| Integration | ~40 |
| API | ~30 |
5.2 Phase 2: Scheduling
Focus Areas:
- Scheduler reconciliation loops
- Worker selection algorithms
- State machine transitions
- Leader election behavior
Test Counts:
| Category | Tests |
|---|---|
| Unit | ~80 |
| Integration | ~50 |
| API | ~20 |
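State machine transitions lend themselves to table-driven tests. The sketch below assumes a transition table containing only the PENDING, SCHEDULED, and RUNNING states named in this document. The real state machine likely has more states, and `ALLOWED`, `can_transition`, and `illegal_transitions` are hypothetical names.

```python
# Assumed transition table: only the PENDING -> SCHEDULED -> RUNNING path
# appears in this document. Treating RUNNING as terminal is an assumption.
ALLOWED = {
    "PENDING": {"SCHEDULED"},
    "SCHEDULED": {"RUNNING"},
    "RUNNING": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the table permits moving from `current` to `target`."""
    return target in ALLOWED.get(current, set())


def illegal_transitions() -> list[tuple[str, str]]:
    """Enumerate every (current, target) pair the table forbids."""
    return [
        (current, target)
        for current in ALLOWED
        for target in ALLOWED
        if current != target and not can_transition(current, target)
    ]
```

The list returned by `illegal_transitions()` can be passed to `pytest.mark.parametrize` so that each forbidden pair becomes its own test case asserting that the domain entity rejects the transition.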
5.3 Phase 3: Auto-Scaling
Focus Areas:
- Scale-up trigger conditions
- Scale-down with DRAINING
- Resource controller reconciliation
- Concurrent operation handling
Test Counts:
| Category | Tests |
|---|---|
| Unit | ~60 |
| Integration | ~40 |
| E2E | ~20 |
5.4 Phase 4: Assessment
Focus Areas:
- CloudEvent processing
- Grading Engine pod generation
- External system integration
- Event correlation
Test Counts:
| Category | Tests |
|---|---|
| Unit | ~50 |
| Integration | ~30 |
| E2E | ~15 |
5.5 Phase 5: Production
Focus Areas:
- Performance under load
- Observability correctness
- Security verification
- Full workflow E2E
Test Counts:
| Category | Tests |
|---|---|
| E2E | ~30 |
| Performance | ~10 |
| Security | ~20 |
6. Continuous Integration
6.1 CI Pipeline Stages
```yaml
# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: poetry install
      - name: Run unit tests
        run: make test-unit
      - name: Upload coverage
        uses: codecov/codecov-action@v4

  integration-tests:
    runs-on: ubuntu-latest
    services:
      mongodb:
        image: mongo:7
        ports:
          - 27017:27017
      etcd:
        image: quay.io/coreos/etcd:v3.5.9
        ports:
          - 2379:2379
    steps:
      - uses: actions/checkout@v4
      # Each job starts on a fresh runner, so Python and dependencies are set up again.
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: poetry install
      - name: Run integration tests
        run: make test-integration

  api-tests:
    runs-on: ubuntu-latest
    needs: [unit-tests]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: poetry install
      - name: Run API tests
        run: make test-api

  e2e-tests:
    runs-on: ubuntu-latest
    needs: [integration-tests, api-tests]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: poetry install
      - name: Start services
        run: docker compose up -d
      - name: Wait for services
        run: make wait-for-services
      - name: Run E2E tests
        run: make test-e2e
```
6.2 Test Commands
```makefile
# Makefile additions
.PHONY: test-unit test-integration test-api test-e2e test-all test-coverage

test-unit:
	PYTHONPATH=src pytest tests/unit -v -m unit --cov=src --cov-report=xml

test-integration:
	PYTHONPATH=src pytest tests/integration -v -m integration

test-api:
	PYTHONPATH=src pytest tests/api -v -m api

# --timeout requires the pytest-timeout plugin
test-e2e:
	PYTHONPATH=src pytest tests/e2e -v -m e2e --timeout=300

test-all:
	PYTHONPATH=src pytest tests -v --cov=src --cov-report=html

test-coverage:
	PYTHONPATH=src pytest tests -v --cov=src --cov-report=html --cov-fail-under=80
```
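The `-m unit`, `-m integration`, `-m api`, and `-m e2e` selections rely on custom markers. pytest warns about unregistered markers, and treats them as errors under `--strict-markers`. One way to register them is a `pytest_configure` hook in `tests/conftest.py`, sketched below. The marker descriptions are paraphrased from the scope lines in Section 2, and the `MARKERS` name is illustrative.

```python
# tests/conftest.py (sketch)

# Descriptions are paraphrased from Section 2; the wording is illustrative.
MARKERS = {
    "unit": "isolated tests of functions, classes, and methods",
    "integration": "tests of component interactions, databases, external services",
    "api": "REST endpoint, authentication, and authorization tests",
    "e2e": "full workflow tests against the Docker Compose environment",
}


def pytest_configure(config):
    """Register the project's markers so `-m <name>` selection raises no warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

The same registration can instead live in the `markers` option of the pytest configuration file. The hook form keeps marker definitions next to the fixtures.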
6.3 Coverage Requirements
| Phase | Minimum Coverage |
|---|---|
| Phase 1 | 80% |
| Phase 2 | 82% |
| Phase 3 | 82% |
| Phase 4 | 83% |
| Phase 5 | 85% |
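The per-phase minimums can be checked against the `coverage.xml` report that `--cov-report=xml` produces. The sketch below reads the `line-rate` attribute from the report's root element and compares it with the table above. `PHASE_MINIMUM` and both function names are hypothetical, and how the current phase is determined is left open.

```python
import xml.etree.ElementTree as ET

# Thresholds copied from the table above, in percent.
PHASE_MINIMUM = {1: 80.0, 2: 82.0, 3: 82.0, 4: 83.0, 5: 85.0}


def line_coverage_percent(coverage_xml: str) -> float:
    """Extract overall line coverage (percent) from a Cobertura-style report."""
    root = ET.fromstring(coverage_xml)
    return float(root.attrib["line-rate"]) * 100.0


def meets_phase_minimum(phase: int, coverage_xml: str) -> bool:
    """Return True if the report satisfies the minimum for `phase`."""
    return line_coverage_percent(coverage_xml) >= PHASE_MINIMUM[phase]
```

A simpler alternative is to pass the phase's threshold directly to `--cov-fail-under`, as the `test-coverage` target does for 80%.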
7. Performance Testing
7.1 Load Testing
Tool: Locust or k6
Scenarios:
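```python
# tests/performance/locustfile.py
from locust import HttpUser, task, between


class LabletUser(HttpUser):
    # Each simulated user pauses 1 to 3 seconds between tasks.
    wait_time = between(1, 3)

    # Task weights: listing runs three times as often as instance creation.
    # Add auth headers if the target enforces authentication (see Section 8.1).
    @task(3)
    def list_definitions(self):
        self.client.get("/api/v1/definitions")

    @task(2)
    def get_definition(self):
        self.client.get("/api/v1/definitions/test-def")

    @task(1)
    def create_instance(self):
        self.client.post("/api/v1/instances", json={...})
```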
Targets:
| Metric | Target |
|---|---|
| API response time (p95) | < 200ms |
| Scheduler reconcile (1000 instances) | < 5s |
| Controller reconcile (100 workers) | < 10s |
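The p95 response-time target can be asserted on raw latency samples exported from a load run. This is a minimal sketch using the standard library. The `p95` and `within_target` names are hypothetical, and the interpolation method (`inclusive`) is a choice, so results may differ slightly from what a load tool reports.

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th percentile via statistics.quantiles (needs at least two samples)."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def within_target(samples_ms: list[float], target_ms: float = 200.0) -> bool:
    """Return True if the p95 latency is strictly below `target_ms`."""
    return p95(samples_ms) < target_ms
```

Because p95 ignores the slowest 5% of requests, a handful of outliers will not fail the check, while a broad slowdown will.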
7.2 Stress Testing
Scenarios:
- 1000 concurrent instance requests
- 100 simultaneous worker provisioning
- Scheduler leader failover under load
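The concurrent-request scenarios can be driven from a test with `asyncio`. The sketch below issues a fixed number of requests with a cap on how many are in flight and records failures without aborting the run. `run_concurrently` and its `make_request` callback are hypothetical. In a real test the callback would wrap an HTTP call such as a POST to `/api/v1/instances`.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_concurrently(
    make_request: Callable[[int], Awaitable[Any]],
    total: int,
    max_in_flight: int,
) -> list[Any]:
    """Issue `total` requests, at most `max_in_flight` at once, results in order."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def _one(index: int) -> Any:
        async with semaphore:
            try:
                return await make_request(index)
            except Exception as exc:
                # Record the failure so one bad request does not end the run.
                return exc

    return await asyncio.gather(*(_one(i) for i in range(total)))
```

Setting `max_in_flight` equal to `total` reproduces a fully concurrent burst, and lowering it models a client-side connection limit.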
7.3 Chaos Testing
Tools: Chaos Monkey, Litmus
Scenarios:
- etcd leader failure
- MongoDB connection loss
- Worker instance termination
- Network partition between scheduler and workers
8. Security Testing
8.1 Authentication Tests
```python
# tests/security/test_authentication.py
class TestAuthentication:
    async def test_api_requires_authentication(self, client):
        response = await client.get("/api/v1/definitions")
        assert response.status_code == 401

    async def test_expired_token_rejected(self, client):
        expired_token = generate_expired_jwt()
        response = await client.get(
            "/api/v1/definitions",
            headers={"Authorization": f"Bearer {expired_token}"},
        )
        assert response.status_code == 401
```
8.2 Authorization Tests
```python
# tests/security/test_authorization.py
class TestAuthorization:
    # {...} marks an elided request body.
    async def test_admin_can_create_definition(self, client, admin_token):
        response = await client.post(
            "/api/v1/definitions",
            json={...},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert response.status_code == 201

    async def test_viewer_cannot_create_definition(self, client, viewer_token):
        response = await client.post(
            "/api/v1/definitions",
            json={...},
            headers={"Authorization": f"Bearer {viewer_token}"},
        )
        assert response.status_code == 403
```
8.3 Input Validation Tests
```python
# tests/security/test_input_validation.py
import pytest


class TestInputValidation:
    @pytest.mark.parametrize("payload", [
        {"name": "<script>alert('xss')</script>"},  # markup in a name field
        {"name": "a" * 10000},                      # oversized value
        {"topology": {"format": "INVALID"}},        # unknown enum value
    ])
    async def test_rejects_malicious_input(self, client, auth_headers, payload):
        response = await client.post(
            "/api/v1/definitions",
            json=payload,
            headers=auth_headers,
        )
        assert response.status_code in [400, 422]
```
9. Test Reporting
9.1 Coverage Reports
- HTML reports generated by pytest-cov
- XML reports uploaded to Codecov
- Badge in README showing current coverage
9.2 Test Results Dashboard
- GitHub Actions summary
- Test timing trends
- Flaky test detection
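Flaky test detection can start from a simple rule: a test is flaky if it both passed and failed across repeated runs of the same commit. The sketch below applies that rule to per-run outcome maps. The input shape and the `find_flaky_tests` name are assumptions. In practice the outcomes would be parsed from JUnit XML or CI run data.

```python
from collections import defaultdict


def find_flaky_tests(runs: list[dict[str, str]]) -> list[str]:
    """Return IDs of tests that both passed and failed across `runs`.

    Each run maps a test ID to "passed" or "failed" for the same commit.
    """
    outcomes: dict[str, set[str]] = defaultdict(set)
    for run in runs:
        for test_id, outcome in run.items():
            outcomes[test_id].add(outcome)
    return sorted(
        test_id
        for test_id, seen in outcomes.items()
        if {"passed", "failed"} <= seen
    )
```

Tests that fail in every run are excluded on purpose: they are broken, not flaky, and belong in the normal failure report.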
9.3 Performance Reports
- Locust HTML reports
- Grafana dashboards for long-running tests
- Regression alerts
10. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1.0 | 2026-01-16 | Architecture Team | Initial draft |