Skip to content

Exception Hierarchy Enhancement for AWS EC2 Client

Date: November 16, 2025 Status: ✅ Complete

Overview

Replaced generic IntegrationException with specific exception types for better error handling, debugging, and user experience.


Exception Hierarchy

IntegrationException (base)
└── EC2Exception (AWS EC2 base)
    ├── EC2AuthenticationException
    ├── EC2InstanceNotFoundException
    ├── EC2InstanceCreationException
    ├── EC2InstanceOperationException
    ├── EC2TagOperationException
    ├── EC2StatusCheckException
    ├── EC2QuotaExceededException
    └── EC2InvalidParameterException

Exception Types

1. EC2AuthenticationException

When: AWS credentials invalid or insufficient permissions AWS Error Codes: UnauthorizedOperation, InvalidClientTokenId, SignatureDoesNotMatch, AccessDenied Example:

try:
    client.create_instance(...)
except EC2AuthenticationException as e:
    # Show user: "Please check your AWS credentials"
    # Log: Contact AWS admin for permission review

2. EC2InstanceNotFoundException

When: Instance ID doesn't exist AWS Error Codes: InvalidInstanceID.NotFound Example:

try:
    client.start_instance(region, "i-invalid123")
except EC2InstanceNotFoundException:
    # Instance was already terminated or never existed
    # Update UI: Show "Instance not found" message

3. EC2InstanceCreationException

When: Instance creation fails Causes: Invalid AMI, network config issues, internal AWS errors Example:

try:
    dto = client.create_instance(...)
except EC2InstanceCreationException as e:
    # Retry with different AZ or instance type
    # Log for capacity planning

4. EC2InstanceOperationException

When: Start/stop/terminate operations fail Causes: Instance in wrong state, concurrent modification Example:

try:
    client.stop_instance(region, instance_id)
except EC2InstanceOperationException:
    # Can't stop instance that's already stopping
    # Show user current state and wait

5. EC2TagOperationException

When: Tag add/remove/get operations fail Causes: Tag limit exceeded, invalid tag format Example:

try:
    client.add_tags(region, instance_id, large_tag_dict)
except EC2TagOperationException as e:
    # AWS has 50-tag limit per resource
    # Remove old tags before adding new ones

6. EC2StatusCheckException

When: Cannot retrieve instance health status Causes: Instance too new, API throttling Example:

try:
    status = client.get_instance_status_checks(region, instance_id)
except EC2StatusCheckException:
    # Status checks not available yet (instance just launched)
    # Retry after 2-3 minutes

7. EC2QuotaExceededException

When: AWS resource limits hit AWS Error Codes: InstanceLimitExceeded, InsufficientInstanceCapacity, RequestLimitExceeded Example:

try:
    client.create_instance(...)
except EC2QuotaExceededException as e:
    # Request quota increase from AWS
    # Show user: "Maximum instance limit reached"

8. EC2InvalidParameterException

When: Invalid parameters provided AWS Error Codes: InvalidParameterValue, InvalidAMIID.NotFound, InvalidGroup.NotFound Example:

try:
    client.create_instance(
        ami_id="ami-invalid",  # Wrong!
        ...
    )
except EC2InvalidParameterException as e:
    # Validate inputs before calling AWS
    # Show user which parameter is wrong

Benefits

1. Precise Error Handling

# Before (Generic)
try:
    client.start_instance(region, instance_id)
except IntegrationException as e:
    # What happened? Auth? Not found? Already running?
    # Can't tell without parsing error message strings
    log.error(f"Something went wrong: {e}")

# After (Specific)
try:
    client.start_instance(region, instance_id)
except EC2InstanceNotFoundException:
    # Instance was terminated, remove from database
    db.delete_worker(instance_id)
except EC2AuthenticationException:
    # Credentials expired, trigger re-auth flow
    auth_service.refresh_credentials()
except EC2InstanceOperationException as e:
    # Instance in wrong state, wait and retry
    if "already started" in str(e).lower():
        return True  # Success!
    raise

2. Better User Experience

# Application layer can provide specific messages
try:
    await create_worker_command(...)
except EC2QuotaExceededException:
    return {
        "error": "Instance Limit Reached",
        "message": "You've reached your AWS EC2 instance limit. Please contact support to request an increase.",
        "action": "request_quota_increase"
    }
except EC2AuthenticationException:
    return {
        "error": "Authentication Failed",
        "message": "AWS credentials are invalid. Please check your configuration.",
        "action": "verify_credentials"
    }

3. Intelligent Retry Logic

def create_worker_with_retry(config, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.create_instance(...)
        except EC2QuotaExceededException:
            # Don't retry - quota limit won't change
            raise
        except EC2InvalidParameterException:
            # Don't retry - parameters won't magically become valid
            raise
        except EC2InstanceCreationException as e:
            # Might be temporary AWS capacity issue
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise
        except EC2AuthenticationException:
            # Try to refresh credentials once
            if attempt == 0:
                auth_service.refresh_credentials()
                continue
            raise

4. Better Monitoring & Alerting

# Metrics collection
@track_exceptions
def create_worker(...):
    try:
        return client.create_instance(...)
    except EC2QuotaExceededException:
        metrics.increment("ec2.quota_exceeded")
        alerts.notify_ops("EC2 quota limit reached - request increase")
        raise
    except EC2AuthenticationException:
        metrics.increment("ec2.auth_failures")
        alerts.notify_security("AWS credential issue detected")
        raise
    except EC2InstanceCreationException:
        metrics.increment("ec2.creation_failures")
        # Track for capacity planning
        raise

5. Cleaner Testing

# Unit tests can be very specific
def test_create_worker_quota_exceeded():
    with pytest.raises(EC2QuotaExceededException):
        service.create_worker(...)

def test_create_worker_invalid_ami():
    with pytest.raises(EC2InvalidParameterException) as exc_info:
        service.create_worker(ami_id="ami-invalid")
    assert "AMI" in str(exc_info.value)

def test_start_nonexistent_instance():
    with pytest.raises(EC2InstanceNotFoundException):
        service.start_worker("i-doesnotexist")

Error Parsing Logic

The _parse_aws_error() helper method inspects AWS ClientError and returns appropriate exception:

def _parse_aws_error(self, error: ClientError, operation: str) -> Exception:
    error_code = error.response.get("Error", {}).get("Code", "")

    # Maps AWS error codes to specific exceptions
    if error_code in ("UnauthorizedOperation", ...):
        return EC2AuthenticationException(...)
    if error_code in ("InvalidInstanceID.NotFound", ...):
        return EC2InstanceNotFoundException(...)
    # ... etc

    # Fall back to generic for unknown errors
    return IntegrationException(f"AWS error [{error_code}]: {message}")

Updated Method Signatures

All methods now document specific exceptions:

def create_instance(...) -> CMLWorkerInstanceDto | None:
    """
    Raises:
        EC2InvalidParameterException: Invalid AMI, security group, etc.
        EC2InstanceCreationException: Instance creation failed.
        EC2QuotaExceededException: Instance limit reached.
        EC2AuthenticationException: Invalid credentials.
    """

def start_instance(...) -> bool:
    """
    Raises:
        EC2InstanceNotFoundException: Instance not found.
        EC2InstanceOperationException: Start operation failed.
        EC2AuthenticationException: Invalid credentials.
    """

Migration Guide

For Existing Code

Before:

try:
    client.create_instance(...)
except IntegrationException as e:
    log.error(f"Failed: {e}")
    raise

After (minimal change - still works):

try:
    client.create_instance(...)
except IntegrationException as e:  # Catches all EC2* exceptions too
    log.error(f"Failed: {e}")
    raise

After (better - specific handling):

try:
    client.create_instance(...)
except EC2QuotaExceededException:
    notify_admin("Need quota increase")
    raise
except EC2InvalidParameterException as e:
    log.error(f"Config error: {e}")
    raise
except IntegrationException as e:
    log.error(f"Other AWS error: {e}")
    raise

Best Practices

1. Catch Specific First, Generic Last

try:
    operation()
except EC2InstanceNotFoundException:
    # Handle specific case
    pass
except EC2InstanceOperationException:
    # Handle another specific case
    pass
except IntegrationException:
    # Catch-all for unexpected issues
    pass

2. Don't Catch and Ignore

# BAD
try:
    client.create_instance(...)
except IntegrationException:
    pass  # Silently fails - terrible!

# GOOD
try:
    client.create_instance(...)
except EC2QuotaExceededException as e:
    log.warning(f"Quota exceeded: {e}")
    notify_admin()
    raise  # Re-raise after handling

3. Add Context When Re-raising

try:
    client.start_instance(region, instance_id)
except EC2InstanceNotFoundException as e:
    raise EC2InstanceNotFoundException(
        f"Cannot start worker {worker_name} (instance {instance_id}): {e}"
    ) from e

Summary

8 specific exception types instead of generic IntegrationExceptionAutomatic AWS error parsing with _parse_aws_error() helper ✅ All methods updated with specific exception documentation ✅ Backward compatible - IntegrationException is still base class ✅ Better error handling throughout the codebase ✅ Improved testability with specific exception types

The AWS EC2 client now provides enterprise-grade error handling! 🎯