AWS IAM Setup for CML Worker Management
This document describes how to provision AWS credentials with the appropriate IAM permissions for the CML Cloud Manager to manage CML Worker EC2 instances.
Overview
The CML Cloud Manager uses the AWS SDK for Python (boto3) to interact with AWS services when managing CML Worker instances. The application requires programmatic access credentials (Access Key ID and Secret Access Key) with the IAM permissions described below.
Required AWS Services
The application interacts with the following AWS services:
- Amazon EC2 - Create, start, stop, terminate, and monitor EC2 instances
- Amazon CloudWatch - Retrieve CPU and memory utilization metrics
IAM User Setup
Step 1: Create IAM User
- Sign in to the AWS IAM Console
- Navigate to Users → Create user
- Enter a username (e.g., `cml-cloud-manager-service`)
- Leave AWS Management Console access disabled; the service only needs programmatic access keys, which are created in Step 4
- Click Next to continue to permissions
Step 2: Create Custom IAM Policy
Create a custom policy with the minimum required permissions for CML Worker management.
Policy Name: CMLWorkerManagementPolicy
Policy JSON:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EC2InstanceManagement",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeImages",
        "ec2:DescribeRegions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "EC2TagManagement",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:DeleteTags",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchMetricsRead",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}
```
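If you prefer to script Step 2 instead of using the console, the following is a minimal sketch, assuming boto3 is installed and the caller's own credentials allow `iam:CreatePolicy`. It builds the same policy document as a Python dict and passes it to the IAM `create_policy` API. The helper name `create_cml_policy` is hypothetical, and the IAM client is passed in so it can be replaced by a stub in tests.

```python
import json

POLICY_NAME = "CMLWorkerManagementPolicy"

# Same statements as the policy JSON above.
POLICY_DOCUMENT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EC2InstanceManagement",
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeImages",
                "ec2:DescribeRegions",
            ],
            "Resource": "*",
        },
        {
            "Sid": "EC2TagManagement",
            "Effect": "Allow",
            "Action": ["ec2:CreateTags", "ec2:DeleteTags", "ec2:DescribeTags"],
            "Resource": "*",
        },
        {
            "Sid": "CloudWatchMetricsRead",
            "Effect": "Allow",
            "Action": ["cloudwatch:GetMetricStatistics"],
            "Resource": "*",
        },
    ],
}


def create_cml_policy(iam_client) -> str:
    """Create the customer-managed policy and return its ARN.

    `iam_client` is expected to be a boto3 IAM client (hypothetical helper;
    any object with a compatible `create_policy` method works).
    """
    response = iam_client.create_policy(
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(POLICY_DOCUMENT),
        Description="Minimum permissions for CML Worker management",
    )
    return response["Policy"]["Arn"]
```

Typical use would be `create_cml_policy(boto3.client("iam"))`, keeping the returned ARN for the attach step.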
Step 3: Attach Policy to User
- In the IAM user creation wizard, select Attach policies directly
- Click Create policy (opens in new tab)
- Paste the JSON policy above
- Review and create the policy
- Return to the user creation wizard and attach the newly created policy
- Complete the user creation
Step 4: Generate Access Keys
- After creating the user, open it and navigate to the Security credentials tab
- Click Create access key
- Select Application running outside AWS as the use case
- Optionally add a description tag (e.g., "CML Cloud Manager Production")
- Download the .csv file or copy the secret access key immediately; AWS shows it only once
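Steps 1, 3, and 4 can also be scripted. The sketch below assumes boto3, an administrator session allowed to call `iam:CreateUser`, `iam:AttachUserPolicy`, and `iam:CreateAccessKey`, and the ARN of the policy created in Step 2. The function `provision_service_user` is hypothetical. It returns the new key pair as the two `.env` lines used in the environment configuration section, so the secret is handled once and never logged.

```python
def provision_service_user(iam_client, policy_arn: str,
                           user_name: str = "cml-cloud-manager-service") -> str:
    """Create a programmatic-only IAM user, attach the policy, and create a key.

    Returns .env-formatted lines. The secret access key is only present in
    this response, so store it right away.
    """
    # No login profile is created, so the user has no console access.
    iam_client.create_user(UserName=user_name)
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    return (
        f"AWS_ACCESS_KEY_ID={key['AccessKeyId']}\n"
        f"AWS_SECRET_ACCESS_KEY={key['SecretAccessKey']}\n"
    )
```

Writing the returned string straight into a secrets store or a file with restricted permissions avoids the secret ever appearing in terminal history or logs.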
Required IAM Permissions (Detailed)
EC2 Permissions
| Permission | Purpose | Used In |
|---|---|---|
| `ec2:RunInstances` | Create new EC2 instances for CML Workers | `create_instance()` |
| `ec2:StartInstances` | Start stopped CML Worker instances | `start_instance()` |
| `ec2:StopInstances` | Stop running CML Worker instances | `stop_instance()` |
| `ec2:TerminateInstances` | Permanently terminate CML Worker instances | `terminate_instance()` |
| `ec2:DescribeInstances` | Query instance details, state, and metadata | `get_instance_details()`, `get_all_instances()` |
| `ec2:DescribeInstanceStatus` | Read system and instance status check results | `get_instance_status()` |
| `ec2:DescribeImages` | Query AMI details and search AMIs by name | `get_ami_ids_by_name()` |
| `ec2:DescribeRegions` | List available AWS regions (health check) | `health()` |
EC2 Tag Permissions
| Permission | Purpose | Used In |
|---|---|---|
| `ec2:CreateTags` | Add or update tags on instances | `update_instance_tags()` |
| `ec2:DeleteTags` | Remove tags from instances | `delete_instance_tags()` |
| `ec2:DescribeTags` | Query existing tags on instances | `get_instance_tags()` |
CloudWatch Permissions
| Permission | Purpose | Used In |
|---|---|---|
| `cloudwatch:GetMetricStatistics` | Retrieve CPU and memory utilization metrics | `get_instance_resources_utilization()` |
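The three tables amount to a mapping from application methods to IAM actions. As a rough offline check, the sketch below encodes that mapping (copied from the tables) and reports which required actions no `Allow` statement in a policy document covers. It is a simplification under stated assumptions: it matches exact action names and `*`/`?` wildcards only, and ignores `Deny`, `NotAction`, resources, and conditions, so the IAM policy simulator remains the authoritative test. The function names are hypothetical.

```python
import fnmatch

# Application method -> IAM actions, copied from the permission tables above.
REQUIRED_ACTIONS = {
    "create_instance": ["ec2:RunInstances"],
    "start_instance": ["ec2:StartInstances"],
    "stop_instance": ["ec2:StopInstances"],
    "terminate_instance": ["ec2:TerminateInstances"],
    "get_instance_details": ["ec2:DescribeInstances"],
    "get_all_instances": ["ec2:DescribeInstances"],
    "get_instance_status": ["ec2:DescribeInstanceStatus"],
    "get_ami_ids_by_name": ["ec2:DescribeImages"],
    "health": ["ec2:DescribeRegions"],
    "update_instance_tags": ["ec2:CreateTags"],
    "delete_instance_tags": ["ec2:DeleteTags"],
    "get_instance_tags": ["ec2:DescribeTags"],
    "get_instance_resources_utilization": ["cloudwatch:GetMetricStatistics"],
}


def allowed_patterns(policy: dict) -> list[str]:
    """Collect lower-cased Action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(a.lower() for a in actions)
    return patterns


def missing_actions(policy: dict) -> dict[str, list[str]]:
    """Map each method to the required actions that no Allow pattern matches.

    IAM action names are case-insensitive and may use * and ? wildcards,
    which fnmatchcase handles once both sides are lower-cased.
    """
    patterns = allowed_patterns(policy)
    missing = {}
    for method, actions in REQUIRED_ACTIONS.items():
        gaps = [a for a in actions
                if not any(fnmatch.fnmatchcase(a.lower(), p) for p in patterns)]
        if gaps:
            missing[method] = gaps
    return missing
```

An empty result for `missing_actions(json.load(open("policy.json")))` means every method in the tables has at least one matching `Allow`.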
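Resource Restrictions (Optional Security Hardening)
For production environments, consider restricting permissions to specific regions, tagged instances, or instance types. Each statement below is meant to replace the matching actions in the broad `"Resource": "*"` statements of CMLWorkerManagementPolicy: IAM Allow statements are additive, so adding a restricted statement while leaving the unrestricted one in place grants exactly the same access as before.
Restrict to Specific Regions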
```json
{
  "Sid": "EC2InstanceManagementRegionRestricted",
  "Effect": "Allow",
  "Action": [
    "ec2:RunInstances",
    "ec2:StartInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceStatus",
    "ec2:DescribeImages"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "ec2:Region": [
        "us-east-1",
        "us-west-2"
      ]
    }
  }
}
```
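Restrict to Tagged Instances
Allow start, stop, and terminate only on instances that carry the `ManagedBy=cml-cloud-manager` tag, so the service cannot act on other instances in the account. Every CML Worker must therefore be tagged this way when it is launched: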
```json
{
  "Sid": "EC2InstanceManagementTagRestricted",
  "Effect": "Allow",
  "Action": [
    "ec2:StartInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances"
  ],
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "StringEquals": {
      "ec2:ResourceTag/ManagedBy": "cml-cloud-manager"
    }
  }
}
```
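The tag condition only matches workers that carry the tag from the moment they exist. One way to guarantee that, sketched below under the assumption that workers are launched through a boto3 EC2 client's `run_instances`, is to pass `TagSpecifications` so EC2 applies the tags as part of the launch (this relies on the `ec2:CreateTags` permission already granted). The helper `build_run_instances_kwargs` and its parameters are hypothetical.

```python
from typing import Optional

MANAGED_BY_TAG = {"Key": "ManagedBy", "Value": "cml-cloud-manager"}


def build_run_instances_kwargs(ami_id: str, instance_type: str, name: str,
                               extra_tags: Optional[list] = None) -> dict:
    """Build keyword arguments for an EC2 client's run_instances() call.

    The ManagedBy tag is always applied at launch, so the tag-restricted
    Start/Stop/Terminate statement matches the new instance immediately.
    """
    tags = [{"Key": "Name", "Value": name}, dict(MANAGED_BY_TAG)]
    tags.extend(extra_tags or [])
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [{"ResourceType": "instance", "Tags": tags}],
    }
```

For example, `ec2.run_instances(**build_run_instances_kwargs(ami_id, "c5.2xlarge", "cml-worker-01"))`, where `ec2` is a boto3 EC2 client and `ami_id` is the worker AMI. Tagging at launch avoids a window in which an untagged instance exists and cannot be stopped by the service.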
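Restrict RunInstances Parameters
Limit the instance types that can be launched. This statement applies only to the instance resource: a launch also needs `ec2:RunInstances` on the AMI, subnet, security groups, network interface, volumes, and key pair it uses, and restricting AMIs requires a separate statement on `image` resources.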
```json
{
  "Sid": "EC2RunInstancesRestricted",
  "Effect": "Allow",
  "Action": "ec2:RunInstances",
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "StringLike": {
      "ec2:InstanceType": [
        "c5.*",
        "c6i.*",
        "m5.*"
      ]
    }
  }
}
```
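A minimal sketch of a complete set of `RunInstances` statements, built as Python dicts and intended to replace `ec2:RunInstances` in the broad EC2InstanceManagement statement. It assumes you want to allow only the instance families shown above and only AMIs owned by specific accounts (checked with the `ec2:Owner` condition key). The function name and the example account ID are hypothetical, and the list of supporting resource types will differ if you launch from launch templates or attach other resources.

```python
import json

INSTANCE_FAMILIES = ["c5.*", "c6i.*", "m5.*"]


def run_instances_statements(ami_owners: list[str]) -> list[dict]:
    """Return IAM statements that together permit a restricted RunInstances."""
    return [
        {   # The instance itself: only the approved instance families.
            "Sid": "RunInstancesApprovedTypes",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {"StringLike": {"ec2:InstanceType": INSTANCE_FAMILIES}},
        },
        {   # The AMI: only images owned by the listed accounts or aliases.
            # Image ARNs have no account field, hence the empty segment.
            "Sid": "RunInstancesApprovedImages",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*::image/*",
            "Condition": {"StringEquals": {"ec2:Owner": ami_owners}},
        },
        {   # Other resources the launch references, without extra conditions.
            "Sid": "RunInstancesSupportingResources",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                f"arn:aws:ec2:*:*:{kind}/*"
                for kind in ("subnet", "security-group", "network-interface",
                             "volume", "key-pair")
            ],
        },
    ]


if __name__ == "__main__":
    # Hypothetical AMI owner account; replace with your own.
    policy = {"Version": "2012-10-17",
              "Statement": run_instances_statements(["123456789012"])}
    print(json.dumps(policy, indent=2))
```

Keeping each condition in its own statement matters because a condition applies to every resource listed in its statement; putting `ec2:InstanceType` on the image or subnet statement would make those resources never match.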
Environment Configuration
After generating the AWS access keys, configure them in your environment:
Docker Compose / .env File
```bash
# ============================================================================
# AWS Settings
# ============================================================================
AWS_ACCESS_KEY_ID=AKIA6G....
AWS_SECRET_ACCESS_KEY=4mQiit....
```
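boto3 reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the process environment automatically, so a typo in `.env` usually surfaces only as an `AuthFailure` at runtime. A small pre-flight check, sketched below with the standard library only, can catch obvious mistakes earlier. It assumes a simple `KEY=VALUE` file (no `export` prefix, no multi-line values) and checks only the shape of the values: in practice, long-term IAM user key IDs are 20 characters starting with `AKIA`, and secret access keys are 40 characters. The function names are hypothetical.

```python
import re

# Long-term IAM user access key IDs: "AKIA" followed by 16 uppercase alphanumerics.
KEY_ID_PATTERN = re.compile(r"AKIA[A-Z0-9]{16}")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_aws_credentials(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    key_id = env.get("AWS_ACCESS_KEY_ID", "")
    secret = env.get("AWS_SECRET_ACCESS_KEY", "")
    if not KEY_ID_PATTERN.fullmatch(key_id):
        problems.append("AWS_ACCESS_KEY_ID should be 20 characters starting with AKIA")
    if len(secret) != 40:
        problems.append("AWS_SECRET_ACCESS_KEY should be 40 characters")
    return problems
```

A passing check only means the values are well formed; whether AWS accepts them still has to be tested against the API.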
Kubernetes Secret
```bash
kubectl create secret generic aws-credentials \
  --from-literal=AWS_ACCESS_KEY_ID='AKIA...' \
  --from-literal=AWS_SECRET_ACCESS_KEY='...' \
  --namespace=cml-cloud-manager
```
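Environment Variables (Direct)
Alternatively, export `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in the environment of the process that runs the application; boto3 reads both variables automatically.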
Security Best Practices
1. Use Least Privilege
- Only grant the minimum permissions required
- Restrict by region, instance tags, or instance types when possible
- Regularly audit and remove unused permissions
2. Credential Rotation
- Rotate access keys every 90 days
- Use AWS Secrets Manager or similar for automatic rotation
- Monitor for unused credentials
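A periodic job can flag keys that are due for rotation or appear unused. The sketch below assumes boto3 and permission to call `iam:ListAccessKeys` and `iam:GetAccessKeyLastUsed` on the service user; the 90-day threshold mirrors the guideline above, and the function name is hypothetical. Because an IAM user can hold at most two access keys, rotation can be done without downtime: create a second key, deploy it, deactivate the old key, and delete it once nothing fails.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_KEY_AGE = timedelta(days=90)  # mirrors the 90-day rotation guideline


def keys_needing_attention(iam_client, user_name: str,
                           now: Optional[datetime] = None) -> list[dict]:
    """Report active access keys that are older than MAX_KEY_AGE or never used."""
    now = now or datetime.now(timezone.utc)
    report = []
    metadata = iam_client.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    for key in metadata:
        if key["Status"] != "Active":
            continue
        age = now - key["CreateDate"]  # boto3 returns timezone-aware datetimes
        last_used = iam_client.get_access_key_last_used(
            AccessKeyId=key["AccessKeyId"]
        )["AccessKeyLastUsed"].get("LastUsedDate")  # absent if never used
        reasons = []
        if age > MAX_KEY_AGE:
            reasons.append(f"older than {MAX_KEY_AGE.days} days")
        if last_used is None:
            reasons.append("never used")
        if reasons:
            report.append({"AccessKeyId": key["AccessKeyId"], "reasons": reasons})
    return report
```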
3. Enable CloudTrail
Enable AWS CloudTrail to audit all API calls made by the service user.
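Even before a trail is created, CloudTrail event history keeps the last 90 days of management events for each region, which is enough for a quick audit. The sketch below assumes boto3, permission for `cloudtrail:LookupEvents`, and the example user name from Step 1; it counts the service user's API calls by event name in the client's region. A trail delivering to S3 or CloudWatch Logs is still needed for longer retention and for alarms. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def api_calls_by_user(cloudtrail_client, user_name: str, days: int = 7) -> Counter:
    """Count management events recorded for `user_name` over the last `days`."""
    end = datetime.now(timezone.utc)
    kwargs = {
        "LookupAttributes": [{"AttributeKey": "Username", "AttributeValue": user_name}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
    }
    counts = Counter()
    while True:
        page = cloudtrail_client.lookup_events(**kwargs)
        counts.update(event["EventName"] for event in page.get("Events", []))
        token = page.get("NextToken")
        if not token:
            return counts
        kwargs["NextToken"] = token  # continue with the next page of results
```

For example, `api_calls_by_user(boto3.client("cloudtrail", region_name="us-east-1"), "cml-cloud-manager-service").most_common(10)` lists the ten most frequent calls.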
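4. Use IAM Roles (When Possible)
If the application runs on AWS infrastructure (EC2, ECS, EKS), use an IAM role instead of long-lived access keys. The trust policy below lets EC2 instances assume the role; ECS task roles trust `ecs-tasks.amazonaws.com` instead, and EKS uses IAM Roles for Service Accounts or EKS Pod Identity, each with its own trust policy.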
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
Then attach the CMLWorkerManagementPolicy to the role.
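On EC2, a role reaches the instance through an instance profile. A minimal sketch, assuming boto3, IAM permissions to create roles and instance profiles, and the policy ARN from Step 2; the role and profile name `cml-cloud-manager-role` is a hypothetical example. Once the role is in place, boto3 obtains temporary credentials from the instance metadata service automatically, but environment variables take precedence in its credential chain, so `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` should be left unset.

```python
import json

EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def create_ec2_role(iam_client, policy_arn: str,
                    name: str = "cml-cloud-manager-role") -> str:
    """Create the role, attach the policy, and wrap the role in an instance profile.

    Returns the instance profile name to associate with the host instance.
    """
    iam_client.create_role(
        RoleName=name,
        AssumeRolePolicyDocument=json.dumps(EC2_TRUST_POLICY),
    )
    iam_client.attach_role_policy(RoleName=name, PolicyArn=policy_arn)
    iam_client.create_instance_profile(InstanceProfileName=name)
    iam_client.add_role_to_instance_profile(InstanceProfileName=name, RoleName=name)
    return name
```

The profile can then be attached to the instance running CML Cloud Manager, for example with `ec2.associate_iam_instance_profile(IamInstanceProfile={"Name": profile}, InstanceId=host_id)`.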
5. Monitor Credential Usage
Set up CloudWatch alarms for unusual API activity:
- High number of `RunInstances` calls
- `TerminateInstances` calls outside business hours
- Failed authentication attempts
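CloudWatch does not alarm on API calls directly; a common pattern is to send a CloudTrail trail to CloudWatch Logs, turn matching events into a metric with a metric filter, and alarm on that metric. The sketch below assumes such a trail already writes to a log group and covers the first signal in the list above; the other two follow the same shape with a different filter pattern. The log group, metric and alarm names, namespace, the threshold of 20 calls per 5 minutes, and the SNS topic ARN are all hypothetical placeholders.

```python
def create_runinstances_alarm(logs_client, cloudwatch_client, log_group: str,
                              user_name: str, sns_topic_arn: str,
                              threshold: int = 20) -> None:
    """Alarm when the service user calls RunInstances unusually often."""
    metric_name = "CmlRunInstancesCalls"
    namespace = "CMLCloudManager/Security"
    # 1. Emit a value of 1 for every RunInstances event from the service user.
    logs_client.put_metric_filter(
        logGroupName=log_group,
        filterName="cml-runinstances-calls",
        filterPattern=(
            '{ ($.eventName = "RunInstances") && '
            f'($.userIdentity.userName = "{user_name}") }}'
        ),
        metricTransformations=[{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "1",
        }],
    )
    # 2. Alarm when the 5-minute sum exceeds the threshold.
    cloudwatch_client.put_metric_alarm(
        AlarmName="cml-high-runinstances-rate",
        Namespace=namespace,
        MetricName=metric_name,
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=float(threshold),
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",  # no events means no activity, not an alarm
        AlarmActions=[sns_topic_arn],
    )
```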
Testing Credentials
After configuring credentials, test them using the health check endpoint:
Check the application logs for AWS connectivity:
Expected output:
If credentials are invalid:
```text
WARNING ⚠️ AWS EC2 health check failed: Error while verifying access to EC2: An error occurred (AuthFailure)
WARNING ⚠️ AWS operations may fail at runtime
```
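Outside the application, the same check can be approximated with two calls: STS `GetCallerIdentity` confirms the key pair is valid (it requires no IAM permissions), and EC2 `DescribeRegions` mirrors the `health()` check listed in the permissions table. The sketch assumes boto3 clients built from the same environment the application uses; the function name and its return format are hypothetical. It reads the error code from the exception's `response` attribute, which is how botocore's `ClientError` exposes it, to avoid importing botocore directly.

```python
def check_aws_access(sts_client, ec2_client) -> dict:
    """Report which identity the keys map to and whether EC2 access works."""
    result = {"identity": None, "ec2": False, "error": None}
    try:
        # Succeeds for any valid key pair, regardless of attached policies.
        result["identity"] = sts_client.get_caller_identity()["Arn"]
        # Requires ec2:DescribeRegions, as granted by CMLWorkerManagementPolicy.
        ec2_client.describe_regions()
        result["ec2"] = True
    except Exception as exc:  # botocore raises ClientError for API errors
        error = getattr(exc, "response", {}).get("Error", {})
        result["error"] = error.get("Code", type(exc).__name__)
    return result
```

For example, `session = boto3.Session()` followed by `check_aws_access(session.client("sts"), session.client("ec2", region_name="us-east-1"))`. A valid identity with `"ec2": False` points at permissions rather than at the keys themselves.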
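Troubleshooting
Issue: "AuthFailure" Error
Cause: Invalid or expired credentials
Solution:
- Verify the credentials are set correctly in `.env`
- Check that the IAM user still exists (has not been deleted)
- Verify the access keys are still active and have not been rotated or deactivated
- Test the credentials with the AWS CLI, for example `aws sts get-caller-identity`, which succeeds for any valid key pair regardless of attached policies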
Issue: "UnauthorizedOperation" Error
Cause: Missing IAM permissions
Solution:
- Review the IAM policy attached to the user
- Ensure all required permissions from the policy above are included
- Check for explicit Deny statements in other policies
- Verify resource restrictions aren't blocking operations
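To see which action is being refused, the IAM policy simulator evaluates every policy attached to the user, including explicit denies from other policies. The sketch below assumes boto3 and a caller allowed to run `iam:SimulatePrincipalPolicy`; the function name is hypothetical. It does not supply resource ARNs or condition context, so statements restricted by tag or region may need the simulator's `ResourceArns` and `ContextEntries` parameters to be evaluated accurately.

```python
CML_ACTIONS = [
    "ec2:RunInstances", "ec2:StartInstances", "ec2:StopInstances",
    "ec2:TerminateInstances", "ec2:DescribeInstances",
    "ec2:DescribeInstanceStatus", "ec2:DescribeImages", "ec2:DescribeRegions",
    "ec2:CreateTags", "ec2:DeleteTags", "ec2:DescribeTags",
    "cloudwatch:GetMetricStatistics",
]


def denied_actions(iam_client, principal_arn: str) -> dict[str, str]:
    """Return {action: decision} for every CML action the principal cannot perform."""
    denied = {}
    kwargs = {"PolicySourceArn": principal_arn, "ActionNames": CML_ACTIONS}
    while True:
        page = iam_client.simulate_principal_policy(**kwargs)
        for result in page["EvaluationResults"]:
            # EvalDecision is "allowed", "explicitDeny", or "implicitDeny".
            if result["EvalDecision"] != "allowed":
                denied[result["EvalActionName"]] = result["EvalDecision"]
        if not page.get("IsTruncated"):
            return denied
        kwargs["Marker"] = page["Marker"]  # fetch the next page of results
```

An `explicitDeny` result points at a Deny statement in some attached policy or permissions boundary; `implicitDeny` means no policy grants the action.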
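Issue: CloudWatch Metrics Return "Unknown"
Cause: Missing CloudWatch permissions or metrics not enabled
Solution:
- Verify the `cloudwatch:GetMetricStatistics` permission is granted
- Enable detailed monitoring on EC2 instances; basic monitoring publishes EC2 metrics only every 5 minutes, so short query windows can return no datapoints
- Install the CloudWatch agent on instances for memory metrics (EC2 does not publish memory utilization by default)
- Check CloudWatch service availability in the region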
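Querying CloudWatch directly shows whether any datapoints exist for an instance. The sketch below assumes boto3 and requests the EC2 `CPUUtilization` metric in the `AWS/EC2` namespace over the last 30 minutes with a 300-second period, which works with basic monitoring. Returning `None` when no datapoints come back is an assumption about how the application ends up reporting "Unknown". Memory metrics come from the CloudWatch agent (by default under the `CWAgent` namespace) and need their own query. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def latest_cpu_percent(cloudwatch_client, instance_id: str,
                       minutes: int = 30) -> Optional[float]:
    """Return the most recent 5-minute average CPU %, or None if there is no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None  # e.g. instance stopped, just launched, or wrong region
    # Datapoints are not returned in time order, so pick the newest explicitly.
    newest = max(datapoints, key=lambda dp: dp["Timestamp"])
    return round(newest["Average"], 2)
```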
Additional Resources
Policy Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-11-16 | Initial policy with EC2 and CloudWatch permissions |
Contact
For questions about AWS IAM setup or permissions issues, contact the infrastructure team or refer to the application logs for detailed error messages.