AWS CLI Cheatsheet
Audience: DevOps engineers, beginner to senior. Every section moves from fundamentals to production-grade usage. Callouts highlight gotchas, senior tips, cost warnings, and real-world patterns you’ll actually use on-call.
1. Configuration & Authentication
aws configure — initial setup
# Interactive setup — writes to ~/.aws/credentials and ~/.aws/config
aws configure
# Configure a named profile
aws configure --profile staging
# Configure specific keys
aws configure set aws_access_key_id AKIA... --profile prod
aws configure set aws_secret_access_key wJalr... --profile prod
aws configure set region ap-southeast-1 --profile prod
aws configure set output json --profile prod
# List all configured profiles
aws configure list-profiles
# Show current effective configuration
aws configure list
aws configure list --profile prod
Senior tip: Never use the `default` profile in scripts. Always name profiles explicitly (`--profile prod`, `--profile staging`). This prevents accidental production operations when your shell environment is wrong.
Credential chain — how AWS resolves credentials (in order)
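| Priority | Source | Notes |
|---|---|---|
| 1 | Command-line `--profile` | An explicit `--profile` beats the credential env vars below |
| 2 | AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN env vars | Good for CI/CD |
| 3 | Assume role (`role_arn` + `source_profile` in the profile) | Cross-account access |
| 4 | Web identity token | EKS IRSA, GitHub OIDC |
| 5 | SSO (IAM Identity Center) | Token cached by `aws sso login` |
| 6 | ~/.aws/credentials file | [default] or [profile-name] |
| 7 | `credential_process` | External credential helper |
| 8 | ~/.aws/config file | Can also store credentials |
| 9 | Container credential provider | ECS task role via metadata endpoint |
| 10 | Instance profile (EC2 IMDS) | IAM role attached to EC2 |
Sources 3–8 are read from the profile chosen by `--profile` or `AWS_PROFILE` (falling back to `[default]`). Web identity can also come from the `AWS_ROLE_ARN` + `AWS_WEB_IDENTITY_TOKEN_FILE` env vars, which is how EKS IRSA works.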
Gotcha: If `AWS_ACCESS_KEY_ID` is set in your shell, it takes precedence over `AWS_PROFILE`, SSO sessions, and instance or container roles. Only an explicit `--profile` on the command line beats it. Always `unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN` before using role-based auth.
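The cleanup can be wrapped in a small function that also tells you which keys were actually set. This is a minimal sketch assuming a POSIX shell; the name `clear_static_aws_creds` is hypothetical. Source it into your shell, because a script run as a child process cannot unset variables in its parent.

```shell
# Hypothetical helper (name is illustrative): remove static keys from the current
# shell so profile-, SSO-, or role-based credentials are used instead.
clear_static_aws_creds() {
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # POSIX-compatible indirection: read the variable whose name is in $var
    eval "val=\${$var:-}"
    if [ -n "$val" ]; then
      echo "unsetting $var" >&2
    fi
    unset "$var"
  done
  unset val
}

# Usage: clear the keys, then confirm which identity the CLI now resolves
#   clear_static_aws_creds && aws sts get-caller-identity --profile prod
```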
Named profiles in config
# ~/.aws/config
[profile dev]
region = us-east-1
output = json
[profile prod]
region = ap-southeast-1
output = json
mfa_serial = arn:aws:iam::123456789012:mfa/john.doe
role_arn = arn:aws:iam::987654321098:role/AdminRole
source_profile = dev
# Use named profile per-command
aws s3 ls --profile prod
# Set for entire shell session
export AWS_PROFILE=staging
aws s3 ls # now uses staging
# Per-command region override
aws ec2 describe-instances --region us-west-2
MFA + STS assume-role
# Get temporary credentials with MFA
aws sts get-session-token \
--serial-number arn:aws:iam::123456789012:mfa/john.doe \
--token-code 123456 \
--duration-seconds 43200
# Assume a role (cross-account)
aws sts assume-role \
--role-arn arn:aws:iam::987654321098:role/DeployRole \
--role-session-name deploy-$(date +%s) \
--duration-seconds 3600
# Export credentials from assume-role output (quote the substitution so eval gets one export per line)
eval "$(aws sts assume-role \
--role-arn arn:aws:iam::987654321098:role/DeployRole \
--role-session-name ci-deploy \
--query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
--output text | awk '{print "export AWS_ACCESS_KEY_ID="$1"\nexport AWS_SECRET_ACCESS_KEY="$2"\nexport AWS_SESSION_TOKEN="$3}')"
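If you would rather avoid `eval`, the same tab-separated line can be split with `read`. This is a sketch assuming a POSIX shell. The function name `export_assumed_creds` is hypothetical. It fails when fewer than three fields arrive, such as when assume-role returned an error instead of credentials.

```shell
# Hypothetical helper (name is illustrative): export the three tab-separated values
# printed by assume-role with
#   --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text
# without eval. Must run in the current shell (source it, don't pipe into it).
export_assumed_creds() {
  # read splits the line on whitespace (tabs included); -r keeps backslashes literal
  read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<EOF
$1
EOF
  if [ -z "$AWS_SESSION_TOKEN" ]; then
    echo "expected AccessKeyId, SecretAccessKey and SessionToken" >&2
    return 1
  fi
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}

# Usage:
#   export_assumed_creds "$(aws sts assume-role \
#     --role-arn arn:aws:iam::987654321098:role/DeployRole \
#     --role-session-name ci-deploy \
#     --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
#     --output text)"
```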
Senior tip: For CI/CD pipelines, use OIDC federation instead of static keys. GitHub Actions → OIDC → IAM role → no secrets to rotate. Much safer.
SSO login (AWS IAM Identity Center)
# Configure SSO profile
aws configure sso
# Follow prompts: SSO start URL, region, account, role
# Login (opens browser)
aws sso login --profile my-sso-profile
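# List accounts and roles (the cache also holds client-registration files with no accessToken)
aws sso list-accounts --access-token "$(jq -r 'select(.accessToken != null) | .accessToken' ~/.aws/sso/cache/*.json | head -n 1)"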
# Logout
aws sso logout
Verify who you are
# Always check before destructive operations
aws sts get-caller-identity
# Returns: Account, UserId, Arn
aws sts get-caller-identity --query 'Account' --output text
aws sts get-caller-identity --profile prod --query 'Arn' --output text
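In scripts, the identity check can be turned into a hard guard that stops before anything destructive runs. This is a sketch assuming a POSIX shell. The name `assert_account` is hypothetical, and the optional second argument exists only so the comparison can be tested without calling AWS.

```shell
# Hypothetical guard (name is illustrative): stop unless the active credentials
# belong to the expected account. The optional second argument only exists so the
# check can be exercised offline; normally sts get-caller-identity fills it in.
assert_account() {
  expected=$1
  actual=${2:-$(aws sts get-caller-identity --query 'Account' --output text)}
  if [ "$actual" != "$expected" ]; then
    echo "refusing to continue: expected account $expected, got ${actual:-nothing}" >&2
    return 1
  fi
}

# Usage in a script, before anything destructive:
#   assert_account 123456789012 || exit 1
#   aws ec2 terminate-instances --instance-ids i-0abc123
```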
Output formats
# json (default) — machine-readable
aws ec2 describe-instances --output json
# table — human-readable, great for terminals
aws ec2 describe-instances --output table
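# text: tab-delimited, great for grep/awk pipelines (high-level aws s3 commands such as ls ignore --output)
aws ec2 describe-instances --query 'Reservations[*].Instances[*].InstanceId' --output text
# yaml (AWS CLI v2 only): readable, same syntax as CloudFormation templates
aws cloudformation describe-stacks --output yaml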
2. S3
Basic operations
# List buckets
aws s3 ls
# List bucket contents (with sizes and dates)
aws s3 ls s3://my-bucket/
aws s3 ls s3://my-bucket/prefix/ --recursive --human-readable --summarize
# Create / remove bucket
aws s3 mb s3://my-new-bucket --region ap-southeast-1
aws s3 rb s3://my-empty-bucket
aws s3 rb s3://my-bucket --force # remove with contents (fails on versioned buckets that still hold old versions)
# Copy
aws s3 cp file.txt s3://my-bucket/path/file.txt
aws s3 cp s3://my-bucket/file.txt ./local/file.txt
aws s3 cp s3://source-bucket/file.txt s3://dest-bucket/file.txt # server-side copy
# Move (copy + delete source)
aws s3 mv local.txt s3://my-bucket/remote.txt
aws s3 mv s3://my-bucket/old.txt s3://my-bucket/new.txt
# Delete
aws s3 rm s3://my-bucket/file.txt
aws s3 rm s3://my-bucket/prefix/ --recursive
Sync — the production workhorse
# Sync local → S3 (only uploads changed/new files)
aws s3 sync ./dist s3://my-website-bucket --delete
# Sync S3 → local
aws s3 sync s3://my-bucket/backups ./local-backups
# Exclude/include filters (order matters: filters later on the command line take precedence)
aws s3 sync ./app s3://my-bucket \
--exclude "*" \
--include "*.html" \
--include "*.css" \
--include "*.js"
# Sync with storage class and server-side encryption
aws s3 sync ./logs s3://my-archive-bucket/logs \
--storage-class GLACIER \
--sse AES256
# Dry-run equivalent — use --dryrun
aws s3 sync ./dist s3://my-bucket --dryrun
Gotcha: `--delete` removes files in the destination that don’t exist in the source. Without it, sync is additive only. Always run with `--dryrun` first when syncing to production.
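The dry run can also gate the real sync automatically. The sketch below counts the deletions a `--delete --dryrun` run reports and refuses to proceed above a threshold. It assumes the CLI prints dry-run deletions as `(dryrun) delete: s3://bucket/key`, and the function name `sync_delete_guard` is hypothetical.

```shell
# Hypothetical guard (name is illustrative): read `aws s3 sync --delete --dryrun`
# output on stdin and fail if it would delete more than MAX objects.
# Assumes dry-run deletions are printed as "(dryrun) delete: s3://bucket/key".
sync_delete_guard() {
  max=$1
  # grep -c prints 0 (and exits 1) when nothing matches; || true keeps going
  n=$(grep -c '^(dryrun) delete: ' || true)
  echo "dry run would delete $n object(s)" >&2
  [ "$n" -le "$max" ]
}

# Usage: only run the real sync when the dry run looks sane
#   aws s3 sync ./dist s3://my-website-bucket --delete --dryrun | sync_delete_guard 20 &&
#     aws s3 sync ./dist s3://my-website-bucket --delete
```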
Storage classes
| Class | Use case | Min duration | Retrieval |
|---|---|---|---|
| STANDARD | Frequently accessed | None | Immediate |
| STANDARD_IA | Infrequent access | 30 days | Immediate |
| ONEZONE_IA | Infrequent, single AZ | 30 days | Immediate |
| GLACIER_IR | Archive, rare retrieval | 90 days | Immediate |
| GLACIER | Long-term archive | 90 days | 3–5 hrs |
| DEEP_ARCHIVE | Compliance archive | 180 days | 12–48 hrs |
| INTELLIGENT_TIERING | Unknown access pattern | None | Immediate |
# Upload with specific storage class
aws s3 cp large-backup.tar.gz s3://my-bucket/ --storage-class GLACIER_IR
Cost warning: `STANDARD_IA` and `GLACIER_IR` charge a per-GB retrieval fee. Don’t use IA tiers for files you access more than about once a month; the retrieval fees make them more expensive than STANDARD.
Presigned URLs
# Generate a presigned GET URL (default 3600 seconds)
aws s3 presign s3://my-bucket/private-file.pdf
# Custom expiry in seconds (max 604800 = 7 days; a URL signed with temporary credentials stops working when they expire)
aws s3 presign s3://my-bucket/private-file.pdf --expires-in 86400
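# Presigned PUT URL (for direct client uploads): aws s3 presign only signs GET requests,
# so generate PUT URLs with an SDK instead, e.g. boto3:
python3 -c 'import boto3; print(boto3.client("s3").generate_presigned_url("put_object", Params={"Bucket": "my-bucket", "Key": "uploads/photo.jpg"}, ExpiresIn=3600))'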
S3 API — advanced operations
# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# Get specific version of a file
aws s3api get-object \
--bucket my-bucket \
--key file.txt \
--version-id abc123 \
output.txt
# Set bucket policy
aws s3api put-bucket-policy \
--bucket my-bucket \
--policy file://bucket-policy.json
# Enable server-side encryption (new buckets default to SSE-S3 since January 2023; use this to set or change the default)
aws s3api put-bucket-encryption \
--bucket my-bucket \
--server-side-encryption-configuration '{
"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}'
# Set lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket \
--lifecycle-configuration file://lifecycle.json
# Enable transfer acceleration
aws s3api put-bucket-accelerate-configuration \
--bucket my-bucket \
--accelerate-configuration Status=Enabled
# CORS configuration
aws s3api put-bucket-cors \
--bucket my-bucket \
--cors-configuration file://cors.json
# Multipart upload — large file (manual steps)
aws s3api create-multipart-upload --bucket my-bucket --key large-file.iso
# Then upload parts, then complete — s3 cp handles this automatically for >8MB
Senior tip: `aws s3 cp` and `aws s3 sync` switch to multipart automatically once a file exceeds `multipart_threshold` (8 MB by default), so you rarely need to manage multipart uploads manually. Use `aws configure set default.s3.multipart_threshold 64MB` to tune the threshold.
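The part count follows from simple ceiling division of file size by chunk size. The sketch below assumes the CLI default `multipart_chunksize` of 8 MB, which the CLI counts as 1024 × 1024 bytes. S3 caps an upload at 10,000 parts. When a file would exceed that limit, the CLI's transfer layer raises the chunk size on its own, and setting `default.s3.multipart_chunksize` yourself makes the choice explicit. The function name is hypothetical.

```shell
# Hypothetical helper (name is illustrative): estimate the number of parts for a
# multipart upload, assuming the CLI default multipart_chunksize of 8 MiB.
multipart_parts() {
  size=$1                    # file size in bytes
  chunk=${2:-8388608}        # chunk size in bytes (default 8 MiB)
  parts=$(( (size + chunk - 1) / chunk ))   # ceiling division
  echo "$parts"
  if [ "$parts" -gt 10000 ]; then
    echo "more than S3's 10,000-part limit at this chunk size" >&2
  fi
}

# Usage: multipart_parts "$(wc -c < large-backup.tar.gz)"
```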
3. EC2
Describe instances — the foundation
# All instances (verbose)
aws ec2 describe-instances
# Filter by state
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running"
# Filter by tag
aws ec2 describe-instances \
--filters "Name=tag:Environment,Values=prod" \
"Name=tag:Role,Values=web"
# Extract useful fields with --query (JMESPath)
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" \
--query 'Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,Tags[?Key==`Name`].Value|[0]]' \
--output table
# Get instance IDs only
aws ec2 describe-instances \
--filters "Name=tag:Environment,Values=prod" \
--query 'Reservations[*].Instances[*].InstanceId' \
--output text
Launch and manage instances
# Launch an instance
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type t3.medium \
--key-name my-keypair \
--security-group-ids sg-0abc123 \
--subnet-id subnet-0def456 \
--iam-instance-profile Name=MyInstanceProfile \
--user-data file://user-data.sh \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-01},{Key=Environment,Value=prod}]' \
--count 1
# Start / stop / terminate
aws ec2 start-instances --instance-ids i-0abc123 i-0def456
aws ec2 stop-instances --instance-ids i-0abc123
aws ec2 terminate-instances --instance-ids i-0abc123
# Reboot
aws ec2 reboot-instances --instance-ids i-0abc123
# Create AMI from running instance (--no-reboot avoids downtime but doesn't guarantee a consistent filesystem)
aws ec2 create-image \
--instance-id i-0abc123 \
--name "web-01-$(date +%Y%m%d-%H%M)" \
--description "Production snapshot before deployment" \
--no-reboot
# Wait for instance to be running
aws ec2 wait instance-running --instance-ids i-0abc123
echo "Instance is running"
Security groups
# Create a security group
aws ec2 create-security-group \
--group-name web-sg \
--description "Web server security group" \
--vpc-id vpc-0abc123
# Allow inbound HTTP/HTTPS
aws ec2 authorize-security-group-ingress \
--group-id sg-0abc123 \
--protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress \
--group-id sg-0abc123 \
--protocol tcp --port 443 --cidr 0.0.0.0/0
# Allow SSH from specific IP only
aws ec2 authorize-security-group-ingress \
--group-id sg-0abc123 \
--protocol tcp --port 22 \
--cidr $(curl -s ifconfig.me)/32
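If the IP-lookup service returns an IPv6 address, an error page, or nothing, `$(curl ...)/32` produces a broken or dangerous CIDR. The sketch below validates the reply as an IPv4 address before building the `/32` rule. It assumes a POSIX shell with `grep -E`, and the function name `ipv4_cidr32` is hypothetical.

```shell
# Hypothetical helper (name is illustrative): turn a public IPv4 address into a /32
# CIDR, refusing anything else (an IPv6 address, an HTML error page, an empty reply).
ipv4_cidr32() {
  ip=$1
  # four dot-separated groups of 1-3 digits
  printf '%s\n' "$ip" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
  # split on dots and check each octet is <= 255
  old_ifs=$IFS; IFS=.
  set -- $ip
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  echo "$ip/32"
}

# Usage:
#   cidr=$(ipv4_cidr32 "$(curl -s ifconfig.me)") || exit 1
#   aws ec2 authorize-security-group-ingress --group-id sg-0abc123 \
#     --protocol tcp --port 22 --cidr "$cidr"
```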
# Revoke a rule
aws ec2 revoke-security-group-ingress \
--group-id sg-0abc123 \
--protocol tcp --port 22 --cidr 0.0.0.0/0
# Describe security groups with filter
aws ec2 describe-security-groups \
--filters "Name=group-name,Values=web-sg"
Key pairs, Elastic IPs, VPC
# Create key pair (save private key immediately — AWS won't show it again)
aws ec2 create-key-pair \
--key-name my-keypair \
--query 'KeyMaterial' \
--output text > ~/.ssh/my-keypair.pem
chmod 600 ~/.ssh/my-keypair.pem
# Allocate and associate Elastic IP
aws ec2 allocate-address --domain vpc
aws ec2 associate-address \
--instance-id i-0abc123 \
--allocation-id eipalloc-0abc123
# Describe VPCs and subnets
aws ec2 describe-vpcs
aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=vpc-0abc123" \
--query 'Subnets[*].[SubnetId,CidrBlock,AvailabilityZone,Tags[?Key==`Name`].Value|[0]]' \
--output table
EBS volumes
# Create and attach volume
aws ec2 create-volume \
--size 100 \
--volume-type gp3 \
--availability-zone ap-southeast-1a \
--encrypted \
--throughput 250
aws ec2 attach-volume \
--volume-id vol-0abc123 \
--instance-id i-0abc123 \
--device /dev/sdf
# Detach volume
aws ec2 detach-volume --volume-id vol-0abc123
# Create snapshot
aws ec2 create-snapshot \
--volume-id vol-0abc123 \
--description "Pre-migration backup $(date +%Y%m%d)"
# Describe snapshots owned by me
aws ec2 describe-snapshots --owner-ids self \
--query 'Snapshots[*].[SnapshotId,VolumeSize,StartTime,Description]' \
--output table
IMDSv2 — instance metadata (secure)
# From inside an EC2 instance: IMDSv2 (token-based; works everywhere and is required on many newer instances)
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/instance-id
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/iam/security-credentials/
Gotcha: Many newer AMIs (Amazon Linux 2023, for example) and account-level IMDS defaults require IMDSv2, so token-less IMDSv1 requests get HTTP 401. Don’t write scripts that use the old unauthenticated curl pattern; with `curl -s` they fail silently on those instances.
4. IAM
Users, groups, roles
# Create user, group, attach to group
aws iam create-user --user-name jane.doe
aws iam create-group --group-name developers
aws iam add-user-to-group --user-name jane.doe --group-name developers
# Create access key (store output securely — shown once)
aws iam create-access-key --user-name jane.doe
# List users with last login
aws iam list-users \
--query 'Users[*].[UserName,CreateDate,PasswordLastUsed]' \
--output table
# Create a role (trust policy required)
aws iam create-role \
--role-name LambdaExecRole \
--assume-role-policy-document file://trust-policy.json
# Attach managed policy to role
aws iam attach-role-policy \
--role-name LambdaExecRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
# Attach inline policy
aws iam put-role-policy \
--role-name LambdaExecRole \
--policy-name S3ReadPolicy \
--policy-document file://s3-read-policy.json
Policy inspection
# List all customer-managed policies
aws iam list-policies --scope Local \
--query 'Policies[*].[PolicyName,Arn,DefaultVersionId]' \
--output table
# Get policy document (check what version is default first)
aws iam get-policy --policy-arn arn:aws:iam::123456789012:policy/MyPolicy
aws iam get-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/MyPolicy \
--version-id v3
# List policies attached to a role
aws iam list-attached-role-policies --role-name MyRole
aws iam list-role-policies --role-name MyRole # inline policies
# Simulate permissions: critical for debugging access denied (quote ARNs containing * so the shell can't glob them)
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:role/MyRole \
--action-names s3:GetObject s3:PutObject \
--resource-arns 'arn:aws:s3:::my-bucket/*'
Senior tip: `simulate-principal-policy` is the fastest way to diagnose “Access Denied” errors without invoking the real actions. Use it before spending 20 minutes reading JSON policies.
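Simulation results are easier to act on when reduced to the failing actions. This sketch filters the text output of a `--query 'EvaluationResults[*].[EvalActionName,EvalDecision]'` projection. Every decision other than `allowed` (that is, `explicitDeny` or `implicitDeny`) is printed, and a non-zero exit makes the filter usable as a CI check. The function name `denied_actions` is hypothetical.

```shell
# Hypothetical filter (name is illustrative) over the text output of
#   aws iam simulate-principal-policy ... \
#     --query 'EvaluationResults[*].[EvalActionName,EvalDecision]' --output text
# Prints every action whose decision is not "allowed" and exits non-zero if any exist.
denied_actions() {
  awk -F'\t' 'NF == 0 { next }
              $2 != "allowed" { print $1 " -> " $2; bad = 1 }
              END { exit bad }'
}

# Usage:
#   aws iam simulate-principal-policy \
#     --policy-source-arn arn:aws:iam::123456789012:role/MyRole \
#     --action-names s3:GetObject s3:PutObject \
#     --resource-arns 'arn:aws:s3:::my-bucket/*' \
#     --query 'EvaluationResults[*].[EvalActionName,EvalDecision]' --output text | denied_actions
```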
MFA management
# List MFA devices for a user
aws iam list-mfa-devices --user-name jane.doe
# Enable virtual MFA device
aws iam create-virtual-mfa-device \
--virtual-mfa-device-name jane.doe-mfa \
--outfile /tmp/qr.png \
--bootstrap-method QRCodePNG
aws iam enable-mfa-device \
--user-name jane.doe \
--serial-number arn:aws:iam::123456789012:mfa/jane.doe-mfa \
--authentication-code1 123456 \
--authentication-code2 654321
5. ECS & ECR
ECR — container registry
# Login (Docker must be running)
aws ecr get-login-password --region ap-southeast-1 | \
docker login --username AWS --password-stdin \
123456789012.dkr.ecr.ap-southeast-1.amazonaws.com
# Create repository
aws ecr create-repository \
--repository-name my-app \
--image-scanning-configuration scanOnPush=true \
--encryption-configuration encryptionType=AES256
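# List images with tags, oldest first (sort in JMESPath; piping a table into sort scrambles its borders)
aws ecr describe-images \
--repository-name my-app \
--query 'sort_by(imageDetails,&imagePushedAt)[*].[imageTags[0],imageSizeInBytes,imagePushedAt]' \
--output table
# Delete untagged images (pass the JSON list in one call; batch-delete-image accepts up to 100 IDs)
UNTAGGED=$(aws ecr list-images \
--repository-name my-app \
--filter tagStatus=UNTAGGED \
--query 'imageIds[*]' \
--output json)
[ "$UNTAGGED" != "[]" ] && aws ecr batch-delete-image \
--repository-name my-app \
--image-ids "$UNTAGGED"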
# Set lifecycle policy (keep last 10 images)
aws ecr put-lifecycle-policy \
--repository-name my-app \
--lifecycle-policy-text '{"rules":[{"rulePriority":1,"description":"Keep last 10","selection":{"tagStatus":"any","countType":"imageCountMoreThan","countNumber":10},"action":{"type":"expire"}}]}'
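A long one-line JSON policy is easy to mistype when you change the retention count. This sketch generates the same "keep last N" policy from a parameter. It assumes a POSIX shell, and the function name `ecr_keep_last_policy` is hypothetical. Counts with a leading zero are rejected because they would produce invalid JSON numbers.

```shell
# Hypothetical helper (name is illustrative): generate the "keep the newest N
# images" lifecycle policy so N is a parameter instead of buried in a JSON literal.
ecr_keep_last_policy() {
  n=$1
  case $n in
    ''|0*|*[!0-9]*) echo "count must be a positive integer without leading zeros" >&2; return 1 ;;
  esac
  printf '{"rules":[{"rulePriority":1,"description":"Keep last %s","selection":{"tagStatus":"any","countType":"imageCountMoreThan","countNumber":%s},"action":{"type":"expire"}}]}\n' "$n" "$n"
}

# Usage:
#   aws ecr put-lifecycle-policy --repository-name my-app \
#     --lifecycle-policy-text "$(ecr_keep_last_policy 10)"
```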
ECS — container orchestration
# Create cluster
aws ecs create-cluster \
--cluster-name production \
--capacity-providers FARGATE FARGATE_SPOT \
--default-capacity-provider-strategy \
capacityProvider=FARGATE,weight=1,base=1
# Register task definition
aws ecs register-task-definition \
--cli-input-json file://task-definition.json
# Create service
aws ecs create-service \
--cluster production \
--service-name web-api \
--task-definition web-api:5 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-abc,subnet-def],securityGroups=[sg-xyz],assignPublicIp=DISABLED}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=web,containerPort=3000"
# Deploy new task definition version (force new deployment)
aws ecs update-service \
--cluster production \
--service web-api \
--task-definition web-api:6 \
--force-new-deployment
# Wait for service to stabilize
aws ecs wait services-stable \
--cluster production \
--services web-api
# Describe running tasks
aws ecs describe-tasks \
--cluster production \
--tasks $(aws ecs list-tasks --cluster production --service-name web-api \
--query 'taskArns[0]' --output text)
# ECS Exec — drop into a running container
aws ecs execute-command \
--cluster production \
--task arn:aws:ecs:...:task/abc123 \
--container web \
--interactive \
--command "/bin/sh"
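Gotcha: ECS Exec requires `enableExecuteCommand=true` on the service AND a task role with the `ssmmessages:CreateControlChannel`, `ssmmessages:CreateDataChannel`, `ssmmessages:OpenControlChannel`, and `ssmmessages:OpenDataChannel` permissions. You can enable it on an existing service with `aws ecs update-service --enable-execute-command --force-new-deployment`. Only tasks started after the change support exec, so running tasks must be replaced.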
Senior tip: Use `--force-new-deployment` with the same task definition to roll an ECS service (e.g., after a config change or to recover from a bad deploy). ECS drains old tasks gracefully.
6. Lambda
Deploy and manage functions
# Create function
aws lambda create-function \
--function-name my-processor \
--runtime python3.12 \
--role arn:aws:iam::123456789012:role/LambdaExecRole \
--handler app.handler \
--zip-file fileb://function.zip \
--memory-size 512 \
--timeout 30 \
--environment Variables='{DB_HOST=prod-db.cluster.local,LOG_LEVEL=INFO}'
# Update function code
aws lambda update-function-code \
--function-name my-processor \
--zip-file fileb://function.zip
# Update from ECR image
aws lambda update-function-code \
--function-name my-processor \
--image-uri 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/my-app:latest
# Wait for update to complete before invoking
aws lambda wait function-updated --function-name my-processor
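# Update configuration (--environment replaces ALL variables, so repeat every key you want to keep)
aws lambda update-function-configuration \
--function-name my-processor \
--memory-size 1024 \
--timeout 60 \
--environment Variables='{DB_HOST=prod-db.cluster.local,LOG_LEVEL=DEBUG}'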
Invoke
# Synchronous invoke (waits for response); AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload
aws lambda invoke \
--function-name my-processor \
--cli-binary-format raw-in-base64-out \
--payload '{"key": "value"}' \
--log-type Tail \
output.json
cat output.json
# Async invoke (fire and forget)
aws lambda invoke \
--function-name my-processor \
--invocation-type Event \
--cli-binary-format raw-in-base64-out \
--payload '{"batch_id": "abc123"}' \
/dev/null
# Decode tail logs (base64 encoded)
aws lambda invoke \
--function-name my-processor \
--cli-binary-format raw-in-base64-out \
--payload '{}' \
--log-type Tail \
--query 'LogResult' \
--output text \
/dev/null | base64 -d
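The decoded tail usually matters most for its last line: Lambda ends every invocation log with a `REPORT` line showing duration, billed duration, and memory used. This sketch decodes `LogResult` and keeps only that line. It assumes a `base64` that accepts `-d` (GNU coreutils and recent macOS), and the function name is hypothetical.

```shell
# Hypothetical helper (name is illustrative): decode the base64 LogResult returned
# with --log-type Tail and keep only the REPORT line (duration, billed duration,
# memory used) that Lambda writes at the end of each invocation.
lambda_report_line() {
  base64 -d | grep '^REPORT'
}

# Usage:
#   aws lambda invoke --function-name my-processor \
#     --cli-binary-format raw-in-base64-out --payload '{}' \
#     --log-type Tail --query 'LogResult' --output text /dev/null | lambda_report_line
```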
Versions, aliases, layers
# Publish a version (immutable snapshot of $LATEST)
aws lambda publish-version \
--function-name my-processor \
--description "Release 2.1.0"
# Create/update alias
aws lambda create-alias \
--function-name my-processor \
--name production \
--function-version 5
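# Canary deploy: keep production on version 5 and shift 10% of traffic to version 6
aws lambda update-alias \
--function-name my-processor \
--name production \
--function-version 5 \
--routing-config AdditionalVersionWeights={"6"=0.1}
# Promote: point the alias fully at version 6 and clear the split
aws lambda update-alias --function-name my-processor --name production \
--function-version 6 --routing-config AdditionalVersionWeights={}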
# Add permission for another service to invoke (S3 bucket ARNs carry no account ID, so also pin the bucket owner)
aws lambda add-permission \
--function-name my-processor \
--statement-id s3-invoke \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::my-bucket \
--source-account 123456789012
# Publish layer
aws lambda publish-layer-version \
--layer-name shared-libs \
--zip-file fileb://layer.zip \
--compatible-runtimes python3.11 python3.12
Senior tip: Always use aliases + versions in production. Never invoke `$LATEST` from other services. Aliases give you instant rollback (just point the alias to the previous version) and enable canary deployments.
7. CloudFormation
Stack lifecycle
# Validate template before deploying
aws cloudformation validate-template \
--template-body file://template.yaml
# Create stack
aws cloudformation create-stack \
--stack-name my-infra \
--template-body file://template.yaml \
--parameters \
ParameterKey=Environment,ParameterValue=prod \
ParameterKey=InstanceType,ParameterValue=t3.medium \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--tags Key=Project,Value=myapp
# Wait for create to complete
aws cloudformation wait stack-create-complete --stack-name my-infra
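# Update stack (parameters you omit revert to their template defaults unless you pass UsePreviousValue=true)
aws cloudformation update-stack \
--stack-name my-infra \
--template-body file://template.yaml \
--parameters ParameterKey=Environment,UsePreviousValue=true \
ParameterKey=InstanceType,ParameterValue=t3.large \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM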
# Deploy (create or update, idempotent — preferred for CI/CD)
aws cloudformation deploy \
--template-file template.yaml \
--stack-name my-infra \
--parameter-overrides Environment=prod \
--capabilities CAPABILITY_IAM \
--no-fail-on-empty-changeset
# Delete stack
aws cloudformation delete-stack --stack-name my-infra
aws cloudformation wait stack-delete-complete --stack-name my-infra
Change sets — safe updates
# Create a change set (preview what will change)
aws cloudformation create-change-set \
--stack-name my-infra \
--change-set-name patch-2024-05 \
--template-body file://template.yaml \
--capabilities CAPABILITY_IAM
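# Wait for the change set to finish computing
aws cloudformation wait change-set-create-complete \
--stack-name my-infra \
--change-set-name patch-2024-05
# Describe the change set (review before applying)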
aws cloudformation describe-change-set \
--stack-name my-infra \
--change-set-name patch-2024-05 \
--query 'Changes[*].ResourceChange.[Action,ResourceType,LogicalResourceId,Replacement]' \
--output table
# Execute (apply the change set)
aws cloudformation execute-change-set \
--stack-name my-infra \
--change-set-name patch-2024-05
Troubleshoot stack events
# List recent stack events (great for debugging failures)
aws cloudformation describe-stack-events \
--stack-name my-infra \
--query 'StackEvents[?ResourceStatus==`CREATE_FAILED` || ResourceStatus==`UPDATE_FAILED`].[LogicalResourceId,ResourceStatusReason]' \
--output table
# List all resources in a stack
aws cloudformation list-stack-resources \
--stack-name my-infra \
--query 'StackResourceSummaries[*].[LogicalResourceId,ResourceType,ResourceStatus,PhysicalResourceId]' \
--output table
Senior tip: Use `aws cloudformation deploy --no-fail-on-empty-changeset` in CI/CD pipelines. Without it, the pipeline fails if there’s nothing to change, which is a valid state on re-runs.
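A change set review can be automated as a pipeline gate that stops when CloudFormation plans to replace resources. This sketch reads the text output of the `describe-change-set --query 'Changes[*].ResourceChange.[Action,ResourceType,LogicalResourceId,Replacement]'` call. It flags `True` and `Conditional` replacements. Add actions carry no Replacement value, which text output typically prints as `None`. The function name is hypothetical.

```shell
# Hypothetical gate (name is illustrative) over the text output of
#   aws cloudformation describe-change-set ... \
#     --query 'Changes[*].ResourceChange.[Action,ResourceType,LogicalResourceId,Replacement]' --output text
# Lists resources CloudFormation will (or may) replace and fails if any exist.
changeset_replacements() {
  awk -F'\t' '$4 == "True" || $4 == "Conditional" { print $3 " (" $2 "): replacement=" $4; found = 1 }
              END { exit found }'
}

# Usage: review the list and only execute the change set when nothing is replaced
#   aws cloudformation describe-change-set --stack-name my-infra --change-set-name patch-2024-05 \
#     --query 'Changes[*].ResourceChange.[Action,ResourceType,LogicalResourceId,Replacement]' \
#     --output text | changeset_replacements &&
#   aws cloudformation execute-change-set --stack-name my-infra --change-set-name patch-2024-05
```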
8. RDS & DynamoDB
RDS
# Describe DB instances
aws rds describe-db-instances \
--query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,DBInstanceStatus,Endpoint.Address]' \
--output table
# Create snapshot (keep the ID in a variable so the wait step uses the same name)
SNAPSHOT_ID=prod-postgres-$(date +%Y%m%d)
aws rds create-db-snapshot \
--db-instance-identifier prod-postgres \
--db-snapshot-identifier "$SNAPSHOT_ID"
# Wait for snapshot to be available
aws rds wait db-snapshot-completed \
--db-snapshot-identifier "$SNAPSHOT_ID"
# Restore from snapshot (creates new instance)
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier prod-postgres-restored \
--db-snapshot-identifier prod-postgres-20240501 \
--db-instance-class db.t3.medium \
--no-multi-az
# Modify instance (e.g., upgrade class) — apply immediately or next maintenance window
aws rds modify-db-instance \
--db-instance-identifier prod-postgres \
--db-instance-class db.r6g.large \
--apply-immediately
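Gotcha: `--apply-immediately` applies the change right away, and an instance class change causes a brief outage (shorter on Multi-AZ, which fails over to the modified standby). Without it, changes wait for the next maintenance window. In production, prefer the maintenance window unless the change is urgent.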
DynamoDB
# Create table
aws dynamodb create-table \
--table-name users \
--attribute-definitions \
AttributeName=userId,AttributeType=S \
AttributeName=createdAt,AttributeType=N \
--key-schema \
AttributeName=userId,KeyType=HASH \
AttributeName=createdAt,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
# Put item
aws dynamodb put-item \
--table-name users \
--item '{"userId":{"S":"u123"},"createdAt":{"N":"1714521600"},"name":{"S":"Alice"}}'
# Get item
aws dynamodb get-item \
--table-name users \
--key '{"userId":{"S":"u123"},"createdAt":{"N":"1714521600"}}'
# Query by partition key (efficient — uses index)
aws dynamodb query \
--table-name users \
--key-condition-expression "userId = :uid" \
--expression-attribute-values '{":uid":{"S":"u123"}}'
# Scan with filter (expensive — full table read)
aws dynamodb scan \
--table-name users \
--filter-expression "contains(#n, :name)" \
--expression-attribute-names '{"#n":"name"}' \
--expression-attribute-values '{":name":{"S":"Ali"}}'
# Batch write (up to 25 items)
aws dynamodb batch-write-item \
--request-items file://batch-items.json
# PartiQL — SQL-like syntax
aws dynamodb execute-statement \
--statement "SELECT * FROM users WHERE userId = 'u123'"
# Update table capacity mode
aws dynamodb update-table \
--table-name users \
--billing-mode PROVISIONED \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=50
Gotcha: `scan` reads every item in the table and consumes read capacity for all of it, even items the filter expression discards, so it gets slow and expensive on large tables. Always use `query` with a key or index for production access patterns.
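A rough calculation shows why full scans hurt. A Scan is charged on the total size of the items it reads, rounded up to 4 KB units, whatever the filter keeps. Each unit costs 0.5 RCU with the default eventually consistent reads and 1 RCU with `--consistent-read`. The sketch below applies that arithmetic to a table size. The function name is hypothetical, and the input would come from `Table.TableSizeBytes` in `describe-table`, which DynamoDB refreshes only about every six hours.

```shell
# Hypothetical helper (name is illustrative): rough read capacity consumed by one
# full Scan. 4 KB read units, rounded up; 0.5 RCU each when eventually consistent
# (the default), 1 RCU each with --consistent-read. Filters do not reduce this.
scan_rcu_estimate() {
  awk -v bytes="$1" 'BEGIN {
    units = int((bytes + 4095) / 4096)
    printf "%d RCU eventually consistent, %d RCU strongly consistent\n", (units + 1) / 2, units
  }'
}

# Usage:
#   scan_rcu_estimate "$(aws dynamodb describe-table --table-name users \
#     --query 'Table.TableSizeBytes' --output text)"
```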
9. CloudWatch & Logs
Log groups and streams
# List log groups
aws logs describe-log-groups \
--query 'logGroups[*].[logGroupName,retentionInDays,storedBytes]' \
--output table
# Create log group with retention
aws logs create-log-group --log-group-name /app/production
aws logs put-retention-policy \
--log-group-name /app/production \
--retention-in-days 30
# Filter log events (search within a time range)
aws logs filter-log-events \
--log-group-name /aws/lambda/my-processor \
--filter-pattern "ERROR" \
--start-time $(date -d '1 hour ago' +%s000) \
--limit 50
# Tail logs in real-time (requires AWS CLI v2)
aws logs tail /aws/ecs/production --follow --format short
# Get a specific stream
aws logs get-log-events \
--log-group-name /app/production \
--log-stream-name "web/app/abc123" \
--start-from-head
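The `date -d '1 hour ago'` form used with `filter-log-events` only works with GNU date; BSD/macOS `date` has no `-d`. This sketch computes the same millisecond timestamp with shell arithmetic on `date +%s`, which both implementations support. The function name `epoch_ms_ago` is hypothetical.

```shell
# Hypothetical helper (name is illustrative): millisecond epoch timestamp for
# "N seconds ago", using only `date +%s` and shell arithmetic (GNU and BSD date).
epoch_ms_ago() {
  echo $(( ( $(date +%s) - $1 ) * 1000 ))   # $1 = how many seconds ago
}

# Usage (filter-log-events takes milliseconds; Logs Insights start-query takes seconds):
#   aws logs filter-log-events --log-group-name /aws/lambda/my-processor \
#     --filter-pattern "ERROR" --start-time "$(epoch_ms_ago 3600)" --limit 50
```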
Metrics and alarms
# Put custom metric data
aws cloudwatch put-metric-data \
--namespace "MyApp" \
--metric-name "OrdersProcessed" \
--value 42 \
--unit Count \
--dimensions Environment=prod,Service=order-worker
# Get metric statistics
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0abc123 \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
# Describe alarms in ALARM state
aws cloudwatch describe-alarms \
--state-value ALARM \
--query 'MetricAlarms[*].[AlarmName,StateReason,MetricName]' \
--output table
CloudWatch Logs Insights
# Run an Insights query (returns a query ID)
QUERY_ID=$(aws logs start-query \
--log-group-name /aws/lambda/my-processor \
--start-time $(date -d '1 hour ago' +%s) \
--end-time $(date +%s) \
--query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20' \
--query 'queryId' --output text)
# Retrieve results (status stays Scheduled/Running until the query finishes; re-run until it is Complete)
aws logs get-query-results --query-id "$QUERY_ID"
Senior tip: `aws logs tail --follow` is your best friend for live debugging ECS/Lambda issues. Combine it with `--filter-pattern` to reduce noise: `aws logs tail /aws/ecs/prod --follow --filter-pattern "ERROR"`.
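Logs Insights queries run asynchronously, so `get-query-results` has to be polled until the status leaves `Scheduled`/`Running`. This sketch is a generic poller that re-runs any status command, which keeps it testable without AWS. The function name `poll_until_done` and the `POLL_INTERVAL` variable are hypothetical.

```shell
# Hypothetical polling helper (names are illustrative): re-run a status command
# until it stops reporting Scheduled/Running, print the final status, and succeed
# only on Complete. POLL_INTERVAL (seconds) defaults to 2.
poll_until_done() {
  interval=${POLL_INTERVAL:-2}
  while :; do
    status=$("$@") || return 1
    case $status in
      Scheduled|Running) sleep "$interval" ;;
      *) echo "$status"; [ "$status" = "Complete" ]; return ;;
    esac
  done
}

# Usage with Logs Insights:
#   poll_until_done aws logs get-query-results --query-id "$QUERY_ID" \
#     --query 'status' --output text &&
#   aws logs get-query-results --query-id "$QUERY_ID"
```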
10. Route 53
DNS management
# List hosted zones
aws route53 list-hosted-zones \
--query 'HostedZones[*].[Name,Id,Config.PrivateZone]' \
--output table
# List records in a zone
aws route53 list-resource-record-sets \
--hosted-zone-id Z1234567890ABC \
--query 'ResourceRecordSets[*].[Name,Type,TTL,ResourceRecords[0].Value]' \
--output table
# Upsert a DNS record (create or update)
aws route53 change-resource-record-sets \
--hosted-zone-id Z1234567890ABC \
--change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "api.example.com",
"Type": "A",
"TTL": 60,
"ResourceRecords": [{"Value": "1.2.3.4"}]
}
}]
}'
# Delete a record (must match existing exactly)
aws route53 change-resource-record-sets \
--hosted-zone-id Z1234567890ABC \
--change-batch '{
"Changes": [{
"Action": "DELETE",
"ResourceRecordSet": {
"Name": "old.example.com",
"Type": "CNAME",
"TTL": 300,
"ResourceRecords": [{"Value": "lb-old.ap-southeast-1.elb.amazonaws.com"}]
}
}]
}'
# Create health check
aws route53 create-health-check \
--caller-reference $(date +%s) \
--health-check-config '{
"Type": "HTTPS",
"FullyQualifiedDomainName": "api.example.com",
"Port": 443,
"ResourcePath": "/health",
"RequestInterval": 30,
"FailureThreshold": 3
}'
Gotcha: Route 53 `DELETE` requires you to pass the exact current record values. If they don’t match, you get an error. Always run `list-resource-record-sets` first to confirm the current state before deleting.
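Hand-editing the nested change-batch JSON is where most Route 53 typos happen. This sketch builds a single-record batch from positional arguments, for both UPSERT and DELETE. For DELETE, copy the TTL and value exactly from `list-resource-record-sets`. The function name is hypothetical. It does no JSON escaping, so it suits plain hostnames and addresses but not TXT values that contain quotes.

```shell
# Hypothetical helper (name is illustrative): build a one-record change batch for
# change-resource-record-sets. No JSON escaping: plain names and values only.
r53_change_batch() {
  action=$1 name=$2 type=$3 ttl=$4 value=$5
  printf '{"Changes":[{"Action":"%s","ResourceRecordSet":{"Name":"%s","Type":"%s","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$action" "$name" "$type" "$ttl" "$value"
}

# Usage:
#   aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC \
#     --change-batch "$(r53_change_batch UPSERT api.example.com A 60 1.2.3.4)"
```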
11. Secrets Manager & SSM Parameter Store
Secrets Manager
# Create a secret
aws secretsmanager create-secret \
--name prod/myapp/db-credentials \
--secret-string '{"username":"admin","password":"s3cr3t"}' \
--description "Production DB credentials"
# Get secret value
aws secretsmanager get-secret-value \
--secret-id prod/myapp/db-credentials \
--query 'SecretString' \
--output text | jq .
# Get specific version
aws secretsmanager get-secret-value \
--secret-id prod/myapp/db-credentials \
--version-id abc123
# Update secret value (the new version becomes AWSCURRENT; the previous one moves to AWSPREVIOUS)
aws secretsmanager update-secret \
--secret-id prod/myapp/db-credentials \
--secret-string '{"username":"admin","password":"n3wP@ss"}'
# List secrets
aws secretsmanager list-secrets \
--query 'SecretList[*].[Name,LastChangedDate,RotationEnabled]' \
--output table
SSM Parameter Store
# Create parameters
aws ssm put-parameter \
--name /prod/myapp/db-host \
--value "prod-db.cluster.local" \
--type String
aws ssm put-parameter \
--name /prod/myapp/api-key \
--value "supersecretkey" \
--type SecureString \
--key-id alias/aws/ssm
# Get a single parameter (decrypt SecureString)
aws ssm get-parameter \
--name /prod/myapp/api-key \
--with-decryption \
--query 'Parameter.Value' \
--output text
# Get multiple parameters at once
aws ssm get-parameters \
--names /prod/myapp/db-host /prod/myapp/api-key \
--with-decryption
# Get all parameters under a path (paginated)
aws ssm get-parameters-by-path \
--path /prod/myapp \
--with-decryption \
--recursive \
--query 'Parameters[*].[Name,Value]' \
--output table
# Export all params under a path as env vars
aws ssm get-parameters-by-path \
--path /prod/myapp --with-decryption \
--query 'Parameters[*].[Name,Value]' \
--output text | \
awk '{n=split($1,a,"/"); print toupper(a[n])"="$2}' > .env
SSM Session Manager — no bastion required
# Start interactive session (no SSH key needed, all traffic logged)
aws ssm start-session --target i-0abc123
# Port forwarding (tunnel RDS through SSM)
aws ssm start-session \
--target i-0abc123 \
--document-name AWS-StartPortForwardingSessionToRemoteHost \
--parameters '{"host":["prod-db.cluster.local"],"portNumber":["5432"],"localPortNumber":["5432"]}'
# Run a command on multiple instances (25% at a time; stop sending to new targets after the first failure)
aws ssm send-command \
--document-name AWS-RunShellScript \
--targets Key=tag:Environment,Values=prod \
--parameters 'commands=["systemctl restart nginx"]' \
--max-concurrency 25% \
--max-errors 0 \
--output-s3-bucket-name my-ssm-logs
Senior tip: Replace all SSH bastion access with SSM Session Manager. Every session is logged to CloudTrail and optionally to S3/CloudWatch Logs. Zero open inbound ports = massive attack surface reduction.
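Gotcha: Standard-tier Parameter Store parameters are free but capped at 10,000 per account per Region (4 KB per value); beyond that you need the advanced tier, which is billed per parameter. Use Secrets Manager for credentials (it supports automatic rotation) and Parameter Store for non-sensitive config. The pricing differs significantly at scale.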
12. SQS & SNS
SQS — queues
# Create queue (standard or FIFO)
aws sqs create-queue --queue-name my-jobs
aws sqs create-queue \
--queue-name my-ordered-jobs.fifo \
--attributes FifoQueue=true,ContentBasedDeduplication=true
# Get queue URL (needed for most operations)
QUEUE_URL=$(aws sqs get-queue-url \
--queue-name my-jobs \
--query 'QueueUrl' --output text)
# Send message
aws sqs send-message \
--queue-url "$QUEUE_URL" \
--message-body '{"job":"process-order","orderId":"ord-123"}' \
--delay-seconds 5
# Receive messages (returns up to 10)
aws sqs receive-message \
--queue-url "$QUEUE_URL" \
--max-number-of-messages 10 \
--wait-time-seconds 20 \
--visibility-timeout 30
# Delete message after processing
aws sqs delete-message \
--queue-url "$QUEUE_URL" \
--receipt-handle "AQEBw..."
# Purge queue (deletes ALL messages, irreversible; allowed once every 60 seconds per queue)
aws sqs purge-queue --queue-url "$QUEUE_URL"
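# Get queue attributes (depth, in-flight, delayed)
aws sqs get-queue-attributes \
--queue-url "$QUEUE_URL" \
--attribute-names All \
--query 'Attributes.{Depth:ApproximateNumberOfMessages,InFlight:ApproximateNumberOfMessagesNotVisible,Delayed:ApproximateNumberOfMessagesDelayed}'
# Age of the oldest message is not a queue attribute; read the CloudWatch
# metric ApproximateAgeOfOldestMessage (AWS/SQS namespace) instead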
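Gotcha: Long polling (--wait-time-seconds 20) is critical in production. Short polling queries only a subset of SQS servers, so it can return empty responses even when messages are waiting, and every empty receive is billed as a request. Use the 20-second maximum in consumers, or set the queue's ReceiveMessageWaitTimeSeconds attribute so every consumer long-polls by default.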
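Combining long polling with the receive and delete commands above, one consumer step could look like the following sketch. consume_one and the handler argument are hypothetical names, and it assumes single-line message bodies (for example compact JSON), because --output text splits on tabs and newlines. A message is deleted only after the handler succeeds; otherwise it becomes visible again when its visibility timeout expires.

```shell
#!/usr/bin/env bash
# Sketch: long-poll one message, run a handler on its body, delete on success.
# Usage: consume_one "$QUEUE_URL" my_handler   (my_handler receives the body as $1)
# Returns 0 = processed, 1 = no message within 20 s, 2 = handler failed.
consume_one() {
  local queue_url="$1" handler="$2" handle="" body=""
  IFS=$'\t' read -r handle body < <(aws sqs receive-message \
      --queue-url "$queue_url" \
      --max-number-of-messages 1 \
      --wait-time-seconds 20 \
      --query 'Messages[0].[ReceiptHandle,Body]' \
      --output text)
  # An empty receive prints nothing, or "None" for the null query result
  if [[ -z "$handle" || "$handle" == "None" ]]; then
    return 1
  fi
  # On failure, leave the message; it reappears after the visibility timeout
  "$handler" "$body" || return 2
  aws sqs delete-message --queue-url "$queue_url" --receipt-handle "$handle"
}
```

A worker is then just a loop such as while true; do consume_one "$QUEUE_URL" my_handler; done. Pair it with a dead-letter queue so messages that keep failing do not cycle forever.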
SNS — pub/sub
# Create topic
TOPIC_ARN=$(aws sns create-topic \
--name my-alerts \
--query 'TopicArn' --output text)
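# Subscribe endpoints
# Email: stays "PendingConfirmation" until the recipient clicks the link in the confirmation email
aws sns subscribe \
--topic-arn "$TOPIC_ARN" \
--protocol email \
--notification-endpoint ops-team@example.com
# SQS: the queue's access policy must allow sns.amazonaws.com to call sqs:SendMessage
aws sns subscribe \
--topic-arn "$TOPIC_ARN" \
--protocol sqs \
--notification-endpoint arn:aws:sqs:ap-southeast-1:123456789012:my-jobs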
# Publish a notification
aws sns publish \
--topic-arn "$TOPIC_ARN" \
--subject "Deploy: prod v2.1.0 succeeded" \
--message "Deployment completed at $(date). No errors."
# Publish with message attributes (for filtering)
aws sns publish \
--topic-arn "$TOPIC_ARN" \
--message "Order placed" \
--message-attributes '{"event":{"DataType":"String","StringValue":"order.created"}}'
# List subscriptions for a topic
aws sns list-subscriptions-by-topic --topic-arn "$TOPIC_ARN"
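Message attributes only filter anything once a subscription has a filter policy; without one, the subscription receives every message published to the topic. A minimal sketch using sns set-subscription-attributes with the FilterPolicy attribute; the function name set_event_filter and the single-value policy shape are illustrative assumptions. The subscription ARN comes from the subscribe output (for example --query SubscriptionArn --output text) or from list-subscriptions-by-topic.

```shell
#!/usr/bin/env bash
# Sketch: deliver only messages whose "event" attribute equals the given value.
# Usage: set_event_filter "$SUBSCRIPTION_ARN" order.created
set_event_filter() {
  local sub_arn="$1" event="$2"
  aws sns set-subscription-attributes \
    --subscription-arn "$sub_arn" \
    --attribute-name FilterPolicy \
    --attribute-value "{\"event\":[\"$event\"]}"
}
```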
13. Advanced Patterns
--query with JMESPath — filter and transform output
# Extract nested value
aws ec2 describe-instances \
--query 'Reservations[0].Instances[0].PrivateIpAddress'
# Filter array, project fields
aws ec2 describe-instances \
--query 'Reservations[*].Instances[?State.Name==`running`].[InstanceId,InstanceType,PrivateIpAddress]'
# Use a tag value (complex path)
aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].{ID:InstanceId,Name:Tags[?Key==`Name`].Value|[0],IP:PrivateIpAddress}'
# Sort by field (JMESPath sort_by), then keep the 5 most recent snapshots
aws ec2 describe-snapshots --owner-ids self \
--query 'sort_by(Snapshots, &StartTime)[-5:].[SnapshotId,StartTime,VolumeSize]' \
--output table
# Count results
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" \
--query 'length(Reservations[*].Instances[])'
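A projection plus a few standard shell tools turns a single count into a breakdown. A sketch with a hypothetical helper name, count_running_by_type; it relies on --output text printing a flat list tab-separated (one line per page), which tr then splits into one value per line before counting.

```shell
#!/usr/bin/env bash
# Sketch: count running instances per instance type, most common first.
count_running_by_type() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceType' \
    --output text |
    tr '\t' '\n' | sort | uniq -c | sort -rn
}
```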
--output text + shell pipelines
# Reboot all running instances with a tag (GNU xargs -r: do nothing when nothing matches)
aws ec2 describe-instances \
--filters "Name=tag:Role,Values=worker" "Name=instance-state-name,Values=running" \
--query 'Reservations[*].Instances[*].InstanceId' \
--output text | \
xargs -r aws ec2 reboot-instances --instance-ids
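# Bulk delete untagged ECR images (batch-delete-image accepts at most 100 image IDs
# per call, so cap the listing and re-run until nothing is left)
IMAGE_IDS=$(aws ecr list-images \
--repository-name my-app \
--filter tagStatus=UNTAGGED \
--max-items 100 \
--query 'imageIds[*]' \
--output json)
[[ "$IMAGE_IDS" != "[]" ]] && aws ecr batch-delete-image \
--repository-name my-app \
--image-ids "$IMAGE_IDS"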
Waiters — synchronize on resource state
# Each waiter has its own poll interval and attempt limit (e.g. ec2 wait instance-running
# polls every 15 s, up to 40 checks); run aws <service> wait <waiter> help for exact values
aws ec2 wait instance-running --instance-ids i-0abc123
aws ec2 wait instance-stopped --instance-ids i-0abc123
aws ec2 wait image-available --image-ids ami-0abc123
aws s3api wait bucket-exists --bucket my-bucket
aws cloudformation wait stack-create-complete --stack-name my-stack
aws rds wait db-instance-available --db-instance-identifier prod-db
aws lambda wait function-updated --function-name my-function
aws ecs wait services-stable --cluster prod --services web-api
Senior tip: Waiters exit with code 255 on timeout. In bash scripts, always check the exit code: aws ec2 wait instance-running ... || { echo "Timeout waiting for instance"; exit 1; }.
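Because each waiter gives up after a fixed number of checks, slow operations (large RDS restores, big AMIs) can outlast it. A minimal sketch that re-runs a waiter a bounded number of times; wait_with_retries is a hypothetical name. A waiter also exits non-zero when it reaches a terminal failure state (for example an instance that terminates while you wait for running), and this sketch retries that case too, so keep the retry count small.

```shell
#!/usr/bin/env bash
# Sketch: re-run a waiter up to N times to extend its built-in attempt limit.
# Usage: wait_with_retries 3 ec2 wait instance-running --instance-ids i-0abc123
wait_with_retries() {
  local max="$1" attempt
  shift
  for ((attempt = 1; attempt <= max; attempt++)); do
    if aws "$@"; then
      return 0
    fi
    echo "waiter attempt $attempt/$max failed: aws $*" >&2
  done
  return 1
}
```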
Pagination — handle large result sets
# CLI auto-paginates by default for most commands
# To limit and paginate manually:
aws s3api list-objects-v2 \
--bucket my-large-bucket \
--max-items 100 \
--starting-token "NEXT_TOKEN_FROM_PREVIOUS"
# Disable auto-pagination: only the first page is returned (here, at most 50 instances)
aws ec2 describe-instances \
--no-paginate \
--max-results 50
# Manual paging loop: --max-items caps each call, and the NextToken in the output resumes it
NEXT_TOKEN=""
while true; do
  RESULT=$(aws s3api list-objects-v2 \
    --bucket my-bucket \
    --max-items 1000 \
    --output json \
    ${NEXT_TOKEN:+--starting-token "$NEXT_TOKEN"})
  echo "$RESULT" | jq -r '.Contents[]?.Key'
  NEXT_TOKEN=$(echo "$RESULT" | jq -r '.NextToken // empty')
  [[ -z "$NEXT_TOKEN" ]] && break
done
Dry-run — test permissions before running
# EC2 supports --dry-run natively — returns DryRunOperation if you have permission
aws ec2 run-instances --dry-run \
--image-id ami-0abc123 \
--instance-type t3.micro \
--count 1
# Expected output on success:
# An error occurred (DryRunOperation) when calling the RunInstances operation:
# Request would have succeeded, but DryRun flag is set.
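Since a successful dry run is reported as an error, a script has to inspect the error code in the message rather than the exit status. A minimal sketch with a hypothetical helper name, ec2_can; it assumes the DryRunOperation and UnauthorizedOperation error codes appear in the CLI's error text, as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: turn an EC2 --dry-run into a usable exit code.
# Usage: ec2_can run-instances --image-id ami-0abc123 --instance-type t3.micro
# Returns 0 = allowed (DryRunOperation), 1 = denied (UnauthorizedOperation),
#         2 = anything else (bad parameters, network...), message on stderr.
ec2_can() {
  local out
  out=$(aws ec2 "$@" --dry-run 2>&1)
  case "$out" in
    *DryRunOperation*)       return 0 ;;
    *UnauthorizedOperation*) return 1 ;;
    *)                       printf '%s\n' "$out" >&2; return 2 ;;
  esac
}
```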
Tag-based filtering across services
# Filter by multiple tags
aws ec2 describe-instances \
--filters \
"Name=tag:Environment,Values=prod" \
"Name=tag:Team,Values=platform" \
"Name=instance-state-name,Values=running"
# Resource Groups Tagging API: find tagged resources across resource types
aws resourcegroupstaggingapi get-resources \
--tag-filters Key=Environment,Values=prod \
--resource-type-filters ec2:instance rds:db \
--query 'ResourceTagMappingList[*].ResourceARN' \
--output text
# Bulk tag multiple EC2 resources (instances, volumes, security groups) in one call
aws ec2 create-tags \
--resources i-0abc123 vol-0def456 sg-0ghi789 \
--tags Key=Environment,Value=prod Key=CostCenter,Value=engineering
Useful global flags reference
| Flag | Purpose | Example |
|---|---|---|
| --profile NAME | Use named credential profile | --profile prod |
| --region REGION | Override configured region | --region us-east-1 |
| --output FORMAT | json, text, table, yaml (yaml is CLI v2 only) | --output table |
| --query EXPR | JMESPath filter on output | --query 'Users[0].Arn' |
| --no-paginate | Return only the first page of results | Quick look at a large list |
| --dry-run | Test permissions without acting (per-command, not global) | EC2 operations |
| --debug | Full HTTP request/response log | Diagnosing TLS/auth issues |
| --no-verify-ssl | Skip SSL certificate verification | Never use in production |
| --endpoint-url URL | Override service endpoint | LocalStack, custom VPC endpoints |
| --cli-input-json | Read all params from file | Complex commands |
| --generate-cli-skeleton | Print JSON template for input | Scaffold --cli-input-json files |
# Generate skeleton for any command — great for complex resource creation
aws ec2 run-instances --generate-cli-skeleton > run-instances.json
# Edit the JSON (delete the keys you don't need), then:
aws ec2 run-instances --cli-input-json file://run-instances.json
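Senior tip: Set AWS_PAGER="" to disable the automatic less pager in CLI v2 when scripting: export AWS_PAGER="". Add it to your .zshrc or .bashrc for permanent effect, or set cli_pager= in ~/.aws/config, or pass --no-cli-pager per command. A pager waiting for input can hang unattended scripts and CI jobs.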
Quick Reference: Common Patterns
# Who am I, what account, what region? (configure list shows the effective region and where it came from)
aws sts get-caller-identity && aws configure list
# Find instances by private IP
aws ec2 describe-instances \
--filters "Name=private-ip-address,Values=10.0.1.42" \
--query 'Reservations[0].Instances[0].[InstanceId,Tags[?Key==`Name`].Value|[0]]'
# List all IAM roles that can be assumed by Lambda (checks every statement; skips statements without a service principal)
aws iam list-roles \
--query 'Roles[?AssumeRolePolicyDocument.Statement[?Principal.Service && contains(Principal.Service, `lambda.amazonaws.com`)]].RoleName'
# Find all public S3 buckets (ACL-based)
for bucket in $(aws s3 ls | awk '{print $3}'); do
acl=$(aws s3api get-bucket-acl --bucket $bucket --query 'Grants[?Grantee.URI==`http://acs.amazonaws.com/groups/global/AllUsers`]' --output text)
[[ -n "$acl" ]] && echo "PUBLIC: $bucket"
done
# Cost by service for last month (requires Cost Explorer enabled; each Cost Explorer API request is billed)
# End is exclusive, so this range covers all of April 2024
aws ce get-cost-and-usage \
--time-period Start=2024-04-01,End=2024-05-01 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE \
--query 'ResultsByTime[0].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
--output text | sort -t$'\t' -k2,2 -rn
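Hard-coded dates go stale every month. A small pure-bash sketch (avoiding the GNU/BSD date differences) that computes the previous calendar month in the Start=...,End=... form that --time-period accepts; last_month_range is a hypothetical helper name, and the optional argument is a reference date in YYYY-MM-DD form.

```shell
#!/usr/bin/env bash
# Sketch: build the --time-period value for the previous calendar month.
# End is exclusive in Cost Explorer, so End is the 1st of the reference month.
# Usage: aws ce get-cost-and-usage --time-period "$(last_month_range)" ...
#        last_month_range 2024-05-17   # explicit reference date
last_month_range() {
  local today="${1:-$(date +%Y-%m-%d)}"
  local y=$((10#${today:0:4})) m=$((10#${today:5:2}))   # 10# avoids octal on "08"/"09"
  local py=$y pm=$((m - 1))
  if ((pm == 0)); then   # January: previous month is December of last year
    pm=12
    py=$((y - 1))
  fi
  printf 'Start=%04d-%02d-01,End=%04d-%02d-01\n' "$py" "$pm" "$y" "$m"
}
```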