AWS CLI Cheatsheet

Audience: DevOps engineers, beginner to senior. Every section moves from fundamentals to production-grade usage. Callouts highlight gotchas, senior tips, cost warnings, and real-world patterns you’ll actually use on-call.


1. Configuration & Authentication

aws configure — initial setup

# Interactive setup — writes to ~/.aws/credentials and ~/.aws/config
aws configure

# Configure a named profile
aws configure --profile staging

# Configure specific keys
aws configure set aws_access_key_id AKIA... --profile prod
aws configure set aws_secret_access_key wJalr... --profile prod
aws configure set region ap-southeast-1 --profile prod
aws configure set output json --profile prod

# List all configured profiles
aws configure list-profiles

# Show current effective configuration
aws configure list
aws configure list --profile prod

Senior tip: Never use the default profile in scripts. Always name profiles explicitly (--profile prod, --profile staging). This prevents accidental production operations when your shell environment is wrong.

Credential chain — how AWS resolves credentials (in order)

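| Priority | Source | Notes |
|---|---|---|
| 1 | AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars | Highest priority after explicit command-line options; good for CI/CD |
| 2 | AWS_PROFILE env var | Selects a named profile |
| 3 | Assume-role, web identity token, or SSO (IAM Identity Center) | EKS IRSA, GitHub OIDC; resolved before the shared credentials file |
| 4 | ~/.aws/credentials file | [default] or [profile-name] |
| 5 | ~/.aws/config file | Can also store credentials |
| 6 | Container credential provider | ECS task role via metadata endpoint |
| 7 | Instance profile (EC2 IMDS) | IAM role attached to EC2 |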

Gotcha: If AWS_ACCESS_KEY_ID is set in your shell, it takes precedence over the profile selected via AWS_PROFILE, SSO sessions, and instance roles. Run unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN before using role-based auth.
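
The cleanup step above can be wrapped in a small shell function. This is a minimal sketch, assuming a POSIX-style shell with `local` support; `clear_aws_env_creds` is a hypothetical name, not an AWS CLI command. It must run as a function in the current shell (not as a separate script), otherwise the variables stay set in your session.

```shell
# clear_aws_env_creds: hypothetical helper, not part of the AWS CLI.
# Unsets the static-credential variables and names each one it removed on stderr.
clear_aws_env_creds() {
  local var
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # The eval expands to e.g. [ -n "${AWS_ACCESS_KEY_ID+x}" ], true only if set
    if eval "[ -n \"\${$var+x}\" ]"; then
      echo "unsetting $var" >&2
      unset "$var"
    fi
  done
}
```

Calling `clear_aws_env_creds` before `aws sso login` or before relying on an instance role makes it visible when stale keys were shadowing the intended credentials.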

Named profiles in config

# ~/.aws/config
[profile dev]
region = us-east-1
output = json

[profile prod]
region = ap-southeast-1
output = json
mfa_serial = arn:aws:iam::123456789012:mfa/john.doe
role_arn = arn:aws:iam::987654321098:role/AdminRole
source_profile = dev

# Use named profile per-command
aws s3 ls --profile prod

# Set for entire shell session
export AWS_PROFILE=staging
aws s3 ls   # now uses staging

# Per-command region override
aws ec2 describe-instances --region us-west-2

MFA + STS assume-role

# Get temporary credentials with MFA
aws sts get-session-token \
  --serial-number arn:aws:iam::123456789012:mfa/john.doe \
  --token-code 123456 \
  --duration-seconds 43200

# Assume a role (cross-account)
aws sts assume-role \
  --role-arn arn:aws:iam::987654321098:role/DeployRole \
  --role-session-name deploy-$(date +%s) \
  --duration-seconds 3600

# Export credentials from assume-role output (quote the substitution so eval receives one export per line)
eval "$(aws sts assume-role \
  --role-arn arn:aws:iam::987654321098:role/DeployRole \
  --role-session-name ci-deploy \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text | awk '{print "export AWS_ACCESS_KEY_ID="$1"\nexport AWS_SECRET_ACCESS_KEY="$2"\nexport AWS_SESSION_TOKEN="$3}')"
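
The awk one-liner above can also be written as a reusable function that checks it received all three fields before printing anything. This is a sketch under stated assumptions: `creds_to_exports` is a hypothetical name, and it expects the single tab-separated line produced by the `--query ... --output text` call shown above.

```shell
# creds_to_exports: hypothetical helper, not part of the AWS CLI.
# Reads one whitespace-separated line (AccessKeyId, SecretAccessKey, SessionToken)
# on stdin and prints export statements with single-quoted values.
creds_to_exports() {
  local key secret token
  read -r key secret token || [ -n "$key" ] || return 1
  # Refuse partial input instead of exporting half a credential set
  [ -n "$token" ] || { echo "expected three fields" >&2; return 1; }
  printf "export AWS_ACCESS_KEY_ID='%s'\n" "$key"
  printf "export AWS_SECRET_ACCESS_KEY='%s'\n" "$secret"
  printf "export AWS_SESSION_TOKEN='%s'\n" "$token"
}

# Typical use (needs valid credentials and permission to assume the role):
# eval "$(aws sts assume-role ... --output text | creds_to_exports)"
```

If the assume-role call fails and prints nothing, the function returns non-zero and `eval` receives an empty string, so existing variables are left untouched.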

Senior tip: For CI/CD pipelines, use OIDC federation instead of static keys. GitHub Actions → OIDC → IAM role → no secrets to rotate. Much safer.

SSO login (AWS IAM Identity Center)

# Configure SSO profile
aws configure sso
# Follow prompts: SSO start URL, region, account, role

# Login (opens browser)
aws sso login --profile my-sso-profile

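# List accounts (the cache directory can hold several JSON files; use one that contains an access token)
aws sso list-accounts --access-token "$(jq -r 'select(.accessToken != null) | .accessToken' ~/.aws/sso/cache/*.json | head -n 1)"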

# Logout
aws sso logout

Verify who you are

# Always check before destructive operations
aws sts get-caller-identity
# Returns: Account, UserId, Arn

aws sts get-caller-identity --query 'Account' --output text
aws sts get-caller-identity --profile prod --query 'Arn' --output text
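
The identity check can be turned into a guard that stops a script when it is pointed at the wrong account. A minimal sketch: `assert_account` is a hypothetical helper name, and the account ID in the usage comment is the placeholder used elsewhere in this cheatsheet.

```shell
# assert_account EXPECTED ACTUAL: hypothetical guard, not part of the AWS CLI.
# Returns non-zero (and explains why on stderr) when the two account IDs differ.
assert_account() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "refusing to continue: in account '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Typical call before a destructive step (needs valid credentials):
# assert_account 123456789012 \
#   "$(aws sts get-caller-identity --query Account --output text)" || exit 1
```

Passing the actual account as an argument keeps the comparison separate from the API call, so an expired session (empty output) is treated as a mismatch rather than silently passing.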

Output formats

# json (default) — machine-readable
aws ec2 describe-instances --output json

# table — human-readable, great for terminals
aws ec2 describe-instances --output table

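# text: tab-delimited, great for grep/awk pipelines
# (high-level aws s3 commands such as aws s3 ls print their own format and ignore --output)
aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' --output text

# yaml: readable, used in CloudFormation (AWS CLI v2 only)
aws cloudformation describe-stacks --output yaml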

2. S3

Basic operations

# List buckets
aws s3 ls

# List bucket contents (with sizes and dates)
aws s3 ls s3://my-bucket/
aws s3 ls s3://my-bucket/prefix/ --recursive --human-readable --summarize

# Create / remove bucket
aws s3 mb s3://my-new-bucket --region ap-southeast-1
aws s3 rb s3://my-empty-bucket
aws s3 rb s3://my-bucket --force   # remove with contents

# Copy
aws s3 cp file.txt s3://my-bucket/path/file.txt
aws s3 cp s3://my-bucket/file.txt ./local/file.txt
aws s3 cp s3://source-bucket/file.txt s3://dest-bucket/file.txt   # server-side copy

# Move (copy + delete source)
aws s3 mv local.txt s3://my-bucket/remote.txt
aws s3 mv s3://my-bucket/old.txt s3://my-bucket/new.txt

# Delete
aws s3 rm s3://my-bucket/file.txt
aws s3 rm s3://my-bucket/prefix/ --recursive

Sync — the production workhorse

# Sync local → S3 (only uploads changed/new files)
aws s3 sync ./dist s3://my-website-bucket --delete

# Sync S3 → local
aws s3 sync s3://my-bucket/backups ./local-backups

# Exclude/include filters (order matters — evaluated left to right)
aws s3 sync ./app s3://my-bucket \
  --exclude "*" \
  --include "*.html" \
  --include "*.css" \
  --include "*.js"

# Sync with storage class and server-side encryption
aws s3 sync ./logs s3://my-archive-bucket/logs \
  --storage-class GLACIER \
  --sse AES256

# Dry-run equivalent — use --dryrun
aws s3 sync ./dist s3://my-bucket --dryrun

Gotcha: --delete removes files in destination that don’t exist in source. Without it, sync is additive only. Always use --dryrun first when syncing to production.
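
One way to make the dry run actionable in a pipeline is to count how many deletions it plans before running the real sync. This is a sketch, assuming dry-run lines for deletions start with `(dryrun) delete:`; `count_planned_deletes` is a hypothetical helper and the threshold of 20 in the usage comment is an arbitrary example.

```shell
# count_planned_deletes: hypothetical helper, not part of the AWS CLI.
# Counts lines on stdin that start with "(dryrun) delete:" and prints the total.
count_planned_deletes() {
  # grep -c exits 1 when the count is 0; "|| true" keeps the function's status at 0
  grep -c '^(dryrun) delete:' || true
}

# Typical use (the limit is an assumption; choose one that fits your bucket):
# n=$(aws s3 sync ./dist s3://my-bucket --delete --dryrun | count_planned_deletes)
# [ "$n" -le 20 ] || { echo "too many planned deletions: $n" >&2; exit 1; }
```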

Storage classes

| Class | Use case | Min duration | Retrieval |
|---|---|---|---|
| STANDARD | Frequently accessed | None | Immediate |
| STANDARD_IA | Infrequent access | 30 days | Immediate |
| ONEZONE_IA | Infrequent, single AZ | 30 days | Immediate |
| GLACIER_IR | Archive, rare retrieval | 90 days | Immediate |
| GLACIER | Long-term archive | 90 days | 3–5 hrs |
| DEEP_ARCHIVE | Compliance archive | 180 days | 12–48 hrs |
| INTELLIGENT_TIERING | Unknown access pattern | None | Immediate |

# Upload with specific storage class
aws s3 cp large-backup.tar.gz s3://my-bucket/ --storage-class GLACIER_IR

Cost warning: STANDARD_IA and GLACIER_IR charge a per-GB retrieval fee. Don’t use IA tiers for files you access more than once a month — it’s more expensive than STANDARD.

Presigned URLs

# Generate a presigned GET URL (default 3600 seconds)
aws s3 presign s3://my-bucket/private-file.pdf

# Custom expiry (max 7 days = 604800 seconds)
aws s3 presign s3://my-bucket/private-file.pdf --expires-in 86400

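# Presigned PUT URLs (for direct client uploads) cannot be generated from the CLI:
# aws s3 presign only signs GET requests, and s3api has no generate-presigned-url command.
# Use an SDK instead, for example boto3's generate_presigned_url with ClientMethod='put_object'.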

S3 API — advanced operations

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# Get specific version of a file
aws s3api get-object \
  --bucket my-bucket \
  --key file.txt \
  --version-id abc123 \
  output.txt

# Set bucket policy
aws s3api put-bucket-policy \
  --bucket my-bucket \
  --policy file://bucket-policy.json

# Enable server-side encryption
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
  }'

# Set lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json

# Enable transfer acceleration
aws s3api put-bucket-accelerate-configuration \
  --bucket my-bucket \
  --accelerate-configuration Status=Enabled

# CORS configuration
aws s3api put-bucket-cors \
  --bucket my-bucket \
  --cors-configuration file://cors.json

# Multipart upload — large file (manual steps)
aws s3api create-multipart-upload --bucket my-bucket --key large-file.iso
# Then upload parts, then complete — s3 cp handles this automatically for >8MB

Senior tip: aws s3 cp switches to multipart upload automatically once a file exceeds multipart_threshold (8 MB by default), so you rarely need to manage multipart manually. Use aws configure set default.s3.multipart_threshold 64MB to tune the threshold.
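
When tuning these settings it helps to see how many parts a given chunk size produces, since S3 caps a multipart upload at 10,000 parts. The arithmetic is a ceiling division; the sketch below uses a hypothetical helper name, `multipart_parts`, and takes both sizes in bytes.

```shell
# multipart_parts SIZE_BYTES CHUNK_BYTES: hypothetical helper.
# Prints ceil(SIZE_BYTES / CHUNK_BYTES), the number of parts an upload would need.
multipart_parts() {
  local size="$1" chunk="$2"
  echo $(( (size + chunk - 1) / chunk ))
}

# 100 MiB file with 8 MiB chunks
multipart_parts 104857600 8388608
```

If the result approaches 10,000 for your largest files, raising the chunk size (for example with aws configure set default.s3.multipart_chunksize 64MB) keeps the part count down.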


3. EC2

Describe instances — the foundation

# All instances (verbose)
aws ec2 describe-instances

# Filter by state
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running"

# Filter by tag
aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=prod" \
            "Name=tag:Role,Values=web"

# Extract useful fields with --query (JMESPath)
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,Tags[?Key==`Name`].Value|[0]]' \
  --output table

# Get instance IDs only
aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=prod" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text

Launch and manage instances

# Launch an instance
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.medium \
  --key-name my-keypair \
  --security-group-ids sg-0abc123 \
  --subnet-id subnet-0def456 \
  --iam-instance-profile Name=MyInstanceProfile \
  --user-data file://user-data.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-01},{Key=Environment,Value=prod}]' \
  --count 1

# Start / stop / terminate
aws ec2 start-instances --instance-ids i-0abc123 i-0def456
aws ec2 stop-instances --instance-ids i-0abc123
aws ec2 terminate-instances --instance-ids i-0abc123

# Reboot
aws ec2 reboot-instances --instance-ids i-0abc123

# Create AMI from running instance
aws ec2 create-image \
  --instance-id i-0abc123 \
  --name "web-01-$(date +%Y%m%d-%H%M)" \
  --description "Production snapshot before deployment" \
  --no-reboot

# Wait for instance to be running
aws ec2 wait instance-running --instance-ids i-0abc123
echo "Instance is running"

Security groups

# Create a security group
aws ec2 create-security-group \
  --group-name web-sg \
  --description "Web server security group" \
  --vpc-id vpc-0abc123

# Allow inbound HTTP/HTTPS
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

# Allow SSH from specific IP only
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp --port 22 \
  --cidr $(curl -s ifconfig.me)/32

# Revoke a rule
aws ec2 revoke-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp --port 22 --cidr 0.0.0.0/0

# Describe security groups with filter
aws ec2 describe-security-groups \
  --filters "Name=group-name,Values=web-sg"

Key pairs, Elastic IPs, VPC

# Create key pair (save private key immediately — AWS won't show it again)
aws ec2 create-key-pair \
  --key-name my-keypair \
  --query 'KeyMaterial' \
  --output text > ~/.ssh/my-keypair.pem
chmod 600 ~/.ssh/my-keypair.pem

# Allocate and associate Elastic IP
aws ec2 allocate-address --domain vpc
aws ec2 associate-address \
  --instance-id i-0abc123 \
  --allocation-id eipalloc-0abc123

# Describe VPCs and subnets
aws ec2 describe-vpcs
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=vpc-0abc123" \
  --query 'Subnets[*].[SubnetId,CidrBlock,AvailabilityZone,Tags[?Key==`Name`].Value|[0]]' \
  --output table

EBS volumes

# Create and attach volume
aws ec2 create-volume \
  --size 100 \
  --volume-type gp3 \
  --availability-zone ap-southeast-1a \
  --encrypted \
  --throughput 250

aws ec2 attach-volume \
  --volume-id vol-0abc123 \
  --instance-id i-0abc123 \
  --device /dev/sdf

# Detach volume
aws ec2 detach-volume --volume-id vol-0abc123

# Create snapshot
aws ec2 create-snapshot \
  --volume-id vol-0abc123 \
  --description "Pre-migration backup $(date +%Y%m%d)"

# Describe snapshots owned by me
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[*].[SnapshotId,VolumeSize,StartTime,Description]' \
  --output table

IMDSv2 — instance metadata (secure)

# From inside an EC2 instance — IMDSv2 (token-based, required on modern instances)
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/

Gotcha: Many newer AMIs and launch configurations require IMDSv2, which rejects token-less (IMDSv1) requests with HTTP 401. Don't write scripts that use the old unauthenticated curl pattern: with curl -s no error is printed, so the failure is easy to miss.


4. IAM

Users, groups, roles

# Create user, group, attach to group
aws iam create-user --user-name jane.doe
aws iam create-group --group-name developers
aws iam add-user-to-group --user-name jane.doe --group-name developers

# Create access key (store output securely — shown once)
aws iam create-access-key --user-name jane.doe

# List users with last login
aws iam list-users \
  --query 'Users[*].[UserName,CreateDate,PasswordLastUsed]' \
  --output table

# Create a role (trust policy required)
aws iam create-role \
  --role-name LambdaExecRole \
  --assume-role-policy-document file://trust-policy.json

# Attach managed policy to role
aws iam attach-role-policy \
  --role-name LambdaExecRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

# Attach inline policy
aws iam put-role-policy \
  --role-name LambdaExecRole \
  --policy-name S3ReadPolicy \
  --policy-document file://s3-read-policy.json

Policy inspection

# List all customer-managed policies
aws iam list-policies --scope Local \
  --query 'Policies[*].[PolicyName,Arn,DefaultVersionId]' \
  --output table

# Get policy document (check what version is default first)
aws iam get-policy --policy-arn arn:aws:iam::123456789012:policy/MyPolicy
aws iam get-policy-version \
  --policy-arn arn:aws:iam::123456789012:policy/MyPolicy \
  --version-id v3

# List policies attached to a role
aws iam list-attached-role-policies --role-name MyRole
aws iam list-role-policies --role-name MyRole   # inline policies

# Simulate permissions — critical for debugging access denied
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/MyRole \
  --action-names s3:GetObject ec2:DescribeInstances \
  --resource-arns arn:aws:s3:::my-bucket/*

Senior tip: simulate-principal-policy is the fastest way to diagnose “Access Denied” errors without running the real operations. Use it before spending 20 minutes reading JSON policies.
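
When simulating many actions at once, it is easier to read only the ones that were not allowed. A minimal sketch, assuming the simulation output has been reduced to `action<TAB>decision` lines with `--query` and `--output text`; `denied_actions` is a hypothetical helper name.

```shell
# denied_actions: hypothetical helper, not part of the AWS CLI.
# Reads "action<TAB>decision" lines on stdin and prints every action whose
# decision is anything other than "allowed" (e.g. implicitDeny, explicitDeny).
denied_actions() {
  awk -F'\t' '$2 != "allowed" { print $1 " (" $2 ")" }'
}

# Typical use (needs valid credentials):
# aws iam simulate-principal-policy \
#   --policy-source-arn arn:aws:iam::123456789012:role/MyRole \
#   --action-names s3:GetObject ec2:DescribeInstances \
#   --query 'EvaluationResults[*].[EvalActionName,EvalDecision]' \
#   --output text | denied_actions
```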

MFA management

# List MFA devices for a user
aws iam list-mfa-devices --user-name jane.doe

# Enable virtual MFA device
aws iam create-virtual-mfa-device \
  --virtual-mfa-device-name jane.doe-mfa \
  --outfile /tmp/qr.png \
  --bootstrap-method QRCodePNG

aws iam enable-mfa-device \
  --user-name jane.doe \
  --serial-number arn:aws:iam::123456789012:mfa/jane.doe-mfa \
  --authentication-code1 123456 \
  --authentication-code2 654321

5. ECS & ECR

ECR — container registry

# Login (Docker must be running)
aws ecr get-login-password --region ap-southeast-1 | \
  docker login --username AWS --password-stdin \
  123456789012.dkr.ecr.ap-southeast-1.amazonaws.com

# Create repository
aws ecr create-repository \
  --repository-name my-app \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

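# List images with tags, oldest push first (sort inside the query; piping a table through sort scrambles its borders)
aws ecr describe-images \
  --repository-name my-app \
  --query 'sort_by(imageDetails,&imagePushedAt)[*].[imageTags[0],imageSizeInBytes,imagePushedAt]' \
  --output table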

# Delete untagged images (errors if there are none; batch-delete-image accepts up to 100 IDs per call)
aws ecr batch-delete-image \
  --repository-name my-app \
  --image-ids "$(aws ecr list-images \
    --repository-name my-app \
    --filter tagStatus=UNTAGGED \
    --query 'imageIds[*]' \
    --output json)"

# Set lifecycle policy (keep last 10 images)
aws ecr put-lifecycle-policy \
  --repository-name my-app \
  --lifecycle-policy-text '{"rules":[{"rulePriority":1,"description":"Keep last 10","selection":{"tagStatus":"any","countType":"imageCountMoreThan","countNumber":10},"action":{"type":"expire"}}]}'

ECS — container orchestration

# Create cluster
aws ecs create-cluster \
  --cluster-name production \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1,base=1

# Register task definition
aws ecs register-task-definition \
  --cli-input-json file://task-definition.json

# Create service
aws ecs create-service \
  --cluster production \
  --service-name web-api \
  --task-definition web-api:5 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc,subnet-def],securityGroups=[sg-xyz],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=web,containerPort=3000"

# Deploy new task definition version (force new deployment)
aws ecs update-service \
  --cluster production \
  --service web-api \
  --task-definition web-api:6 \
  --force-new-deployment

# Wait for service to stabilize
aws ecs wait services-stable \
  --cluster production \
  --services web-api

# Describe running tasks
aws ecs describe-tasks \
  --cluster production \
  --tasks $(aws ecs list-tasks --cluster production --service-name web-api \
    --query 'taskArns[0]' --output text)

# ECS Exec — drop into a running container
aws ecs execute-command \
  --cluster production \
  --task arn:aws:ecs:...:task/abc123 \
  --container web \
  --interactive \
  --command "/bin/sh"

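Gotcha: ECS Exec requires enableExecuteCommand=true on the service, and the task role needs the ssmmessages:CreateControlChannel, CreateDataChannel, OpenControlChannel, and OpenDataChannel permissions. You can turn it on for an existing service with aws ecs update-service --enable-execute-command --force-new-deployment; only tasks started after the change accept exec sessions.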

Senior tip: Use --force-new-deployment with the same task definition to roll an ECS service (e.g., after a config change or to recover from a bad deploy). ECS drains old tasks gracefully.


6. Lambda

Deploy and manage functions

# Create function
aws lambda create-function \
  --function-name my-processor \
  --runtime python3.12 \
  --role arn:aws:iam::123456789012:role/LambdaExecRole \
  --handler app.handler \
  --zip-file fileb://function.zip \
  --memory-size 512 \
  --timeout 30 \
  --environment Variables='{DB_HOST=prod-db.cluster.local,LOG_LEVEL=INFO}'

# Update function code
aws lambda update-function-code \
  --function-name my-processor \
  --zip-file fileb://function.zip

# Update from ECR image
aws lambda update-function-code \
  --function-name my-processor \
  --image-uri 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/my-app:latest

# Wait for update to complete before invoking
aws lambda wait function-updated --function-name my-processor

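# Update configuration
# (--environment replaces the whole variable set, so repeat every variable you want to keep)
aws lambda update-function-configuration \
  --function-name my-processor \
  --memory-size 1024 \
  --timeout 60 \
  --environment Variables='{DB_HOST=prod-db.cluster.local,LOG_LEVEL=DEBUG}'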

Invoke

# AWS CLI v2 reads --payload as base64 by default; --cli-binary-format raw-in-base64-out lets you pass raw JSON

# Synchronous invoke (waits for response)
aws lambda invoke \
  --function-name my-processor \
  --cli-binary-format raw-in-base64-out \
  --payload '{"key": "value"}' \
  --log-type Tail \
  output.json
cat output.json

# Async invoke (fire and forget)
aws lambda invoke \
  --function-name my-processor \
  --invocation-type Event \
  --cli-binary-format raw-in-base64-out \
  --payload '{"batch_id": "abc123"}' \
  /dev/null

# Decode tail logs (base64 encoded)
aws lambda invoke \
  --function-name my-processor \
  --cli-binary-format raw-in-base64-out \
  --payload '{}' \
  --log-type Tail \
  --query 'LogResult' \
  --output text \
  /dev/null | base64 -d

Versions, aliases, layers

# Publish a version (immutable snapshot of $LATEST)
aws lambda publish-version \
  --function-name my-processor \
  --description "Release 2.1.0"

# Create/update alias
aws lambda create-alias \
  --function-name my-processor \
  --name production \
  --function-version 5

# Weighted alias for a canary deploy: 90% of traffic stays on version 5, 10% goes to version 6
aws lambda update-alias \
  --function-name my-processor \
  --name production \
  --function-version 5 \
  --routing-config AdditionalVersionWeights={"6"=0.1}

# Add permission for another service to invoke
aws lambda add-permission \
  --function-name my-processor \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-bucket

# Publish layer
aws lambda publish-layer-version \
  --layer-name shared-libs \
  --zip-file fileb://layer.zip \
  --compatible-runtimes python3.11 python3.12

Senior tip: Always use aliases + versions in production. Never invoke $LATEST from other services. Aliases give you instant rollback (just point alias to previous version) and enable canary deployments.


7. CloudFormation

Stack lifecycle

# Validate template before deploying
aws cloudformation validate-template \
  --template-body file://template.yaml

# Create stack
aws cloudformation create-stack \
  --stack-name my-infra \
  --template-body file://template.yaml \
  --parameters \
    ParameterKey=Environment,ParameterValue=prod \
    ParameterKey=InstanceType,ParameterValue=t3.medium \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --tags Key=Project,Value=myapp

# Wait for create to complete
aws cloudformation wait stack-create-complete --stack-name my-infra

# Update stack
aws cloudformation update-stack \
  --stack-name my-infra \
  --template-body file://template.yaml \
  --parameters ParameterKey=InstanceType,ParameterValue=t3.large \
  --capabilities CAPABILITY_IAM

# Deploy (create or update, idempotent — preferred for CI/CD)
aws cloudformation deploy \
  --template-file template.yaml \
  --stack-name my-infra \
  --parameter-overrides Environment=prod \
  --capabilities CAPABILITY_IAM \
  --no-fail-on-empty-changeset

# Delete stack
aws cloudformation delete-stack --stack-name my-infra
aws cloudformation wait stack-delete-complete --stack-name my-infra

Change sets — safe updates

# Create a change set (preview what will change)
aws cloudformation create-change-set \
  --stack-name my-infra \
  --change-set-name patch-2024-05 \
  --template-body file://template.yaml \
  --capabilities CAPABILITY_IAM

# Describe the change set (review before applying)
aws cloudformation describe-change-set \
  --stack-name my-infra \
  --change-set-name patch-2024-05 \
  --query 'Changes[*].ResourceChange.[Action,ResourceType,LogicalResourceId,Replacement]' \
  --output table

# Execute (apply the change set)
aws cloudformation execute-change-set \
  --stack-name my-infra \
  --change-set-name patch-2024-05

Troubleshoot stack events

# List recent stack events (great for debugging failures)
aws cloudformation describe-stack-events \
  --stack-name my-infra \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED` || ResourceStatus==`UPDATE_FAILED`].[LogicalResourceId,ResourceStatusReason]' \
  --output table

# List all resources in a stack
aws cloudformation list-stack-resources \
  --stack-name my-infra \
  --query 'StackResourceSummaries[*].[LogicalResourceId,ResourceType,ResourceStatus,PhysicalResourceId]' \
  --output table

Senior tip: Use aws cloudformation deploy --no-fail-on-empty-changeset in CI/CD pipelines. Without it, the pipeline fails if there’s nothing to change — which is a valid state on re-runs.


8. RDS & DynamoDB

RDS

# Describe DB instances
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,DBInstanceStatus,Endpoint.Address]' \
  --output table

# Create snapshot
aws rds create-db-snapshot \
  --db-instance-identifier prod-postgres \
  --db-snapshot-identifier prod-postgres-$(date +%Y%m%d)

# Wait for snapshot to be available
aws rds wait db-snapshot-completed \
  --db-snapshot-identifier prod-postgres-20240501

# Restore from snapshot (creates new instance)
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-postgres-restored \
  --db-snapshot-identifier prod-postgres-20240501 \
  --db-instance-class db.t3.medium \
  --no-multi-az

# Modify instance (e.g., upgrade class) — apply immediately or next maintenance window
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.large \
  --apply-immediately

Gotcha: --apply-immediately on an instance-class change restarts the database right away, causing downtime (usually shorter with Multi-AZ). Without it, the change waits for the next maintenance window. In production, prefer the maintenance window unless the change is urgent.

DynamoDB

# Create table
aws dynamodb create-table \
  --table-name users \
  --attribute-definitions \
    AttributeName=userId,AttributeType=S \
    AttributeName=createdAt,AttributeType=N \
  --key-schema \
    AttributeName=userId,KeyType=HASH \
    AttributeName=createdAt,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

# Put item
aws dynamodb put-item \
  --table-name users \
  --item '{"userId":{"S":"u123"},"createdAt":{"N":"1714521600"},"name":{"S":"Alice"}}'

# Get item
aws dynamodb get-item \
  --table-name users \
  --key '{"userId":{"S":"u123"},"createdAt":{"N":"1714521600"}}'

# Query by partition key (efficient — uses index)
aws dynamodb query \
  --table-name users \
  --key-condition-expression "userId = :uid" \
  --expression-attribute-values '{":uid":{"S":"u123"}}'

# Scan with filter (expensive — full table read)
aws dynamodb scan \
  --table-name users \
  --filter-expression "contains(#n, :name)" \
  --expression-attribute-names '{"#n":"name"}' \
  --expression-attribute-values '{":name":{"S":"Ali"}}'

# Batch write (up to 25 items)
aws dynamodb batch-write-item \
  --request-items file://batch-items.json

# PartiQL — SQL-like syntax
aws dynamodb execute-statement \
  --statement "SELECT * FROM users WHERE userId = 'u123'"

# Update table capacity mode
aws dynamodb update-table \
  --table-name users \
  --billing-mode PROVISIONED \
  --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=50

Gotcha: scan reads every item in the table, and the filter expression is applied only after the read, so you pay for the full table no matter how few items match. On large tables it can exhaust your read capacity. Always use query with an index for production access patterns.
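
Because batch-write-item accepts at most 25 items per call, larger loads have to be split into batches first. The sketch below is one way to do that with awk, assuming one DynamoDB-JSON item per input line; `build_batches` and the `items.jsonl` file in the usage comment are hypothetical names.

```shell
# build_batches TABLE [SIZE]: hypothetical helper, not part of the AWS CLI.
# Reads one DynamoDB-JSON item per line on stdin and prints one
# --request-items document per output line, SIZE items each (default 25).
build_batches() {
  local table="$1" size="${2:-25}"
  awk -v table="$table" -v size="$size" '
    NF == 0 { next }                                  # skip blank lines
    {
      if (count == 0) printf "{\"%s\":[", table       # open a new batch
      else printf ","
      printf "{\"PutRequest\":{\"Item\":%s}}", $0
      count++
      if (count == size) { print "]}"; count = 0 }    # close a full batch
    }
    END { if (count > 0) print "]}" }                 # close the last partial batch
  '
}

# Typical use (needs valid credentials):
# build_batches users < items.jsonl | while IFS= read -r batch; do
#   aws dynamodb batch-write-item --request-items "$batch"
# done
```

A production loader would also inspect UnprocessedItems in each response and retry those items; that part is left out of this sketch.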


9. CloudWatch & Logs

Log groups and streams

# List log groups
aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,retentionInDays,storedBytes]' \
  --output table

# Create log group with retention
aws logs create-log-group --log-group-name /app/production
aws logs put-retention-policy \
  --log-group-name /app/production \
  --retention-in-days 30

# Filter log events (search within a time range; --start-time is in milliseconds, and date -d is GNU date syntax)
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-processor \
  --filter-pattern "ERROR" \
  --start-time $(date -d '1 hour ago' +%s000) \
  --limit 50

# Tail logs in real-time (requires AWS CLI v2)
aws logs tail /aws/ecs/production --follow --format short

# Get a specific stream
aws logs get-log-events \
  --log-group-name /app/production \
  --log-stream-name "web/app/abc123" \
  --start-from-head

Metrics and alarms

# Put custom metric data
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name "OrdersProcessed" \
  --value 42 \
  --unit Count \
  --dimensions Environment=prod,Service=order-worker

# Get metric statistics
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average

# Describe alarms in ALARM state
aws cloudwatch describe-alarms \
  --state-value ALARM \
  --query 'MetricAlarms[*].[AlarmName,StateReason,MetricName]' \
  --output table

CloudWatch Logs Insights

# Run an Insights query (returns a query ID)
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/lambda/my-processor \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20' \
  --query 'queryId' --output text)

# Retrieve results
aws logs get-query-results --query-id "$QUERY_ID"
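
An Insights query runs asynchronously, so get-query-results may report a status of Running when called straight after start-query. A small polling loop covers that gap. This is a sketch with a hypothetical helper, `poll_until`; the attempt count and delay in the usage comment are arbitrary.

```shell
# poll_until WANT TRIES DELAY CMD...: hypothetical helper, not part of the AWS CLI.
# Runs CMD until it prints WANT, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on match, 1 when attempts run out, 2 if CMD fails.
poll_until() {
  local want="$1" tries="$2" delay="$3" i=1 status
  shift 3
  while [ "$i" -le "$tries" ]; do
    status=$("$@") || return 2
    [ "$status" = "$want" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Typical use (needs valid credentials and a QUERY_ID from start-query):
# poll_until Complete 30 2 \
#   aws logs get-query-results --query-id "$QUERY_ID" --query status --output text \
#   && aws logs get-query-results --query-id "$QUERY_ID"
```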

Senior tip: aws logs tail --follow is your best friend for live debugging ECS/Lambda issues. Combine with --filter-pattern to reduce noise: aws logs tail /aws/ecs/prod --follow --filter-pattern "ERROR".


10. Route 53

DNS management

# List hosted zones
aws route53 list-hosted-zones \
  --query 'HostedZones[*].[Name,Id,Config.PrivateZone]' \
  --output table

# List records in a zone
aws route53 list-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --query 'ResourceRecordSets[*].[Name,Type,TTL,ResourceRecords[0].Value]' \
  --output table

# Upsert a DNS record (create or update)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "1.2.3.4"}]
      }
    }]
  }'

# Delete a record (must match existing exactly)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "old.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "lb-old.ap-southeast-1.elb.amazonaws.com"}]
      }
    }]
  }'

# Create health check
aws route53 create-health-check \
  --caller-reference $(date +%s) \
  --health-check-config '{
    "Type": "HTTPS",
    "FullyQualifiedDomainName": "api.example.com",
    "Port": 443,
    "ResourcePath": "/health",
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'

Gotcha: Route 53 DELETE requires you to pass the exact current record values. If they don’t match, you get an error. Always list-resource-record-sets first to confirm current state before deleting.
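
Since a DELETE has to repeat the record exactly, generating the change batch from the same values you just listed avoids hand-editing JSON. A minimal sketch for single-value records: `change_batch` is a hypothetical helper, and it assumes the values contain no double quotes or backslashes because it performs no JSON escaping.

```shell
# change_batch ACTION NAME TYPE TTL VALUE: hypothetical helper, not part of the AWS CLI.
# Prints a single-record change batch as one line of JSON.
# Assumption: arguments contain no characters that need JSON escaping.
change_batch() {
  printf '{"Changes":[{"Action":"%s","ResourceRecordSet":{"Name":"%s","Type":"%s","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Typical use (needs valid credentials):
# aws route53 change-resource-record-sets \
#   --hosted-zone-id Z1234567890ABC \
#   --change-batch "$(change_batch DELETE old.example.com CNAME 300 lb-old.ap-southeast-1.elb.amazonaws.com)"
```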


11. Secrets Manager & SSM Parameter Store

Secrets Manager

# Create a secret
aws secretsmanager create-secret \
  --name prod/myapp/db-credentials \
  --secret-string '{"username":"admin","password":"s3cr3t"}' \
  --description "Production DB credentials"

# Get secret value
aws secretsmanager get-secret-value \
  --secret-id prod/myapp/db-credentials \
  --query 'SecretString' \
  --output text | jq .

# Get specific version
aws secretsmanager get-secret-value \
  --secret-id prod/myapp/db-credentials \
  --version-id abc123

# Update secret (the new value becomes AWSCURRENT; the previous version is relabeled AWSPREVIOUS)
aws secretsmanager update-secret \
  --secret-id prod/myapp/db-credentials \
  --secret-string '{"username":"admin","password":"n3wP@ss"}'

# List secrets
aws secretsmanager list-secrets \
  --query 'SecretList[*].[Name,LastChangedDate,RotationEnabled]' \
  --output table

SSM Parameter Store

# Create parameters
aws ssm put-parameter \
  --name /prod/myapp/db-host \
  --value "prod-db.cluster.local" \
  --type String

aws ssm put-parameter \
  --name /prod/myapp/api-key \
  --value "supersecretkey" \
  --type SecureString \
  --key-id alias/aws/ssm

# Get a single parameter (decrypt SecureString)
aws ssm get-parameter \
  --name /prod/myapp/api-key \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text

# Get multiple parameters at once
aws ssm get-parameters \
  --names /prod/myapp/db-host /prod/myapp/api-key \
  --with-decryption

# Get all parameters under a path (paginated)
aws ssm get-parameters-by-path \
  --path /prod/myapp \
  --with-decryption \
  --recursive \
  --query 'Parameters[*].[Name,Value]' \
  --output table

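# Export all params under a path as env vars
# (splitting on tabs keeps values with spaces intact; "-" in names becomes "_" so the keys are valid)
aws ssm get-parameters-by-path \
  --path /prod/myapp --with-decryption \
  --query 'Parameters[*].[Name,Value]' \
  --output text | \
  awk -F'\t' '{n=split($1,a,"/"); k=toupper(a[n]); gsub(/-/,"_",k); print k"="$2}' > .env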

SSM Session Manager — no bastion required

# Start interactive session (no SSH key needed, all traffic logged)
aws ssm start-session --target i-0abc123

# Port forwarding (tunnel RDS through SSM)
aws ssm start-session \
  --target i-0abc123 \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters '{"host":["prod-db.cluster.local"],"portNumber":["5432"],"localPortNumber":["5432"]}'

# Run a command on multiple instances
aws ssm send-command \
  --document-name AWS-RunShellScript \
  --targets Key=tag:Environment,Values=prod \
  --parameters 'commands=["systemctl restart nginx"]' \
  --output-s3-bucket-name my-ssm-logs
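
send-command returns immediately, so a script usually has to wait for the invocation and then fetch its output. A minimal sketch for a single instance, assuming the command string contains no quotes or commas (it is passed through the shorthand syntax); the helper name ssm_run is hypothetical:

```shell
# Hypothetical helper: run one shell command on one instance and print its stdout.
# Usage: ssm_run <instance-id> <command>
ssm_run() {
  local instance_id="$1" command_id
  command_id=$(aws ssm send-command \
    --document-name AWS-RunShellScript \
    --instance-ids "$instance_id" \
    --parameters "commands=[\"$2\"]" \
    --query 'Command.CommandId' \
    --output text) || return 1

  # The waiter returns non-zero if the command fails or the waiter times out
  aws ssm wait command-executed \
    --command-id "$command_id" \
    --instance-id "$instance_id" || return 1

  aws ssm get-command-invocation \
    --command-id "$command_id" \
    --instance-id "$instance_id" \
    --query 'StandardOutputContent' \
    --output text
}

# Example (not run here):
#   ssm_run i-0abc123 "systemctl is-active nginx"
```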

Senior tip: Replace SSH bastion access with SSM Session Manager. Session starts are recorded in CloudTrail, and session output can optionally be streamed to S3 or CloudWatch Logs. With no inbound ports open, the attack surface shrinks considerably.

Gotcha: Standard parameters in SSM Parameter Store are free but capped at 10,000 per account per Region. Use Secrets Manager for credentials (it supports automatic rotation, but is billed per secret) and Parameter Store for non-sensitive config. The pricing differs significantly at scale.


12. SQS & SNS

SQS — queues

# Create queue (standard or FIFO)
aws sqs create-queue --queue-name my-jobs
aws sqs create-queue \
  --queue-name my-ordered-jobs.fifo \
  --attributes FifoQueue=true,ContentBasedDeduplication=true

# Get queue URL (needed for most operations)
QUEUE_URL=$(aws sqs get-queue-url \
  --queue-name my-jobs \
  --query 'QueueUrl' --output text)

# Send message
aws sqs send-message \
  --queue-url "$QUEUE_URL" \
  --message-body '{"job":"process-order","orderId":"ord-123"}' \
  --delay-seconds 5

# Receive messages (returns up to 10)
aws sqs receive-message \
  --queue-url "$QUEUE_URL" \
  --max-number-of-messages 10 \
  --wait-time-seconds 20 \
  --visibility-timeout 30

# Delete message after processing
aws sqs delete-message \
  --queue-url "$QUEUE_URL" \
  --receipt-handle "AQEBw..."

# Purge queue (delete all messages)
aws sqs purge-queue --queue-url "$QUEUE_URL"

# Get queue attributes (depth, in-flight, delayed)
# Age of the oldest message is a CloudWatch metric (ApproximateAgeOfOldestMessage), not a queue attribute
aws sqs get-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attribute-names All \
  --query 'Attributes.{Depth:ApproximateNumberOfMessages,InFlight:ApproximateNumberOfMessagesNotVisible,Delayed:ApproximateNumberOfMessagesDelayed}'

Gotcha: Long polling (--wait-time-seconds 20) is critical in production. Short polling queries only a subset of SQS servers, so you pay for empty receives and can get an empty response while messages are still waiting. Use the 20-second maximum in consumers.
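
The receive, process, delete cycle above can be sketched as one function. This is an illustration under stated assumptions: message bodies are single-line, and the helper name consume_one and its handler argument are hypothetical.

```shell
# Hypothetical helper: long-poll for one message, pass its body to a handler,
# and delete the message only if the handler succeeds.
# Usage: consume_one <queue-url> <handler-command>
# Assumes single-line message bodies (text output is split on the first tab).
consume_one() {
  local queue_url="$1" handler="$2" line receipt body
  line=$(aws sqs receive-message \
    --queue-url "$queue_url" \
    --max-number-of-messages 1 \
    --wait-time-seconds 20 \
    --query 'Messages[0].[ReceiptHandle,Body]' \
    --output text) || return 1

  # An empty receive prints nothing or "None" with --output text
  if [ -z "$line" ] || [ "$line" = "None" ]; then
    return 2
  fi

  receipt=$(printf '%s\n' "$line" | cut -f1)
  body=$(printf '%s\n' "$line" | cut -f2-)

  # On handler failure the message reappears after the visibility timeout
  "$handler" "$body" || return 1
  aws sqs delete-message \
    --queue-url "$queue_url" \
    --receipt-handle "$receipt"
}

# Example (not run here):
#   while true; do consume_one "$QUEUE_URL" process_job; done
```

Returning 2 for an empty receive lets the caller tell "nothing to do" apart from a handler or API failure.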

SNS — pub/sub

# Create topic
TOPIC_ARN=$(aws sns create-topic \
  --name my-alerts \
  --query 'TopicArn' --output text)

# Subscribe endpoints
aws sns subscribe \
  --topic-arn "$TOPIC_ARN" \
  --protocol email \
  --notification-endpoint ops-team@example.com

aws sns subscribe \
  --topic-arn "$TOPIC_ARN" \
  --protocol sqs \
  --notification-endpoint arn:aws:sqs:ap-southeast-1:123456789012:my-jobs

# Publish a notification
aws sns publish \
  --topic-arn "$TOPIC_ARN" \
  --subject "Deploy: prod v2.1.0 succeeded" \
  --message "Deployment completed at $(date). No errors."

# Publish with message attributes (for filtering)
aws sns publish \
  --topic-arn "$TOPIC_ARN" \
  --message "Order placed" \
  --message-attributes '{"event":{"DataType":"String","StringValue":"order.created"}}'

# List subscriptions for a topic
aws sns list-subscriptions-by-topic --topic-arn "$TOPIC_ARN"
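
Message attributes only filter anything once a subscription carries a filter policy. A minimal sketch that limits a subscription to one value of the "event" attribute used above; the helper name filter_subscription is hypothetical, and the subscription ARN comes from the subscribe or list-subscriptions-by-topic output:

```shell
# Hypothetical helper: deliver only messages whose "event" attribute equals <event-name>.
# Usage: filter_subscription <subscription-arn> <event-name>
filter_subscription() {
  aws sns set-subscription-attributes \
    --subscription-arn "$1" \
    --attribute-name FilterPolicy \
    --attribute-value "{\"event\":[\"$2\"]}"
}

# Example (not run here):
#   filter_subscription "$SUBSCRIPTION_ARN" order.created
```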

13. Advanced Patterns

--query with JMESPath — filter and transform output

# Extract nested value
aws ec2 describe-instances \
  --query 'Reservations[0].Instances[0].PrivateIpAddress'

# Filter array, project fields
aws ec2 describe-instances \
  --query 'Reservations[*].Instances[?State.Name==`running`].[InstanceId,InstanceType,PrivateIpAddress]'

# Use a tag value (complex path)
aws ec2 describe-instances \
  --query 'Reservations[*].Instances[*].{ID:InstanceId,Name:Tags[?Key==`Name`].Value|[0],IP:PrivateIpAddress}'

# Sort by field (JMESPath sort_by)
aws ec2 describe-snapshots --owner-ids self \
  --query 'sort_by(Snapshots, &StartTime)[-5:].[SnapshotId,StartTime,VolumeSize]' \
  --output table

# Count results
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'length(Reservations[*].Instances[])'

--output text + shell pipelines

# Restart all instances with a tag
aws ec2 describe-instances \
  --filters "Name=tag:Role,Values=worker" "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text | \
  xargs aws ec2 reboot-instances --instance-ids

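# Bulk delete untagged ECR images (batch-delete-image accepts at most 100 image IDs per call)
IMAGES=$(aws ecr list-images \
  --repository-name my-app \
  --filter tagStatus=UNTAGGED \
  --max-items 100 \
  --query 'imageIds[*]' \
  --output json)
# Skip the delete call when nothing matched
[[ "$IMAGES" != "[]" ]] && aws ecr batch-delete-image \
  --repository-name my-app \
  --image-ids "$IMAGES"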

Waiters — synchronize on resource state

# Waiters poll at a fixed interval until the condition is met or attempts run out (delay and max attempts are defined per waiter)
aws ec2 wait instance-running --instance-ids i-0abc123
aws ec2 wait instance-stopped --instance-ids i-0abc123
aws ec2 wait image-available --image-ids ami-0abc123
aws s3api wait bucket-exists --bucket my-bucket
aws cloudformation wait stack-create-complete --stack-name my-stack
aws rds wait db-instance-available --db-instance-identifier prod-db
aws lambda wait function-updated --function-name my-function
aws ecs wait services-stable --cluster prod --services web-api

Senior tip: Waiters exit with code 255 on timeout. In bash scripts, always check the exit code: aws ec2 wait instance-running ... || { echo "Timeout waiting for instance"; exit 1; }.
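
For slow operations, one waiter run may not be long enough. A minimal sketch that re-runs a waiter a bounded number of times; the helper name wait_with_retries is hypothetical. Note that a waiter also exits non-zero when the resource reaches a terminal failure state, in which case retrying only delays the error.

```shell
# Hypothetical helper: re-run a waiter (or any command) up to N times before giving up.
# Usage: wait_with_retries <attempts> aws <service> wait <waiter> [args...]
wait_with_retries() {
  local attempts="$1" i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "waiter attempt $i/$attempts did not succeed" >&2
    i=$((i + 1))
  done
  return 1
}

# Example (not run here):
#   wait_with_retries 3 aws rds wait db-instance-available --db-instance-identifier prod-db \
#     || { echo "Timeout waiting for database"; exit 1; }
```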

Pagination — handle large result sets

# CLI auto-paginates by default for most commands
# To limit and paginate manually:
aws s3api list-objects-v2 \
  --bucket my-large-bucket \
  --max-items 100 \
  --starting-token "NEXT_TOKEN_FROM_PREVIOUS"

# Fetch only the first page (--no-paginate disables auto-pagination; the response includes NextToken)
aws ec2 describe-instances \
  --no-paginate \
  --max-results 50

# Get all pages with a shell loop
# (--starting-token pairs with --max-items and the CLI's NextToken, not with --max-keys / NextContinuationToken)
NEXT_TOKEN=""
while true; do
  RESULT=$(aws s3api list-objects-v2 \
    --bucket my-bucket \
    --max-items 1000 \
    ${NEXT_TOKEN:+--starting-token "$NEXT_TOKEN"})
  echo "$RESULT" | jq -r '.Contents[]?.Key'
  NEXT_TOKEN=$(echo "$RESULT" | jq -r '.NextToken // empty')
  [[ -z "$NEXT_TOKEN" ]] && break
done

Dry-run — test permissions before running

# EC2 supports --dry-run natively — returns DryRunOperation if you have permission
aws ec2 run-instances --dry-run \
  --image-id ami-0abc123 \
  --instance-type t3.micro \
  --count 1

# Expected output on success:
# An error occurred (DryRunOperation) when calling the RunInstances operation:
# Request would have succeeded, but DryRun flag is set.
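
Because a successful dry run is still reported as an error, scripts have to inspect the message rather than the exit code. A minimal sketch that treats DryRunOperation as success and anything else (for example UnauthorizedOperation) as failure; the helper name can_run is hypothetical:

```shell
# Hypothetical helper: succeed only if the dry run reports DryRunOperation.
# Usage: can_run aws ec2 <operation> [args...]   (the helper appends --dry-run)
can_run() {
  local output
  # The CLI exits non-zero even when the dry run "succeeds", so ignore the status
  output=$("$@" --dry-run 2>&1) || true
  case "$output" in
    *DryRunOperation*) return 0 ;;
    *) printf '%s\n' "$output" >&2; return 1 ;;
  esac
}

# Example (not run here):
#   can_run aws ec2 run-instances --image-id ami-0abc123 --instance-type t3.micro --count 1 \
#     && echo "permission OK"
```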

Tag-based filtering across services

# Filter by multiple tags
aws ec2 describe-instances \
  --filters \
    "Name=tag:Environment,Values=prod" \
    "Name=tag:Team,Values=platform" \
    "Name=instance-state-name,Values=running"

# Resource Groups Tagging API: find resources by tag across services and resource types
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=Environment,Values=prod \
  --resource-type-filters ec2:instance rds:db \
  --query 'ResourceTagMappingList[*].ResourceARN' \
  --output text

# Bulk tag multiple resources
aws ec2 create-tags \
  --resources i-0abc123 vol-0def456 sg-0ghi789 \
  --tags Key=Environment,Value=prod Key=CostCenter,Value=engineering

Useful global flags reference

| Flag | Purpose | Example |
|---|---|---|
| --profile NAME | Use named credential profile | --profile prod |
| --region REGION | Override configured region | --region us-east-1 |
| --output FORMAT | json, text, table, yaml | --output table |
| --query EXPR | JMESPath filter on output | --query 'Users[0].Arn' |
| --no-paginate | Disable auto-pagination | Returns only the first page of results |
| --dry-run | Test permission without acting | EC2 operations only (not a global flag) |
| --debug | Full HTTP request/response log | Diagnosing TLS/auth issues |
| --no-verify-ssl | Skip SSL verification | Never use in production |
| --endpoint-url URL | Override service endpoint | LocalStack, custom VPC endpoints |
| --cli-input-json | Read all params from file | Complex commands |
| --generate-cli-skeleton | Print JSON template for input | Scaffold --cli-input-json files |

# Generate skeleton for any command — great for complex resource creation
aws ec2 run-instances --generate-cli-skeleton > run-instances.json
# Edit the JSON, then:
aws ec2 run-instances --cli-input-json file://run-instances.json

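Senior tip: Set AWS_PAGER="" to disable the automatic less pager (AWS CLI v2) for scripting: export AWS_PAGER="". Add it to your .zshrc or .bashrc for permanent effect. Otherwise a command whose output goes to a terminal can open less and wait for a keypress, which stalls scripts and loops run interactively.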


Quick Reference: Common Patterns

# Who am I, what account, what region?
aws sts get-caller-identity && aws configure get region

# Find instances by private IP
aws ec2 describe-instances \
  --filters "Name=private-ip-address,Values=10.0.1.42" \
  --query 'Reservations[0].Instances[0].[InstanceId,Tags[?Key==`Name`].Value|[0]]'

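# List all IAM roles that can be assumed by Lambda
# (checks every statement and skips principals without a Service key, which would otherwise make contains() fail)
aws iam list-roles \
  --query 'Roles[?AssumeRolePolicyDocument.Statement[?Principal.Service && contains(Principal.Service, `lambda.amazonaws.com`)]].RoleName'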

# Find all public S3 buckets (ACL-based; does not evaluate bucket policies)
for bucket in $(aws s3 ls | awk '{print $3}'); do
  acl=$(aws s3api get-bucket-acl --bucket "$bucket" --query 'Grants[?Grantee.URI==`http://acs.amazonaws.com/groups/global/AllUsers`]' --output text)
  [[ -n "$acl" ]] && echo "PUBLIC: $bucket"
done

# Cost-by-service last month (requires Cost Explorer enabled)
# Uses text output so sort sees plain tab-separated rows instead of table borders
aws ce get-cost-and-usage \
  --time-period Start=2024-04-01,End=2024-05-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[0].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output text | sort -t$'\t' -k2 -rn
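
The hard-coded dates in the cost query go stale every month. A minimal sketch that derives the previous calendar month with plain shell arithmetic, avoiding the differences between GNU and BSD date; the helper name last_month_period is hypothetical:

```shell
# Hypothetical helper: print "Start=YYYY-MM-01,End=YYYY-MM-01" covering the month
# before the reference date (defaults to today). Cost Explorer treats End as exclusive.
# Usage: last_month_period [YYYY-MM-DD]
last_month_period() {
  local ref year month prev_year prev_month
  ref="${1:-$(date +%Y-%m-%d)}"
  year="${ref%%-*}"
  month="${ref#*-}"
  month="${month%%-*}"
  month="${month#0}"   # strip a leading zero so 08 and 09 are not parsed as octal
  prev_year="$year"
  prev_month=$((month - 1))
  if [ "$prev_month" -eq 0 ]; then
    prev_month=12
    prev_year=$((year - 1))
  fi
  printf 'Start=%04d-%02d-01,End=%04d-%02d-01\n' "$prev_year" "$prev_month" "$year" "$month"
}

# Example (not run here):
#   aws ce get-cost-and-usage --time-period "$(last_month_period)" --granularity MONTHLY ...
```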