These are the notes I made while preparing for AWS Solutions Architect Professional certification.

Compute

EC2

Autoscaling

Triggered by event or scaling action
Requires a launch configuration
Launch configurations are immutable; you need to create a new one for every change

Scaling options

Maintain - keep a minimum number of instances running
Manual - specify maximum, minimum or a specific number
Schedule - increase or decrease number of instances based on schedule
Dynamic - scale based on real-time metrics

Scaling policies

Target tracking - metric in relation to target value
Step scaling - adjust capacity given certain thresholds (Has warm-up period)
Simple scaling - wait until the health-check and cool-down periods expire

Disaster Recovery

AMIs can be copied to another Region for resiliency
Zonal Reserved Instances (or On-Demand Capacity Reservations) are the way to guarantee that capacity will be available when needed
ELBs and Route53 offer health checks for self-healing

Batch

A management tool for recurring batch tasks, like rotating logs on firewall appliances

Create a compute environment
Specify a Job Queue with priority
Define a Job
Schedule the Job

ECS

AWS-specific platform that supports Docker containers

Considered easier to use, but limited
Relies heavily on AWS services like Route53, ALB, CloudWatch
Containers run isolated and are grouped in “Tasks”

EKS

A platform, fully compatible with Kubernetes

Considered more complex, but feature-rich and extensible
Handles many things internally
Containers have access to each other within a “Pod”

Lambda

Supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby code
Stateless
Triggered by SNS, SQS, S3 events, DynamoDB streams, API Gateway or CloudFront requests

Serverless application model

Open-source framework for building serverless apps on AWS

Uses YAML as configuration language
Includes a CLI tool to manage serverless infra via CloudFormation
You can test Lambda functions locally via Docker-based emulator

EventBridge

An event-bus service that links AWS with 3rd party applications

Step Functions

Managed workflow and orchestration platform
Good for order processing workflow
Defines apps as state machines
Create tasks, sequential steps, branching paths and timers
Uses Amazon States Language (JSON)
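
A minimal sketch of what an order-processing workflow might look like as a state machine definition, built as a Python dictionary and serialised to JSON. The state names, the `$.paid` field and the Lambda ARNs are made up for illustration; only the overall shape (`StartAt`, `States`, `Task`, `Choice`, `Wait`, `Fail`) follows the state language.

```python
import json

# Hypothetical Lambda ARNs, placeholders only.
CHARGE_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:charge-card"
SHIP_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:ship-order"

order_workflow = {
    "Comment": "Order processing sketch",
    "StartAt": "ChargeCard",
    "States": {
        # Task: a unit of work, here a Lambda function
        "ChargeCard": {"Type": "Task", "Resource": CHARGE_ARN, "Next": "PaymentOk"},
        # Choice: a branching path based on the previous output
        "PaymentOk": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.paid", "BooleanEquals": True, "Next": "WaitForPacking"}
            ],
            "Default": "OrderFailed",
        },
        # Wait: a timer
        "WaitForPacking": {"Type": "Wait", "Seconds": 60, "Next": "ShipOrder"},
        "ShipOrder": {"Type": "Task", "Resource": SHIP_ARN, "End": True},
        "OrderFailed": {"Type": "Fail", "Error": "PaymentDeclined"},
    },
}


def undefined_targets(definition):
    """Return state names that are referenced but never defined."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return sorted(targets - set(states))


if __name__ == "__main__":
    print(json.dumps(order_workflow, indent=2))
```

The `undefined_targets` helper is only a local sanity check for dangling transitions; it is not part of any AWS SDK.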

Glue

Service to build event-driven ETL pipelines

Supports Scala and Python

Elastic MapReduce (EMR)

Managed Hadoop framework for data processing

Supports Apache Spark, HBase, Presto and Flink
Good for log analysis and ETLs
Steps are units of work in EMR
Cluster gets deployed on EC2 instances
Master nodes
Core nodes store data on HDFS
Task nodes are ephemeral

AI and Machine Learning

SageMaker - A framework to build custom ML models
Greengrass - IoT solution to perform ML inferences locally on devices
Comprehend - Natural Language Processing (sentiment analysis)
Forecast - Give predictions on time-series data
Lex - Conversational interface (like Alexa)
Personalize - Recommendation engine
Polly - Text to speech
Rekognition - Image/Video processing (recognize objects, people, activities)
Textract - OCR engine to extract text from scanned documents
Transcribe - Speech-To-Text
Translate - Language translation

Storage

Persistent Data stores

S3

Use-cases:

User-generated content, video- and photo-sharing
Static websites
Recent backups
Log storage
Data lake (Athena, Redshift Spectrum, Quicksight)
IoT Streaming Data Storage (Kinesis Firehose)
ML and AI Storage (Rekognition, Lex, MXNet)

Properties:

3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix
Concurrent reads and writes
Maximum object size is 5 TB
The largest object in a single PUT is 5 GB
Recommended to use multi-part uploads if larger than 100 MB (improved throughput and recovery from network issues)
Cross-Region Replication (does NOT replicate existing objects)
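
A rough sketch of how the size limits above could drive an upload plan. The 5 MiB minimum part size and the 10,000-part limit are S3 multi-part limits that are not in these notes; binary units (MiB/GiB/TiB) and the function name are assumptions for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT = 5 * TIB        # maximum object size
MAX_SINGLE_PUT = 5 * GIB    # largest object in a single PUT
MULTIPART_HINT = 100 * MIB  # above this, multi-part upload is recommended
MIN_PART = 5 * MIB          # S3 multi-part limit (not from the notes)
MAX_PARTS = 10_000          # S3 multi-part limit (not from the notes)


def ceil_div(a, b):
    """Integer ceiling division, avoids float rounding on large sizes."""
    return -(-a // b)


def plan_upload(size_bytes, part_size=MULTIPART_HINT):
    """Return (method, number_of_parts, part_size_in_bytes)."""
    if size_bytes > MAX_OBJECT:
        raise ValueError("object exceeds the 5 TB S3 limit")
    if size_bytes <= MULTIPART_HINT:
        return ("single_put", 1, size_bytes)
    # Grow the part size when needed to stay within the part-count limit.
    part_size = max(part_size, MIN_PART, ceil_div(size_bytes, MAX_PARTS))
    return ("multipart", ceil_div(size_bytes, part_size), part_size)


if __name__ == "__main__":
    for size in (50 * MIB, 1 * GIB, 5 * TIB):
        print(size, plan_upload(size))
```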

Good-to-know:

Since December 2020 supports strong read-after-write consistency
Hexadecimal prefixes in a bucket can help parallelize requests
S3 Management Analytics to get reports and optimize-costs
Transfer Acceleration, which is like a CloudFront in reverse, to speed up uploads
“Requester Pays” buckets to make large datasets available without paying for the data transfer
Tags are vital for billing
Can trigger SQS, SNS or Lambda based on the bucket event
Static Web hosting
S3 supports BitTorrent protocol

Security in S3

Access

Resource based
- Object ACLs
- Bucket policies
User based (IAM Policy)
MFA before Delete (or Version state change)
Access logging

Encryption

SSE-S3 - Use existing S3 AES-256 key
SSE-C - Provide your own AES-256 key with each request
SSE-KMS - Generate a KMS key and use it.
Client-side - Encrypt before uploading

Storage classes

S3 Standard

Default tier
11-9’s durability, 99.99% availability
Availability SLA is 99.9%
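
To put the availability percentages into perspective, a small sketch that converts them into allowed downtime per year. The function name is an assumption, and a 365-day year is used.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Minutes per year a service may be unavailable at the given availability."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.99, 99.9, 99.5):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} minutes/year")
```

So the 99.99% design target allows under an hour of downtime a year, while the 99.9% SLA allows almost nine hours.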

S3 Standard IA

IA - Infrequently Accessed. Cheaper than Standard, but you are charged a retrieval fee.

Same properties as Standard
Minimum billable object size is 128 KB

S3 One Zone-IA

Same as Standard IA, but stored only in a single AZ

11-9’s durability, 99.5% availability

S3 Intelligent Tiering

Good for unknown data access patterns
Moves data to the most cost-effective access tier

S3 Glacier

This is the service used by the AWS Storage Gateway Virtual Tape Library under the hood. It is integrated with S3, but has its own API and concepts. Objects stored in Glacier via the S3 API are not accessible via Glacier's own API.

Used for archival only or “Cold storage”
Very cheap
3-5 hours to retrieve the data
Content of the archive is immutable
Minimal storage duration: 90 days
Integrated with AWS CloudTrail for audit

Concepts

Glacier Vault = S3 Bucket
Glacier Archive = S3 Object (Max size: 40 TB, Immutable)
Glacier Policy defines what rules the Vault follows
Vault Lock enforces the policy (After creating it, you have 24 hours to abort operation)
Access is provided via IAM

S3 Glacier Deep Archive

The lowest-cost storage class

Minimal storage duration: 180 days

Lifecycle Policies

All calculations use the object upload date
Lifecycle policies run once a day
Expiration for versioned buckets means that object is marked as non-current
A bucket can have only one policy with multiple rules
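
A minimal sketch of one policy with a single rule, in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, prefix and day thresholds are made up; the helper only illustrates how age since upload maps to a storage class and is not an AWS API.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-logs",           # hypothetical rule name
            "Filter": {"Prefix": "logs/"},  # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(rule, age_days):
    """Storage class of an object `age_days` after its upload date."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return "EXPIRED"
    current = "STANDARD"
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    rule = lifecycle_configuration["Rules"][0]
    for age in (10, 45, 120, 400):
        print(age, storage_class_at(rule, age))
```

Because policies run once a day, the actual transition can lag the calculated day slightly.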

Amazon Athena

SQL Engine on top of S3 using Presto

To boost performance the data should be stored in Parquet format. It is close to Redshift Spectrum functionality.

Athena - for data that is stored in S3 only
Spectrum - to join data in S3 with Redshift tables

Supported formats:

Apache ORC
Apache Web Logs
CSV
TSV
Text File with Custom Delimiters
JSON
Parquet

EBS Volumes

These are virtual hard drives that can only be used with EC2. EBS volumes are tied to a single AZ and can be changed while attached to the instance.

Types:

General Purpose SSD (gp2) - burstable IOPS storage for a broad range of workloads
New General Purpose SSD (gp3) - cheaper option with a baseline of 3,000 IOPS and 125 MiB/s, independent of the volume size
Provisioned IOPS SSD (io1) - option for databases, or I/O-intensive workloads
Throughput Optimized HDD (st1) - throughput-intensive workloads and large files (streaming, big data, logs)
Cold HDD (sc1) - low cost volume for large cold datasets

Snapshots

For EBS snapshots, you are charged only for storage actually used. Snapshots are stored in S3 and are therefore replicated across AZs within a Region.

Use-cases:

Cost-effective and easy backup strategy
Migrate EC2 instances between AZs
Share datasets with other users/accounts
Convert an unencrypted volume to an encrypted one
Support RAID configurations

Snapshot Lifecycle Policy

Scheduled snapshots are done via AWS Ops Automator
Retention rules to remove stale snapshots

RAID configurations

	RAID 0	RAID 1	RAID 5	RAID 6
Redundancy	None	1 down	1 down	2 down
Reads	* * * *	* * *	* * * *	* * * *
Writes	* * * *	* * *	* *	*
Capacity	100%	50%	(n-1)/n	(n-2)/n
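
The capacity row of the table can be sketched as a small function. The minimum disk counts per level are standard RAID requirements rather than something from these notes, and RAID 1 is treated as an n-way mirror (so two disks give the 50% shown above).

```python
# Minimum number of disks per RAID level (general RAID knowledge).
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def usable_fraction(level, disks):
    """Fraction of raw capacity that is usable for data."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return 1.0                    # striping only, no redundancy
    if level == 1:
        return 1 / disks              # every disk holds a full copy
    if level == 5:
        return (disks - 1) / disks    # one disk worth of parity
    return (disks - 2) / disks        # RAID 6: two disks worth of parity


if __name__ == "__main__":
    for level, disks in ((0, 2), (1, 2), (5, 4), (6, 4)):
        print(f"RAID {level} with {disks} disks: {usable_fraction(level, disks):.0%}")
```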

EFS

Good for multi-attach, file based solutions
Implements NFS 4 and 4.1 file share
Pay for what you use billing model
Multi AZ metadata and data storage
Goes along well with AWS DataSync
For write-heavy workloads EFS is
- 3 times more expensive than EBS
- 20 times more expensive than S3
Burstable with a baseline rate of 50 MiB/s and burst rate of 100 MiB/s.

General Purpose performance mode is limited to 7,000 IOPS per file system.

Max I/O performance mode allows higher throughput at the cost of higher latency

Storage Gateway

Provides local storage resources backed by AWS S3. Often used for disaster recovery and cloud migrations.

Doesn’t have an SLA
File Gateway - An interface to S3 via NFS or SMB protocol
Volume Gateway - iSCSI protocol
- Stored Volume - low-latency access to your entire dataset (Good for disaster recovery use-case)
- Cached Volume - copy of frequently accessed data locally
Tape gateway - Media tape library to use with existing backup software

Transient Data stores

SQS

First AWS service
Integration with KMS for encrypted messaging
Default retention - 4 days, max - 14 days
Optionally supports FIFO
Maximum message size is 256 KB (up to 2 GB via the SQS Extended Client Library for Java, which stores the payload in S3)

Use-cases:

Pull-based interaction
Persistent task storage
Controlled completion
Example is image resize process

SNS

Enables pub/sub design pattern
Push-based interaction
Bulk notification
Mobile pushes
Suitable for “Fan Out” use-case

Supported protocols

HTTP/S
E-mail (plaintext or JSON)
SMS
SQS
Mobile App
Lambda
Firehose

Kinesis

Collection of services designed around stream data processing

Gigabytes of data from thousands of sources
Real-time
Stores data for 24 hours (configurable up to 7 days)
One shard ingests up to 1000 records/sec
Default limit is 500 shards
A record consists of:
- Partition key
- Sequence number
- Data blob (up to 1MB)
Use Kinesis Client Library (KCL) for optimal integration
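
A back-of-the-envelope sketch for sizing a stream from the numbers above. The 1 MB/s write limit per shard is a Kinesis Data Streams limit that is not in these notes, and the function name and KB-to-MB conversion are assumptions.

```python
import math

RECORDS_PER_SHARD = 1000      # records/sec per shard (from the notes)
MB_PER_SHARD = 1.0            # write MB/sec per shard (assumption, not in the notes)
DEFAULT_SHARD_LIMIT = 500     # default account limit (from the notes)


def shards_needed(records_per_sec, avg_record_kb):
    """Estimate the shard count from record rate and average record size."""
    by_count = math.ceil(records_per_sec / RECORDS_PER_SHARD)
    by_bytes = math.ceil(records_per_sec * avg_record_kb / 1024 / MB_PER_SHARD)
    return max(1, by_count, by_bytes)


if __name__ == "__main__":
    needed = shards_needed(2000, 4)
    print(needed, "shards; within default limit:", needed <= DEFAULT_SHARD_LIMIT)
```

Small records are limited by the record count, larger records by the byte throughput, so both need to be checked.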

Snowball

Move large amounts of data into and out of AWS using physical appliances of 80 TB or 50 TB (only US regions)

Use-cases:

If loading your data over the Internet would take a week or more
Cloud migration
Disaster recovery
Datacenter decommission

Snowball Edge

Same as Snowball but with Lambda and clustering onboard

Snowmobile

Loads up to 100PB of data on a truck and transports it to AWS datacenter

Amazon MQ

Managed implementation of Apache ActiveMQ

Good candidate for a “Lift and Shift” migration
Supports JMS, NMS, MQTT and WebSockets
For new applications it’s better to use SQS

Ephemeral Data stores

EC2 Instance store

Storage attached to an EC2 instance directly that gives you better performance.

Use-cases:

Cache
Buffers
Work areas

i2 (SSD) and d2 (HDD) instance types are storage-optimised.

ElastiCache

Redis

Web session storage
Leaderboard
Encryption
Clustering
Pub/sub
Complex data types (like geospatial indexes)

Memcached

Caching proxy in front of RDS
Caching whole responses or objects
Simple, easy to scale out and in solution

Databases

ACID

Atomic
Consistent
Isolated
Durable

BASE

Basically available
Soft state
Eventually consistent

RDS

Up to 5 read replicas
Sync replication within the Region, Async - cross Region

Supported engines

PostgreSQL
MySQL
MariaDB
Oracle ( License Included or Bring-Your-Own-License )
Microsoft SQL Server

Anti-patterns

For large binary files - use S3
For automated scalability - use DynamoDB
For key-value data or unstructured data - use DynamoDB
For DB2 or SAP HANA - use EC2
For complete control over the DB - use EC2

DynamoDB

Multi-AZ NoSQL data store priced based on throughput. The Partition Key and Sort Key together form the Primary Key. Secondary indexes can be Global or Local. Local Secondary Indexes use the same partition key, but a different sort key. You can even create table replicas using a Global Secondary Index to balance the load.

For timeseries data it’s better to create a new table per time period to balance the load

Capacity options:

Provisioned capacity units (CU)
Autoscaling rules
On-Demand (least cost-effective)
Great for unstructured data and click-stream data
Global Tables offer multi-region redundancy
To absorb peaks in throughput, put a queue in front
Has streams for replication across regions
Can offer strong consistency and ACID using DynamoDB Transactions

DynamoDB Accelerator

Read-through cache
Write-through cache

Accelerator better suits read-intensive workloads rather than write intensive.

Calculating partition number

By capacity: (Total Read CU / 3000) + (Total Write CU / 1000)
By size: Total Size / 10 GB
Total Partitions: Ceil(Max(By capacity, By size))
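
The three formulas above, sketched as one function (the function and argument names are assumptions):

```python
import math


def partition_count(read_cu, write_cu, size_gb):
    """Estimate the number of partitions from capacity units and table size."""
    by_capacity = read_cu / 3000 + write_cu / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))


if __name__ == "__main__":
    # 7,500 RCU and 3,000 WCU on a 10 GB table: capacity dominates.
    print(partition_count(7500, 3000, 10))
    # 1,000 RCU and 500 WCU on an 80 GB table: size dominates.
    print(partition_count(1000, 500, 80))
```

Provisioned throughput is divided between partitions, which is why a large table with low capacity can still throttle on a hot key.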

Redshift

A database designed for static analysis of data, PostgreSQL compatible.

Not for real-time data ingestion
Needs enhanced VPC Routing to be accessible within the VPC
Redshift Spectrum allows querying S3 directly

Aurora

Supports fast schema changes (DDL)
Scales in increments of 10 GB up to 64 TB
Up to 15 Read replicas
Single AWS Region
Replicates entire database

Quantum Ledger Database (QLDB)

A blockchain database. It is an immutable journal with append-only semantics. Centralized design allows better performance compared to common blockchain frameworks.

Amazon Managed Blockchain offers Hyperledger Fabric and Ethereum blockchain frameworks and uses QLDB internally

Timestream

A timeseries database, alternative to DynamoDB or Redshift. It includes specific features like interpolation and smoothing. It plays well for telemetry data and sensor measurements.

DocumentDB

A fully managed (HA, multi-AZ, KMS encrypted, backed up to S3) MongoDB compatible solution

Elasticsearch (ES)

In AWS context it is mostly a search and analytical tool, but it can also store documents. AWS allows replacing Logstash in ELK stack with CloudWatch, Firehose or Greengrass to build solutions for analytics.

EMR

Managed Spark and Hadoop

Choosing the right option

Option	Use-case
Database on EC2	Full control over the DB or the engine is not available on RDS
RDS	Store well-formed and structured data for OLTP workloads
DynamoDB	Key-value store for unpredictable data types and high performance
Redshift	For massive amounts of data and OLAP workloads
Neptune	Store relationships between objects and graph data
ElastiCache	For highly volatile data as a fast temporary storage

From fault tolerance perspective the preferred options are:

DynamoDB
Aurora
Multi-AZ RDS
Database on EC2

Networks

VPC

The largest CIDR range allowed in a VPC is /16 - 65536 addresses.
Minimal CIDR range is /28 - 16 addresses (Effectively 11 available)
DHCP option sets allow you to configure custom DNS and NTP servers
Create subnets in different AZs to make VPC multi-AZ

Reserved IP addresses in each subnet (example for 10.0.0.0/24):

10.0.0.0: Network address
10.0.0.1: VPC router
10.0.0.2: DNS server
10.0.0.3: Reserved by AWS for future use
10.0.0.255: Network broadcast address. Broadcast in a VPC is not supported by AWS.
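
A quick way to check these numbers with Python's standard `ipaddress` module. The function names are assumptions; the five reserved addresses are the ones listed above.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr):
    """Addresses left in a subnet after the five AWS-reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def reserved_addresses(cidr):
    """The five addresses AWS reserves in a subnet."""
    net = ipaddress.ip_network(cidr)
    first = net.network_address
    return {
        "network": str(first),
        "vpc_router": str(first + 1),
        "dns": str(first + 2),
        "future_use": str(first + 3),
        "broadcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    print(usable_addresses("10.0.0.0/28"))      # smallest allowed range
    print(reserved_addresses("10.0.0.0/24"))
```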

Network ACL

Additional layer of security for VPC

Applied to entire subnets, not individual resources
Allows all inbound and outbound traffic by default
Stateless - no connection tracking
To establish most TCP connections NACL should allow outbound ephemeral ports
Extra layer of protection in addition to security groups

Security groups

Virtual firewalls for individual assets (EC2, RDS, AWS Workspaces, etc)

Controls protocols and port ranges.

Rules are specified by:

Source or Destination IP
Subnet
Security group

Peering

Connectivity between two VPCs provided by AWS

The traffic stays in AWS Network
Transitive peering is not supported (A -> B, B-> C, A !> C)
Could be established across AWS accounts
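
A toy model of the non-transitive rule, assuming peerings are stored as unordered pairs. The VPC names and helper functions are made up for illustration.

```python
from itertools import combinations

# A <-> B and B <-> C are peered, A <-> C is not.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}


def can_route(peerings, src, dst):
    """Traffic only flows over a direct peering, never through a third VPC."""
    return src == dst or frozenset((src, dst)) in peerings


def full_mesh(vpcs):
    """Peerings needed so that every VPC can reach every other one."""
    return {frozenset(pair) for pair in combinations(vpcs, 2)}


if __name__ == "__main__":
    print(can_route(peerings, "A", "B"))
    print(can_route(peerings, "A", "C"))
    print(len(full_mesh(["A", "B", "C", "D"])))
```

The number of peerings in a full mesh grows as n(n-1)/2, which is one reason a hub design such as a transit VPC becomes attractive as the number of VPCs grows.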

PrivateLink

Connectivity between VPCs or AWS services using interface endpoints

Reach other service in private network via AWS backbone
More granular than VPC peering, it exposes only endpoints, not networks
Unidirectional communication
Preferred option for shared services

VPC Endpoint

By default, IAM users do not have permission to work with endpoints; you need a custom IAM policy.

Interface endpoint

Elastic Network Interface with a Private IP inside your VPC

Uses DNS entries to redirect traffic
Secured by Security Groups

Example AWS products: API gateway, CloudFormation, CloudWatch

Gateway Endpoint

A gateway that is a target for a specific route

Uses prefix lists in the route table
VPC Endpoint policies (Similar to IAM Policies)

Example AWS products: Amazon S3, DynamoDB

VPN

AWS Managed VPN

IPsec VPN connection over your existing network

Use-cases:

Quick and easy way to get a secure tunnel into a VPC
Redundant link for DirectConnect or other VPC VPN

Pros:

Static routes support
BGP peering and routing

Cons:

Depends on the internet connection

DirectConnect (DX)

A dedicated network connection over AWS private lines

Use-cases:

High throughput connection to AWS
Consistent and predictable network bandwidth

Pros:

Speed up to 10Gbps
Potential bandwidth cost reduction
Traffic is secured from internet access

Cons:

May require additional actions from the hosting provider
Not Highly Available
Requires 802.1Q VLAN support and BGP routing

AWS CloudHub

Connect locations in a hub-and-spoke way using a Virtual Private Gateway.

Use-cases:

Connect multiple offices to access AWS and each other

Pros:

Reuse existing Internet connection
Supports BGP routes

Cons:

No redundancy
Depends on the internet connection

Software VPN

Do everything yourself. Downloading from AWS Marketplace is an option.

Use-cases:

VPN option is not supported by AWS
You have to manage both VPN endpoints for compliance reasons

Pros:

Full control over the setup

Cons:

You need to ensure redundancy in the whole chain

Transit VPC

Create a global networking transit center

One VPC that is a pass-through
Hybrid deployments for multi-cloud solutions

Internet access

Internet Gateway

No availability risks, no bandwidth constraints, supports IPv4 and IPv6

Use-cases:

Provide a route table target for Internet-bound traffic
Performs NAT for instances with public IPs

Egress-Only Gateway

Use it instead of NAT instance for IPv6 communications

NAT Instance

An EC2 instance from an AWS-provided AMI

Bandwidth depends on the instance type
Public IP can be detached
Can apply security groups
Can be used as Bastion server
Costs less than NAT Gateway for very small installations

NAT Gateway

Fully managed NAT service that replaces NAT Instance on EC2

Should be deployed in a public subnet
Uses an Elastic IP for public IP, that cannot be detached
Redundant within a single AZ; deploy one per AZ for multi-AZ availability
Bandwidth up to 45 Gbps
No Security groups
Supports IPv4 only

Placement Groups

Clustered

Grouping instances physically on the same rack or hardware for low-latency communication

Enhanced networking throughput
Finite capacity, better to provision instances up-front

Spread

Instances spread across separate hardware for better fault tolerance

Multi-AZ deployment
Maximum of 7 instances running per group, per AZ

Partition

Spread instance groups across hardware

Designed for large multi-instance applications
Does not support dedicated hosts

ELB

Two-way traffic
Immediate request handling

Classic Load Balancer - ELB

Legacy technology. You should only use it if you depend on classic EC2 (first AWS iteration on EC2 service)

Network Load Balancer - NLB

Very scalable and performant.

Can handle huge spikes in traffic (10s of millions rps)
TCP pass-through (does not terminate SSL)
Static IP address per AZ

Application Load Balancer - ALB

Backed by an EC2 instance, so it’s less performant but more flexible

Advanced request routing
Does not support Elastic IPs

With Global Accelerator you can get a static IP

Route53

Routing policies:

Simple - just DNS record
Failover - uses health-checks to failover to a backup record (2 records)
Geolocation - returns a record based on the geographic location of the user
Geoproximity - routes to the closest AWS region
Latency - compares the latency from the caller to multiple resources, returns the one with the lowest latency
Multivalue Answer - Multiple IPs, a basic load balancer
Weighted - distribute traffic percentage based on weights
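
For the weighted policy, the share of traffic each record receives is its weight divided by the sum of all weights. A small sketch (record names and the function name are made up):

```python
def traffic_share(weights):
    """Map each record name to the fraction of traffic it receives."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Send 10% of traffic to a new version during a gradual rollout.
    print(traffic_share({"blue": 90, "green": 10}))
```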

It is possible to use Route53 to manage domains that aren’t registered at AWS.

Ensure that there is a default route configured for Geolocation policy.

CloudFront

Global CDN
Can be configured as a trusted signer to limit access (Signed cookie)
Origin Access Identity
Offers access logs for both Web and RTMP distributions
Reserved capacity plans have discounts if you commit to a minimum monthly usage
Without SNI, CloudFront costs an extra $600/month for a dedicated IPv4 address in every edge location.
Can configure Geo-restrictions

API Gateway

Managed, highly available service for REST APIs

Can be a proxy in front of Lambda, AWS service or any HTTP API
Based regionally, with an option to optimize edge delivery via CloudFront
Possible to keep it private, or even publish on the AWS marketplace
Supports API keys and Usage Plans for user identification, throttling or quotas

Security

AWS Artifact contains many documents for various compliance certifications

Permission control tools:

Service Control Policies
Permission boundaries
IAM permission policies
Scoped-down policies
Resource-based policies
Endpoint policies

Well-known concepts

DDoS protection

Always have a plan.

Minimize attack surface - NACLs, SGs, VPC design
Scale and absorb - Auto-scaling groups, CloudFront, Static content from S3
Safeguard exposed resources - WAF, Shield, Route53 (restrict regions)
Know normal behaviour - GuardDuty, CloudWatch

Intrusion detection (IDS) and prevention (IPS) systems

IPS

Tries to prevent exploits by scanning and analyzing content behind firewall for threats.
Usually installed as an agent on hosts.

IDS

Watches the network and systems for suspicious activity.
Logs get collected and analyzed in a Security Information and Event Management (SIEM) system

CloudWatch vs CloudTrail

CloudWatch	CloudTrail
Logs events in AWS services	Logs API Activity
High-level monitoring and eventing	Low-level granularity
Log from multiple accounts	Log from multiple accounts
Logs stored indefinitely	Logs stored in S3 or CloudWatch
Alarm history for 14 days	No native alarming

Service Catalog

A framework allowing administrators to organize, govern and distribute application stacks or products

In a multi-account scenario you can share a portfolio between accounts and keep catalogs in sync with inherited constraints. Local admins can also push local portfolios and update constraints. IAM users, groups and roles are NOT inherited; the local admin needs to add local IAM resources to the portfolio. By default, when a portfolio is imported, the launch role is inherited from the shared portfolio, so resources get created in the parent account.

Granular control over which users have access to which offerings
Makes use of adopted IAM roles so users don’t require direct access to underlying services
Based on CloudFormation templates
Admins can version or remove products, not affecting existing deployments
TagOption library is a good way to enforce tagging strategy

Launch constraint

IAM role that Service Catalog assumes when launching a product. Without this constraint, users would require access to all underlying AWS resources.

Notification constraint

Specifies the SNS topic to receive notifications about stack events and failures.

Template constraint

Adjust product attributes based on choices a user makes. For example, allow only specific instance types in dev environment.

Federated Identity Providers

SAML 2.0

Can handle both authentication and authorization
XML-based
Provides user, group, membership and other info
Good for Single Sign-on for enterprise users

OAuth

Handles only authorization
Delegate access by means of token
Allow apps to act on behalf of a user
Best for API authorization between apps

OpenID Connect

Identity layer on top of OAuth, adding authentication
REST/JSON based
Single Sign-on for public customers

Multi-Account

Required for segregation of duties, cost allocation and increased agility.

Use-cases:

Administrative isolation between workloads
Limited visibility and discoverability of the workloads
Minimisation of the “blast radius”
Isolation of recovery and auditing data

Organizations

Manage policies across accounts
Automate creation of new accounts
Group accounts in Organizational Units (OU)
Consolidated billing

Service Control Policies (SCP)

Used to restrict access to specific AWS services (DENY)
Cascade to sub-accounts
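
A minimal sketch of what a deny-style SCP document might look like, written as a Python dictionary. The `Sid` and the choice of actions are made up for illustration, and the helper only mimics a tiny part of policy evaluation (explicit deny with wildcard matching); real evaluation also involves resources, conditions and the IAM policies of the account.

```python
import fnmatch
import json

deny_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectGuardrails",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": [
                "organizations:LeaveOrganization",
                "cloudtrail:StopLogging",
            ],
            "Resource": "*",
        }
    ],
}


def explicitly_denied(policy, action):
    """True if any Deny statement in the policy matches the action name."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Deny":
            continue
        patterns = statement["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(fnmatch.fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(deny_policy, indent=2))
    print(explicitly_denied(deny_policy, "cloudtrail:StopLogging"))
```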

Account types

Publishing
Identity
Logging

Directory services

AWS Cloud Directory

Cloud-native directory solution

Cloud applications that need hierarchical data with complex relationships

Cognito

A solution providing access control and authentication. Also known as Token Vending Machine.

Best for developing consumer facing apps or SaaS
Supports MFA
Data at-rest and in-transit encryption
Log in via social identity providers (Federation)
Support for SAML

AWS Directory Service for Microsoft Active Directory

AWS-managed full Microsoft AD (standard or enterprise) on Windows Server 2012 R2

Suits enterprises that want a hosted Microsoft AD or need LDAP for Linux apps

AD Connector

Integrate with on-premise Active Directory. It also allows EC2 instances to join AD domain

Must have an existing AD
Existing AD users can access AWS assets via IAM roles
Supports MFA via existing RADIUS-based MFA infrastructure

Simple AD

Low-scale and low-cost AD implementation based on Samba

Supports user accounts, groups, group policies and domains
Kerberos-based SSO
No MFA
No Trust Relationships

Access Management

Security Token Service (STS)

Provides temporary credential access. Can use various Identity Providers to authenticate requests

Secrets Manager

Stores passwords and encryption, API, SSH and PGP keys.

Better than hard-coding credentials in the app
Fine-grained access control using IAM
Offers automatic password rotation for RDS and Aurora

Encryption

KMS

Used for encryption at rest.

Multi-tenant
Root of trust is managed by AWS
Tightly integrated in Lambda, S3, EBS, EFS, DynamoDB, SQS and many more
Allows importing your own keys
Control access to the keys using IAM users and roles
Audit using CloudTrail
PCI DSS, FIPS 140-2 compliant

CloudHSM

Dedicated hardware device within a VPC that can offload SSL from a web server or act as a CA.

Single-tenant
Customer managed root of trust
Broad support by 3rd party

Classic CloudHSM

Based on SafeNet Luna SA
Requires $5000 upfront cost
Single device
FIPS 140-2 Level 2

Current CloudHSM

Proprietary device
No upfront costs
HA
FIPS 140-2 Level 3

Certificate Manager

A service to provision, manage and deploy SSL certificates.

Directly integrated in CloudFront, ELB and API Gateway
Allows importing 3rd party certificates
Supports wildcard domains
Manages certificate renewal
Also supports Private Certificate Authority for internal apps

Migrations

Migration strategies

Comparison of the strategies:

Name	Time and costs	Opportunity to optimize	Example
Re-Host (Lift and Shift)	* *	*	Move on-premise MySQL database to an EC2 instance
Re-Platform (Lift and Reshape)	* * * *	* * *	Move on-premise MySQL database to RDS MySQL
Re-Purchase (Drop and Shop)	* * *	*	Abandon a legacy system and move to a 3rd party
Rearchitecture	* * * * *	* * * * *	Replace legacy app with a serverless function
Retire			Get rid of the old application
Retain	*		Do nothing

Cloud Adoption Framework

An alternative to The Open Group Architecture Framework (TOGAF), which was developed in 1995

Key aspects:

Business

Create a strong business case
Measure the benefits (TCO, ROI)

People

Reevaluate roles, structures, skills and processes to fill the gaps
Align motivations and career management with evolving roles
Training options

Governance

Portfolio management should help to simplify migration
Align KPI with newly established business capabilities

Platform

Standardization
Architectural patterns tailored to a cloud-native approach
Develop new skills to leverage the platform

Security

Change in Identity and Access management
Logging and audit capabilities evolve
Shared-responsibility model removes some facets and adds other facets

Operations

Monitoring
Measure and adjust performance
Disaster recovery takes new methods

Migration Hub

Storage migration

Storage Gateway
Snowball
To migrate data from RAID 10 storage - AWS CLI is the only good option

Server Migration Service

Automates migration of on-premise VMware vSphere or Microsoft Hyper-V machines to AWS (Windows and Linux VMs only)

Can sync Volumes and take regular AMI snapshots
Good for disaster recovery

Database Migration Service (DMS)

Along with Schema Conversion Tool (SCT) helps to migrate DBs to RDS or EC2

Informix DB is not supported

DMS

Used for smaller, simpler conversion
Supports MongoDB and DynamoDB
Has replication function for on-premise DB to AWS DB, Snowball or S3

SCT

Suited for larger, more complex databases like data warehouses
Can convert schemas for migration on the same DB, or a different DB (from Oracle to Aurora)

Application Discovery Service

Gathers information about the on-premise data center to help in migration planning.

Can run agent-less in a VMware setup
Requires an agent otherwise
Collect config, usage and behaviour data

Network migration process

Most organisations start with a VPN connection
As usage grows - you switch to Direct Connect and keep VPN as a backup
Transition from VPN to Direct Connect using BGP weighting and static routes

Deployment and Operations

Deployment strategies

Name	Deployment Time	Downtime	Rollback	Example
All at once	*	Yes	Manual	Deploy to the same instances
Rolling	* *	-	Manual	Deploy to the same instances one by one
Rolling with extra batch	* * *	-	Manual	Launch new instances with new version before removing the old one
Immutable	* * * *	-	Kill new nodes	Launch a full set of instances with new version
Traffic splitting	* * * *	-	Reroute DNS	Percent of traffic routed to new “canary” instance
Blue-Green	* * * *	-	Swap URL	DNS entry is changed when a new version is fully up

Blue-green methods

Update Route53 record to point to a new ELB or instance
Swap autoscaling group behind the ELB
Change environment URL in Beanstalk
Clone stack in OpsWorks and update DNS

Blue-green contraindications (anti-patterns)

Data store schema is tightly coupled with the code
Upgrade requires special operations during the deployment
Third-party products might not be blue-green friendly

Beanstalk

Orchestration service to deploy applications with a single click

Supports Docker, PHP, Java, Node.js, etc.
Multiple environments within the application (QA, DEV, PROD, etc.)
Allows swapping Environment URLs to do blue-green deployment
Can be used to create Web-server or Worker environments
Not for short tasks

CloudFormation

Infrastructure as Code

Main Components

Templates - JSON or YAML file with instructions for building the environment
Stacks - entire environment described by the template as a unit
Change Set - a summary of the proposed changes to the Stack
Stack policies - deletion/update protection for resource

Stack policies cannot be removed and can only be updated through the CLI

AWS Config

Allows you to audit, assess and evaluate configurations of AWS resources

Creates a baseline configuration and tracks deviations
Config Rules check resources for certain conditions and flag deviations as “non-compliant”

OpsWorks

A managed instance of Chef or Puppet to deploy code, automate tasks, configure instances, perform upgrades, etc.

Global service

OpsWorks Stacks

OpsWorks Stacks uses an embedded Chef solo client on EC2 instances to run Chef recipes

Stacks are collections of resources needed to support a service application
Layers are different components of the application
EC2 instances, RDS instances, Load balancers are examples of Layers
Stacks are regional

AWS Systems Manager

Centralized console for system management tasks

Designed to manage large fleets of systems
SSM agent supports OSs supported by AWS
SSM agent is available by default in most AMIs
Can be installed on-premise

Services

Inventory - Collect OS, application and instance metadata
State Manager - Specify groups of machines with the same configuration
Logging - Stream logs from instances to CloudWatch
Parameter Store - Shared secure storage for sensitive data
Resource Groups - Group resources by tagging
Maintenance Windows - Define schedules for instances to apply patches and install updates
Automation - Run routine maintenance tasks and scripts
Run Command - Run a one-off command on any machine
Patch Manager - Automates the process of applying patches

Systems Manager (SSM) Documents are JSON/YAML files that specify the tasks that SSM performs

Command document - Holds command to execute
Policy document - Defines conditions that put an instance in a given state for State Manager
Automation document - A list of tasks for Automation service
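
A Command document can be sketched as follows. This assumes schema version 2.2 and the aws:runShellScript action; the description, the step name and the shell commands are hypothetical values picked for the example.

```python
import json


def build_command_document(commands):
    """Build a schema 2.2 Command document that runs shell commands."""
    return {
        "schemaVersion": "2.2",
        "description": "Illustrative command document (hypothetical)",
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "runCommands",  # step name chosen for this example
                "inputs": {"runCommand": list(commands)},
            }
        ],
    }


document = build_command_document(["uptime", "df -h"])
print(json.dumps(document, indent=2))
```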

Cost optimisation

Capital Expenses (CapEx) - money spent on long-term assets
Operational Expenses (OpEx) - a variable expense that a business pays to keep running
Total Cost of Ownership (TCO) - a comprehensive look at all related expenses, both hard and soft
Return on Investment (ROI) - the amount of money we expect to receive back in a given timeframe

Cost Optimisation Strategies

Appropriate provisioning

Provision the resources you need and nothing more
Consolidate for possible greater density and lower complexity
Use CloudWatch to monitor utilisation

Right sizing

Use lowest-cost resources that still meet the requirements
Architect for most consistent use of resources to avoid usage spikes
Loose coupling helps to scale components independently

Purchase options

Use Reserved Instances for permanent applications
Spot instances for temporary horizontal scaling
EC2 Fleets to define a target mix of RI, On-Demand and Spot instances

Geographical Selection

AWS pricing varies from region to region
Place some resources in a remote region if local access is not required
Route53 and CloudFront reduce potential latency issues

Managed services

Leverage RDS, Redshift, Fargate or EMR to drive TCO down

Optimized data transfer

Data going out and cross-region can be a significant cost component
DirectConnect is sometimes a cost-effective option depending on the volume and speed

Reserved instances

Agreement to purchase usage of EC2 instances in advance for a discount over On-Demand prices

Provides reserved capacity when used with a specific AZ
AWS Billing automatically applies the discounted rate when you launch an instance that matches the RI agreement
Can be shared across multiple accounts within consolidated billing
You can try to sell Standard RIs on the RI Marketplace
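
Whether a reservation pays off depends on how many hours the instance actually runs. The sketch below works through that arithmetic with hypothetical numbers (a $0.10/hour On-Demand rate and a 40% discount); it ignores upfront payment options and assumes the RI is billed for every hour of the year.

```python
HOURS_PER_YEAR = 8760


def yearly_cost_on_demand(hourly_rate, utilisation):
    """On-Demand is billed only for the hours the instance runs."""
    return hourly_rate * HOURS_PER_YEAR * utilisation


def yearly_cost_reserved(hourly_rate, discount):
    """A Reserved Instance is paid for every hour of the term, used or not."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR


def break_even_utilisation(discount):
    """Utilisation above which the RI is cheaper than On-Demand."""
    return 1 - discount


# Hypothetical numbers: $0.10/hour On-Demand, 40% RI discount.
rate, discount = 0.10, 0.40
for utilisation in (0.5, 0.6, 0.9):
    od = yearly_cost_on_demand(rate, utilisation)
    ri = yearly_cost_reserved(rate, discount)
    print(f"{utilisation:.0%} utilisation: On-Demand ${od:.2f}, RI ${ri:.2f}")
```

With these assumptions the reservation only wins once the instance runs more than about 60% of the time, which is why the notes recommend RIs for permanent applications.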

RI types comparison

	Standard	Convertible
Terms	1 or 3 years	1 or 3 years
Discount	40% - 60%	31% - 54%
Change AZ	Yes	Yes
Change Instance Size	Yes	Yes
Change Network Type	Yes	Yes
Change instance family	No	Yes
Change OS	No	Yes
Change Tenancy	No	Yes
Change Payment option	No	Yes
Uses Price Reduction	No	Yes
Sell on RI Marketplace	Yes	No

Attributes

Instance type - CPU, Memory, Network capability
Platform - Linux, SUSE, RHEL, Windows, SQL Server
Tenancy - Default or dedicated
AZ (optional) - If AZ is not specified, there is no reservation created, and the discount applies to any instance in the family in any AZ in the region
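
How these attributes decide whether the discount applies can be sketched as a simple matcher. This is a simplified model of the rule described above, not AWS's billing logic: the class and function names are hypothetical, and instance size is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reservation:
    family: str                # e.g. "m5"
    platform: str              # e.g. "Linux"
    tenancy: str               # "default" or "dedicated"
    az: Optional[str] = None   # None: regional RI, no capacity reservation


@dataclass(frozen=True)
class Instance:
    family: str
    platform: str
    tenancy: str
    az: str


def discount_applies(ri, instance):
    """Return True if the running instance matches the reservation."""
    same_attributes = (ri.family, ri.platform, ri.tenancy) == (
        instance.family,
        instance.platform,
        instance.tenancy,
    )
    if not same_attributes:
        return False
    # A regional RI (no AZ) matches any AZ; a zonal RI needs the exact AZ.
    return ri.az is None or ri.az == instance.az


regional = Reservation("m5", "Linux", "default")
zonal = Reservation("m5", "Linux", "default", az="eu-west-1a")
running = Instance("m5", "Linux", "default", az="eu-west-1b")
print(discount_applies(regional, running), discount_applies(zonal, running))
```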

Spot instances

Excess EC2 capacity that AWS sells on a market-exchange basis

The customer sets the maximum price they are willing to pay for an instance
If capacity runs short and others are willing to pay more, your instance is terminated
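
The maximum-price rule can be illustrated with a toy simulation. The price series and the function name are hypothetical, and the model covers only price-driven interruptions; AWS can also reclaim Spot instances when it needs the capacity back.

```python
def run_spot_instance(max_price, spot_prices):
    """Return how many pricing periods the instance runs before the
    Spot price first exceeds the customer's maximum price."""
    periods = 0
    for price in spot_prices:
        if price > max_price:
            break  # interrupted: market price is above our maximum
        periods += 1
    return periods


# Hypothetical hourly Spot prices in USD.
prices = [0.030, 0.035, 0.050, 0.030]
print(run_spot_instance(0.04, prices))
print(run_spot_instance(0.06, prices))
```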

Spot instance types

One time only - ephemeral data on the node will be lost
Maintain - configurable to Terminate, Stop or Hibernate until the price point is met again
Duration based

Budgets

Allows setting predefined limits and notifications for when the budget is exceeded
Based on Cost, Usage, Reserved Instance Utilisation or RI Coverage
Useful to distribute cost and usage awareness
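
The limit-and-notification idea can be sketched in a few lines. This is a conceptual example rather than the AWS Budgets API; the function name, the thresholds and the amounts are all hypothetical.

```python
def triggered_alerts(budget_limit, actual_spend, thresholds=(0.5, 0.8, 1.0)):
    """Return the thresholds (as fractions of the budget) that the
    actual spend has already reached."""
    return [t for t in thresholds if actual_spend >= budget_limit * t]


# Hypothetical monthly budget of $1000 with $850 spent so far.
alerts = triggered_alerts(budget_limit=1000, actual_spend=850)
print(alerts)
```

Here the 50% and 80% notifications would fire, while the 100% one would not.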

Consolidated billing

A single account with restricted access acts as the Payer
Aggregated usage across accounts brings greater economies of scale

Trusted Advisor

Runs a series of checks on your resources and proposes improvements
Can help optimize scaling or reserved capacities
Core checks are available for everyone
Full list of checks is only for Business and Enterprise support plans
