These are the notes I made while preparing for AWS Solutions Architect Professional certification.

Compute

EC2

Autoscaling

  • Triggered by event or scaling action
  • Requires a launch configuration
  • Launch configurations are immutable; you need to create a new one for every change

Scaling options

  • Maintain - keep a minimum number of instances running
  • Manual - specify maximum, minimum or a specific number
  • Schedule - increase or decrease number of instances based on schedule
  • Dynamic - scale based on real-time metrics

Scaling policies

  • Target tracking - metric in relation to target value
  • Step scaling - adjust capacity given certain thresholds (Has warm-up period)
  • Simple scaling - wait until the health check and cool-down period expire
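
As a rough illustration of step scaling, the sketch below maps how far a metric sits above the alarm threshold to a capacity adjustment. The step bounds, adjustments and function names are hypothetical; real policies are configured in the Auto Scaling service rather than computed in application code.

```python
# Hypothetical step adjustments: (lower bound, upper bound, instances to add).
# Bounds are offsets of the metric above the alarm threshold.
STEPS = [
    (0, 10, 1),     # breach of 0 up to 10 points: add 1 instance
    (10, 20, 2),    # breach of 10 up to 20 points: add 2 instances
    (20, None, 4),  # breach of 20 points or more: add 4 instances
]


def step_adjustment(metric, threshold, steps=STEPS):
    """Return how many instances to add for the given metric value."""
    breach = metric - threshold
    if breach < 0:
        return 0  # alarm is not in breach, so no scaling
    for lower, upper, change in steps:
        if breach >= lower and (upper is None or breach < upper):
            return change
    return 0


def new_capacity(current, metric, threshold, maximum):
    """Apply the step adjustment without exceeding the group maximum."""
    return min(current + step_adjustment(metric, threshold), maximum)
```

With a CPU threshold of 60, a reading of 75 falls into the second step, so `new_capacity(4, 75, 60, 10)` gives 6.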

Disaster Recovery

  • AMIs and their snapshots can be copied to another Region for resiliency
  • Zonal Reserved Instances (or On-Demand Capacity Reservations) are the only way to guarantee that capacity will be available when needed
  • ELBs and Route53 offer health checks for self-healing

Batch

A management tool for recurring batch tasks, like rotating logs on firewall appliances

  1. Create a compute environment
  2. Specify a Job Queue with priority
  3. Define a Job
  4. Schedule the Job

ECS

AWS-specific platform that supports Docker containers

  • Considered easier to use, but limited
  • Relies heavily on AWS services like Route53, ALB, CloudWatch
  • Containers run isolated and are grouped in “Tasks”

EKS

A platform, fully compatible with Kubernetes

  • Considered more complex and feature-rich and extensible
  • Handles many things internally
  • Containers have access to each other within a “Pod”

Lambda

  • Supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby code
  • Stateless
  • Triggered by SNS, SQS, S3 events, DynamoDB streams, API Gateway or CloudFront requests

Serverless application model

Open-source framework for building serverless apps on AWS

  • Uses YAML as configuration language
  • Includes a CLI tool to manage serverless infra via CloudFormation
  • You can test Lambda functions locally via Docker-based emulator

EventBridge

An event-bus service that links AWS with 3rd party applications

Step Functions

  • Managed workflow and orchestration platform
  • Good for order processing workflow
  • Defines apps as state machines
  • Create tasks, sequential steps, branching paths and timers
  • Uses Amazon States Language (JSON)
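
To make the state-machine idea concrete, here is a minimal sketch of an order-processing workflow expressed in Amazon States Language and built as a Python dict. The state names, the `$.valid` field and the Lambda ARNs are placeholders, and the small checker is only a helper for this example, not part of any AWS SDK.

```python
import json

# Hypothetical order-processing workflow; the Lambda ARNs are placeholders.
state_machine = {
    "Comment": "Sketch of an order-processing workflow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Next": "IsOrderValid",
        },
        "IsOrderValid": {
            "Type": "Choice",  # branching path
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "WaitForPayment"}
            ],
            "Default": "RejectOrder",
        },
        "WaitForPayment": {"Type": "Wait", "Seconds": 30, "Next": "ShipOrder"},  # timer
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
            "End": True,
        },
        "RejectOrder": {"Type": "Fail", "Error": "InvalidOrder", "Cause": "Validation failed"},
    },
}


def referenced_states(definition):
    """Collect every state name that is used as a transition target."""
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets


def missing_states(definition):
    """Return transition targets that are not defined under States."""
    return referenced_states(definition) - set(definition["States"])


# JSON text that could be pasted into a state machine definition.
asl_json = json.dumps(state_machine, indent=2)
```

Checking `missing_states(state_machine)` before deploying catches typos in `Next` targets early.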

Glue

Service to build event-driven ETL pipelines

  • Supports Scala and Python

Elastic MapReduce (EMR)

Managed Hadoop framework for data processing

  • Supports Apache Spark, HBase, Presto and Flink
  • Good for log analysis and ETLs
  • Steps are units of work in EMR
  • Cluster gets deployed on EC2 instances
  • Master nodes
  • Core nodes store data on HDFS
  • Task nodes are ephemeral

AI and Machine Learning

  • SageMaker - A framework to build custom ML models
  • Greengrass - IoT solution to perform ML inferences locally on devices
  • Comprehend - Natural Language Processing (sentiment Analysis)
  • Forecast - Give predictions on time-series data
  • Lex - Conversational interface (like Alexa)
  • Personalize - Recommendation engine
  • Polly - Text to speech
  • Rekognition - Image/Video processing (recognize objects, people, activities)
  • Textract - OCR engine to extract text from scanned documents
  • Transcribe - Speech-To-Text
  • Translate - Language translation

Storage

Persistent Data stores

S3

Use-cases:

  • User-generated content, video- and photo-sharing
  • Static websites
  • Recent backups
  • Log storage
  • Data lake (Athena, Redshift Spectrum, Quicksight)
  • IoT Streaming Data Storage (Kinesis Firehose)
  • ML and AI Storage (Rekognition, Lex, MXNet)

Properties:

  • 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix
  • Concurrent reads and writes
  • Maximum object size is 5 TB
  • The largest object in a single PUT is 5 GB
  • Recommended to use multi-part uploads if larger than 100 MB (improved throughput and recovery from network issues)
  • Cross-Region Replication (does NOT replicate existing objects)
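
The multi-part recommendation above can be sketched as a small planning function. The 100 MiB threshold mirrors the note above (using binary units is an assumption), the 16 MiB preferred part size is a hypothetical default, and the 5 MiB minimum part size and 10,000-part maximum are the S3 multipart limits as I understand them.

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB          # smallest allowed part (except the last one)
MAX_PARTS = 10_000               # maximum number of parts per upload
MULTIPART_THRESHOLD = 100 * MIB  # threshold recommended in these notes


def plan_upload(object_size, preferred_part_size=16 * MIB):
    """Return (use_multipart, part_size, part_count) for an object size in bytes."""
    if object_size <= MULTIPART_THRESHOLD:
        return (False, object_size, 1)
    # Grow the part size when the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    part_count = math.ceil(object_size / part_size)
    return (True, part_size, part_count)
```

A 1 GiB object is split into 64 parts of 16 MiB, while a 50 MiB object goes up in a single PUT.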

Good-to-know:

  • Since December 2020 supports strong read-after-write consistency
  • Hexadecimal prefixes in a bucket can help parallelize requests
  • S3 Management Analytics to get reports and optimize-costs
  • Transfer Acceleration, which works like CloudFront in reverse to speed up uploads
  • “Requester Pays” buckets make large datasets available without the owner paying for data transfer
  • Tags are vital for billing
  • Can trigger SQS, SNS or Lambda based on the bucket event
  • Static Web hosting
  • S3 supports BitTorrent protocol
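
The hexadecimal-prefix tip can be sketched as follows: a short hash of the key is prepended so that objects spread evenly across prefixes. The function name, the choice of SHA-256 and the two-character prefix length are assumptions made for this example.

```python
import hashlib


def prefixed_key(key, prefix_length=2):
    """Prepend a short hexadecimal hash of the key so objects spread across prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_length]}/{key}"
```

With two hex characters there are 256 possible prefixes, and each prefix gets its own request-rate budget. The trade-off is that keys no longer list in their natural order.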

Security in S3

Access

  • Resource based
    • Object ACLs
    • Bucket policies
  • User based (IAM Policy)
  • MFA before Delete (or Version state change)
  • Access logging

Encryption

  • SSE-S3 - Use an S3-managed AES-256 key
  • SSE-C - Supply your own AES-256 key
  • SSE-KMS - Generate a KMS key and use it
  • Client-side - Encrypt before uploading

Storage classes

S3 Standard

  • Default tier
  • 11-9’s durability, 99.99% availability
  • Availability SLA is 99.9%

S3 Standard IA

IA - Infrequently Accessed. Cheaper than Standard, but you are charged a retrieval fee.

  • Same properties as Standard
  • Minimum billable object size is 128 KB

S3 One Zone-IA

Same as Standard IA, but replicated only in a single AZ

  • 11-9’s durability, 99.5% availability

S3 Intelligent Tiering

  • Good for unknown data access patterns
  • Moves data to the most cost-effective access tier

S3 Glacier

This is the service used by the AWS Storage Gateway Virtual Tape Library under the hood. It is integrated with S3, but has its own API and concepts. Objects stored in Glacier via the S3 API are not accessible via Glacier’s own API.

  • Used for archival only or “Cold storage”
  • Very cheap
  • 3-5 hours to retrieve the data
  • Content of the archive is immutable
  • Minimal storage duration: 90 days
  • Integrated with AWS CloudTrail for audit

Concepts

  • Glacier Vault = S3 Bucket
  • Glacier Archive = S3 Object (Max size: 40 TB, Immutable)
  • Glacier Policy defines what rules the Vault follows
  • Vault Lock enforces the policy (After creating it, you have 24 hours to abort operation)
  • Access is provided via IAM

S3 Glacier Deep Archive

The lowest-cost storage class

  • Minimal storage duration: 180 days

Life-cycle Policies

  • All calculations use the object upload date

Amazon Athena

SQL Engine on top of S3 using Presto

To boost performance the data should be stored in Parquet format. It is close to Redshift Spectrum functionality.

  • Athena - for data that is stored in S3 only
  • Spectrum - to join data in S3 with Redshift tables

Supported formats:

  • Apache ORC
  • Apache Web Logs
  • CSV
  • TSV
  • Text File with Custom Delimiters
  • JSON
  • Parquet

EBS Volumes

These are virtual hard drives that can only be used with EC2. EBS volumes are tied to a single AZ and can be changed while attached to the instance.

Types:

  • General Purpose SSD (gp2) - burstable IOPS storage for a broad range of workloads
  • New General Purpose SSD (gp3) - cheaper option with a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size
  • Provisioned IOPS SSD (io1) - option for databases, or I/O-intensive workloads
  • Throughput Optimized HDD (st1) - throughput-intensive workloads and large files (streaming, big data, logs)
  • Cold HDD (sc1) - low cost volume for large cold datasets

Snapshots

For EBS snapshots, you are charged only for storage actually used. Snapshots are stored in S3 and are available across all AZs in the Region.

Use-cases:

  • Cost-effective and easy backup strategy
  • Migrate EC2 instances between AZs
  • Share datasets with other users/accounts
  • Convert an unencrypted volume to an encrypted one
  • Support RAID configurations

Snapshot Lifecycle Policy

  • Scheduled snapshots are done via AWS Ops Automator
  • Retention rules to remove stale snapshots

RAID configurations

RAID 0 RAID 1 RAID 5 RAID 6
Redundancy None 1 down 1 down 2 down
Reads * * * * * * * * * * * * * * *
Writes * * * * * * * * * *
Capacity 100% 50% (n-1)/n (n-2)/n
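
The Capacity row above can be turned into a small calculator. This is a sketch for identical disks; the minimum disk counts per level are standard RAID requirements rather than something stated in these notes, and the function name is made up for this example.

```python
def usable_capacity(level, disks, disk_size_gib):
    """Usable capacity in GiB for identical disks, following the Capacity row above."""
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    raw = disks * disk_size_gib
    if level == 0:
        return raw                        # striping only, 100%
    if level == 1:
        return raw / 2                    # mirroring, 50%
    if level == 5:
        return raw * (disks - 1) / disks  # one disk's worth of parity
    return raw * (disks - 2) / disks      # RAID 6: two disks' worth of parity
```

Four 100 GiB volumes give 400 GiB in RAID 0, 300 GiB in RAID 5 and 200 GiB in RAID 6.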

EFS

  • Good for multi-attach, file based solutions
  • Implements NFS 4 and 4.1 file share
  • Pay for what you use billing model
  • Multi AZ metadata and data storage
  • Goes along well with AWS DataSync
  • For write-heavy workloads EFS is
    • 3 times more expensive than EBS
    • 20 times more expensive than S3
  • Burstable with a baseline rate of 50 MiB/s and burst rate of 100 MiB/s.

General Purpose performance mode is limited to 7,000 IOPS per file system.

Max I/O performance mode allows higher throughput sacrificing latency

Storage Gateway

Provides local storage resources backed by AWS S3. Often used for disaster recovery and cloud migrations.

  • Doesn’t have an SLA
  • File Gateway - An interface to S3 via NFS or SMB protocol
  • Volume Gateway - iSCSI protocol
    • Stored Volume - low-latency access to your entire dataset (Good for disaster recovery use-case)
    • Cached Volume - copy of frequently accessed data locally
  • Tape gateway - Media tape library to use with existing backup software

Transient Data stores

SQS

  • First AWS service
  • Integration with KMS for encrypted messaging
  • Default storage - 4 days, max - 14 days
  • Optionally supports FIFO
  • Maximum message size is 256 KB (up to 2 GB via the Amazon SQS Extended Client Library for Java, which stores the payload in S3)

Use-cases:

  • Pull-based interaction
  • Persistent task storage
  • Controlled completion
  • Example is image resize process

SNS

  • Enables pub/sub design pattern
  • Push-based interaction
  • Bulk notification
  • Mobile pushes
  • Suitable for “Fan Out” use-case

Supported protocols

  • HTTP/S
  • E-mail (plaintext or JSON)
  • SMS
  • SQS
  • Mobile App
  • Lambda
  • Firehose

Kinesis

Collection of services designed around stream data processing

  • Gigabytes of data from thousands of sources
  • Real-time
  • Stores data for 24 hours (configurable up to 7 days)
  • One shard ingests up to 1000 records/sec
  • Default limit is 500 shards
  • A record consists of:
    • Partition key
    • Sequence number
    • Data blob (up to 1MB)
  • Use Kinesis Client Library (KCL) for optimal integration
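
A rough shard estimate can be derived from the per-shard ingest limit listed above. The 1,000 records/sec figure comes from these notes; the 1 MB/sec write limit per shard is an assumption based on my understanding of Kinesis Data Streams, and the function name is hypothetical.

```python
import math

RECORDS_PER_SHARD = 1000       # records/sec per shard, from the list above
BYTES_PER_SHARD = 1024 * 1024  # assumed 1 MB/sec write limit per shard


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate the shard count needed to ingest a given write load."""
    by_records = records_per_sec / RECORDS_PER_SHARD
    by_bytes = records_per_sec * avg_record_bytes / BYTES_PER_SHARD
    return max(1, math.ceil(max(by_records, by_bytes)))
```

Small records are limited by the record rate and large records by the byte rate, so the estimate takes whichever bound is higher.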

Snowball

Move large amounts of data into and out of AWS using physical appliances of 80 TB or 50 TB (US regions only)

Use-cases:

  • If loading your data over the Internet would take a week or more
  • Cloud migration
  • Disaster recovery
  • Datacenter decommission
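
The "a week or more" rule of thumb can be checked with simple arithmetic. This sketch assumes decimal terabytes and a hypothetical 80% link utilization; both function names are made up for this example.

```python
def transfer_days(data_tb, link_mbps, utilization=0.8):
    """Days needed to move data_tb terabytes over a link of link_mbps megabits/sec."""
    bits = data_tb * 1e12 * 8  # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


def snowball_worth_considering(data_tb, link_mbps, threshold_days=7):
    """Apply the rule of thumb: a week or more over the wire suggests Snowball."""
    return transfer_days(data_tb, link_mbps) >= threshold_days
```

Moving 80 TB over a 1 Gbps link at 80% utilization takes a little over nine days, which is past the one-week mark.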

Snowball Edge

Same as Snowball but with Lambda and clustering onboard

Snowmobile

Loads up to 100PB of data on a truck and transports it to AWS datacenter

Amazon MQ

Managed implementation of Apache ActiveMQ

  • Good candidate for a “Lift and Shift” migration
  • Supports JMS, NMS, MQTT and WebSockets
  • For new applications it’s better to use SQS

Ephemeral Data stores

EC2 Instance store

Storage attached to an EC2 instance directly that gives you better performance.

Use-cases:

  • Cache
  • Buffers
  • Work areas

i2 (SSD) and d2 (HDD) instance types are storage-optimised.

ElastiCache

Redis

  • Web session storage
  • Leaderboard
  • Encryption
  • Clustering
  • Pub/sub
  • Complex data types (like geospatial indexes)

Memcached

  • Caching proxy in front of RDS
  • Caching whole responses or objects
  • Simple, easy to scale out and in solution

Databases

ACID

  • Atomic
  • Consistent
  • Isolated
  • Durable

BASE

  • Basically available
  • Soft state
  • Eventually consistent

RDS

  • Up to 5 read replicas
  • Sync replication within the Region, Async - cross Region

Supported engines

  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle ( License Included or Bring-Your-Own-License )
  • Microsoft SQL Server

Anti-patterns

  • For large binary files - use S3
  • For automatic scaling - use DynamoDB
  • For key-value data or unstructured data - use DynamoDB
  • For DB2 or SAP HANA - use EC2
  • For complete control over the DB - use EC2

DynamoDB

Multi-AZ NoSQL data store priced based on throughput. A Partition Key and a Sort Key form a Primary Key. Secondary indexes can be Global or Local. Local Secondary Indexes use the same partition key but a different sort key. You can even create table replicas using a global secondary index to balance the load.

  • For time-series data it’s better to create a new table per time period to balance the load

Capacity options:

  • Provisioned capacity units (CU)
  • Autoscaling rules
  • On-Demand (least cost-effective)

Good-to-know:

  • Great for unstructured data and click-stream data
  • Global Tables offer multi-region redundancy
  • To absorb peaks in throughput, put a queue in front of it
  • Has streams for replication across regions
  • Can offer strong consistency and ACID using DynamoDB Transactions

DynamoDB Accelerator

  • Read-through cache
  • Write-through cache

Accelerator better suits read-intensive workloads rather than write intensive.
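
The two caching modes can be illustrated with a toy in-memory cache. This is a conceptual sketch only: a plain dict stands in for the table, the class name is hypothetical, and it does not reflect how DAX is implemented.

```python
class ReadWriteThroughCache:
    """Toy cache in front of a slower backing store (a dict stands in for a table)."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.store_reads = 0  # counts how often the backing store is hit

    def get(self, key):
        # Read-through: on a miss, load from the store and keep a copy.
        if key not in self.cache:
            self.store_reads += 1
            self.cache[key] = self.store[key]
        return self.cache[key]

    def put(self, key, value):
        # Write-through: write to the store first, then update the cache.
        self.store[key] = value
        self.cache[key] = value
```

Repeated reads of the same key hit the store once, while every write still reaches the store. That is why the benefit is mostly on the read side.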

Calculating partition number

  • By capacity: (Total Read CU / 3000) + (Total Write CU / 1000)
  • By size: Total Size / 10 GB
  • Total Partitions: Ceil(Max(By capacity, By size))
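
The three formulas above translate directly into code. The function name is hypothetical; the constants are the ones from the list.

```python
import math


def partition_count(read_cu, write_cu, size_gb):
    """Estimate the partition count from the capacity and size formulas above."""
    by_capacity = read_cu / 3000 + write_cu / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))
```

For example, 6,000 read CU and 2,000 write CU on a 5 GB table gives max(4, 0.5), so 4 partitions. A 45 GB table with light traffic is driven by size instead and needs 5.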

Redshift

A database designed for static analysis of data, PostgreSQL compatible.

  • Not for real-time data ingestion
  • Needs enhanced VPC Routing to be accessible within the VPC
  • Redshift Spectrum allows querying S3 directly

Aurora

  • Supports fast schema changes (DDL)
  • Scales in increments of 10 GB up to 64 TB
  • Up to 15 Read replicas
  • Single AWS Region
  • Replicates entire database

Quantum Ledger Database (QLDB)

A blockchain database. It is an immutable journal with append-only semantics. Centralized design allows better performance compared to common blockchain frameworks.

Amazon Managed Blockchain offers Hyperledger Fabric and Ethereum blockchain frameworks and uses QLDB internally

Timestream

A timeseries database, alternative to DynamoDB or Redshift. It includes specific features like interpolation and smoothing. It plays well for telemetry data and sensor measurements.

DocumentDB

A fully managed (HA, multi-AZ, KMS encrypted, backed up to S3) MongoDB compatible solution

Elasticsearch (ES)

In AWS context it is mostly a search and analytical tool, but it can also store documents. AWS allows replacing Logstash in ELK stack with CloudWatch, Firehose or Greengrass to build solutions for analytics.

EMR

Managed Spark and Hadoop

Choosing the right option

Option          | Use-case
Database on EC2 | Full control over the DB or the engine is not available on RDS
RDS             | Store well-formed and structured data for OLTP workloads
DynamoDB        | Key-value store for unpredictable data types and high performance
Redshift        | For massive amounts of data and OLAP workloads
Neptune         | Store relationships between objects and graph data
ElastiCache     | For highly volatile data as a fast temporary storage

From fault tolerance perspective the preferred options are:

  1. DynamoDB
  2. Aurora
  3. Multi-AZ RDS
  4. Database on EC2

Networks

VPC

  • The largest CIDR range allowed in a VPC is /16 - 65536 addresses.
  • Minimal CIDR range is /28 - 16 addresses (Effectively 11 available)
  • DHCP option sets allow you to configure custom DNS and NTP servers
  • Create subnets in different AZs to make VPC multi-AZ

Reserved IP addresses in each subnet (example for 10.0.0.0/24):

  • 10.0.0.0: Network address
  • 10.0.0.1: VPC router
  • 10.0.0.2: DNS server
  • 10.0.0.3: Reserved by AWS for future use
  • 10.0.0.255: Network broadcast address. Broadcast in a VPC is not supported by AWS.
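
The reserved addresses listed above (the first four and the last one of a range) can be computed with Python's standard `ipaddress` module. The function names are made up for this example.

```python
import ipaddress


def reserved_addresses(cidr):
    """Return the five addresses AWS reserves in a subnet, keyed by role."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": net.network_address,
        "vpc_router": net.network_address + 1,
        "dns": net.network_address + 2,
        "future_use": net.network_address + 3,
        "broadcast": net.broadcast_address,
    }


def usable_addresses(cidr):
    """Total addresses in the subnet minus the five reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - 5
```

This matches the /28 figure above: 16 addresses minus 5 reserved leaves 11.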

Network ACL

Additional layer of security for VPC

  • Applied to entire subnets, not individual resources
  • Allows all inbound and outbound traffic by default
  • Stateless - no connection tracking
  • To establish most TCP connections NACL should allow outbound ephemeral ports
  • Extra layer of protection in addition to security groups

Security groups

Virtual firewalls for individual assets (EC2, RDS, AWS Workspaces, etc)

Controls protocols and port ranges.

Rules are specified by:

  • Source or Destination IP
  • Subnet
  • Security group

Peering

Connectivity between two VPCs provided by AWS

  • The traffic stays in AWS Network
  • Transitive peering is not supported (A -> B, B -> C, A !> C)
  • Could be established across AWS accounts

PrivateLink

Connectivity between VPCs or AWS services using interface endpoints

  • Reach other service in private network via AWS backbone
  • More granular than VPC peering, it exposes only endpoints, not networks
  • Unidirectional communication
  • Preferred option for shared services

VPC Endpoint

By default, IAM users do not have permission to work with endpoints; you need a custom IAM policy.

Interface endpoint

Elastic Network Interface with a Private IP inside your VPC

  • Uses DNS entries to redirect traffic
  • Secured by Security Groups

Example AWS products: API Gateway, CloudFormation, CloudWatch

Gateway Endpoint

A gateway that is a target for a specific route

  • Uses prefix lists in the route table
  • VPC Endpoint policies (Similar to IAM Policies)

Example AWS products: Amazon S3, DynamoDB

VPN

AWS Managed VPN

IPsec VPN connection over your existing network

Use-cases:

  • Quick and easy way to get a secure tunnel into a VPC
  • Redundant link for DirectConnect or other VPC VPN

Pros:

  • Static routes support
  • BGP peering and routing

Cons:

  • Depends on the internet connection

DirectConnect (DX)

A dedicated network connection over AWS private lines

Use-cases:

  • High throughput connection to AWS
  • Consistent and predictable network bandwidth

Pros:

  • Speed up to 10Gbps
  • Potential bandwidth cost reduction
  • Traffic is secured from internet access

Cons:

  • May require additional actions from the hosting provider
  • Not Highly Available
  • Requires 802.1Q VLAN support and BGP routing

AWS CloudHub

Connect locations in a hub-and-spoke way using Virtual Private Gateways.

Use-cases:

  • Connect multiple offices to access AWS and each other

Pros:

  • Reuse existing Internet connection
  • Supports BGP routes

Cons:

  • No redundancy
  • Depends on the internet connection

Software VPN

Do everything yourself. Downloading from AWS Marketplace is an option.

Use-cases:

  • VPN option is not supported by AWS
  • You have to manage both VPN endpoints for compliance reasons

Pros:

  • Full control over the setup

Cons:

  • You need to ensure redundancy in the whole chain

Transit VPC

Create a global networking transit center

  • One VPC that is a pass-through
  • Hybrid deployments for multi-cloud solutions

Internet access

Internet Gateway

No availability risks, no bandwidth constraints, supports IPv4 and IPv6

Use-cases:

  • Provide a route table target for Internet-bound traffic
  • Performs NAT for instances with public IPs

Egress-Only Gateway

Use it instead of NAT instance for IPv6 communications

NAT Instance

An EC2 instance from an AWS-provided AMI

  • Bandwidth depends on the instance type
  • Public IP can be detached
  • Can apply security groups
  • Can be used as Bastion server
  • Costs less than NAT Gateway for very small installations

NAT Gateway

Fully managed NAT service that replaces NAT Instance on EC2

  • Should be deployed in a public subnet
  • Uses an Elastic IP for public IP, that cannot be detached
  • Redundant within a single AZ; create one per AZ for multi-AZ availability
  • Bandwidth up to 45 Gbps
  • No Security groups
  • Supports IPv4 only

Placement Groups

Clustered

Grouping instances physically on the same rack or hardware for low-latency communication

  • Enhanced networking throughput
  • Finite capacity, better to provision instances up-front

Spread

Instances spread across separate hardware for better fault tolerance

  • Multi-AZ deployment
  • Maximum of 7 instances running per group, per AZ

Partition

Spread instance groups across hardware

  • Designed for large multi-instance applications
  • Does not support dedicated hosts

ELB

  • Two-way traffic
  • Immediate request handling

Classic Load Balancer - ELB

Legacy technology. You should only use it if you depend on EC2-Classic (the first iteration of the EC2 service).

Network Load Balancer - NLB

Very scalable and performant.

  • Can handle huge spikes in traffic (10s of millions rps)
  • Layer 4 pass-through for TCP traffic (TLS termination is optional via a TLS listener)
  • Static IP address per AZ

Application Load Balancer - ALB

Backed by an EC2 instance, so it’s less performant but more flexible

  • Advanced request routing
  • Does not support Elastic IPs

With Global Accelerator you can get a static IP

Route53

Routing policies:

  • Simple - just DNS record
  • Failover - uses health-checks to failover to a backup record (2 records)
  • Geolocation - returns a record to a resource close to your region
  • Geoproximity - routes to the closest AWS Region
  • Latency - compares the latency from caller to multiple resources, returns the one with lowest
  • Multivalue Answer - Multiple IPs, a basic load balancer
  • Weighted - distribute traffic percentage based on weights
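
For the weighted policy, each record's share of traffic is its weight divided by the sum of all weights. A minimal sketch, with hypothetical record names:

```python
def traffic_shares(weights):
    """Map each record name to its expected share of traffic (weight / sum of weights)."""
    total = sum(weights.values())
    if total == 0:
        # This sketch rejects an all-zero set instead of modelling it.
        raise ValueError("at least one record needs a non-zero weight")
    return {name: weight / total for name, weight in weights.items()}
```

Weights of 90 and 10 send roughly 90% of queries to the first record and 10% to the second, which is a common way to do a gradual cut-over.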

It is possible to use Route53 to manage domains that aren’t registered at AWS.

Ensure that there is a default route configured for Geolocation policy.

CloudFront

  • Global CDN
  • Can be configured as a trusted signer to limit access (Signed cookie)
  • Origin Access Identity
  • Offers access logs for both Web and RTMP distributions
  • Reserved capacity plans have discounts if you commit to a minimum monthly usage
  • Without SNI, CloudFront costs an extra $600/month for a dedicated IPv4 address in every edge location
  • Can configure Geo-restrictions

API Gateway

Managed, highly available service for REST APIs

  • Can be a proxy in front of Lambda, AWS service or any HTTP API
  • Based regionally, with an option to optimize edge delivery via CloudFront
  • Possible to keep it private, or even publish on the AWS marketplace
  • Supports API keys and Usage Plans for user identification, throttling or quotas

Security

AWS Artifact contains lots of documents for various compliance certifications

Permission control tools:

  • Service Control Policies
  • Permission boundaries
  • IAM permission policies
  • Scoped-down policies
  • Resource-based policies
  • Endpoint policies

Well-known concepts

DDoS protection

Always have a plan.

  • Minimize attack surface - NACLs, SGs, VPC design
  • Scale and absorb - Auto-scaling groups, CloudFront, Static content from S3
  • Safeguard exposed resources - WAF, Shield, Route53 (restrict regions)
  • Know normal behaviour - GuardDuty, CloudWatch

Intrusion detection (IDS) and prevention (IPS) systems

IPS

  • Tries to prevent exploits by scanning and analyzing content behind firewall for threats.
  • Usually installed as an agent on hosts.

IDS

  • Watches the network and systems for suspicious activity.
  • Logs get collected and analyzed in a Security Information and Event Management (SIEM) system

CloudWatch vs CloudTrail

CloudWatch                         | CloudTrail
Logs events in AWS services        | Logs API activity
High-level monitoring and eventing | Low-level granularity
Log from multiple accounts         | Log from multiple accounts
Logs stored indefinitely           | Logs stored in S3 or CloudWatch
Alarm history for 14 days          | No native alarming

Service Catalog

A framework allowing administrators to organize, govern and distribute application stacks or products

In a multi-account scenario you can share a portfolio between accounts and keep catalogs in sync with inherited constraints. Local admins can also push local portfolios and update constraints. IAM users, groups and roles are NOT inherited; a local admin needs to add local IAM resources to the portfolio. By default, an imported portfolio inherits the launch role from the shared portfolio, so resources get created in the parent account.

  • Granular control over which users have access to which offerings
  • Makes use of adopted IAM roles so users don’t require direct access to underlying services
  • Based on CloudFormation templates
  • Admins can version or remove products, not affecting existing deployments
  • TagOption library is a good way to enforce tagging strategy

Launch constraint

IAM role that Service Catalog assumes when launching a product. Without this constraint, user would require access to all underlying AWS resources.

Notification constraint

Specifies the SNS topic to receive notifications about stack events and failures.

Template constraint

Adjust product attributes based on choices a user makes. For example, allow only specific instance types in dev environment.

Federated Identity Providers

SAML 2.0

  • Can handle both authentication and authorization
  • XML-based
  • Provides user, group, membership and other info
  • Good for Single Sign-on for enterprise users

OAuth

  • Handles only authorization
  • Delegate access by means of token
  • Allow apps to act on behalf of a user
  • Best for API authorization between apps

OpenID Connect

  • Identity layer on top of OAuth, adding authentication
  • REST/JSON based
  • Single Sign-on for public customers

Multi-Account

Required for segregation of duties, cost allocation and increased agility.

Use-cases:

  • Administrative isolation between workloads
  • Limited visibility and discoverability of the workloads
  • Minimisation of the “blast radius”
  • Isolation of recovery and auditing data

Organizations

  • Manage policies across accounts
  • Automate creation of new accounts
  • Group accounts in Organizational Units (OU)
  • Consolidated billing

Service Control Policies (SCP)

  • Used to restrict access to specific AWS services (DENY)
  • Cascade to sub-accounts

Account types

  • Publishing
  • Identity
  • Logging

Directory services

AWS Cloud Directory

Cloud-native directory solution

  • Cloud applications that need hierarchical data with complex relationships

Cognito

A solution providing access control and authentication. Also known as Token Vending Machine.

  • Best for developing consumer facing apps or SaaS
  • Supports MFA
  • Data at-rest and in-transit encryption
  • Log in via social identity providers (Federation)
  • Support for SAML

AWS Directory Service for Microsoft Active Directory

AWS-managed full Microsoft AD (standard or enterprise) on Windows Server 2012 R2

  • Suits enterprises that want a hosted Microsoft AD or need LDAP for Linux apps

AD Connector

Integrate with on-premise Active Directory. It also allows EC2 instances to join AD domain

  • Must have an existing AD
  • Existing AD users can access AWS assets via IAM roles
  • Supports MFA via existing RADIUS-based MFA infrastructure

Simple AD

Low-scale and low-cost AD implementation based on Samba

  • Supports user accounts, groups, group policies and domains
  • Kerberos-based SSO
  • No MFA
  • No Trust Relationships

Access Management

Security Token Service (STS)

Provides temporary credential access. Can use various Identity Providers to authenticate requests

Secrets Manager

Stores passwords and encryption, API, SSH and PGP keys.

  • Better than hard-coding credentials in the app
  • Fine-grained access control using IAM
  • Offers automatic password rotation for RDS and Aurora

Encryption

KMS

Used for encryption at rest.

  • Multi-tenant
  • Root of trust is managed by AWS
  • Tightly integrated in Lambda, S3, EBS, EFS, DynamoDB, SQS and many more
  • Allows importing your own keys
  • Control access to the keys using IAM users and roles
  • Audit using CloudTrail
  • PCI DSS, FIPS 140-2 compliant

CloudHSM

Dedicated hardware device within a VPC that can offload SSL from a web server or act as a CA.

  • Single-tenant
  • Customer managed root of trust
  • Broad support by 3rd party

Classic CloudHSM

  • Based on SafeNet Luna SA
  • Requires $5000 upfront cost
  • Single device
  • FIPS 140-2 Level 2

Current CloudHSM

  • Proprietary device
  • No upfront costs
  • HA
  • FIPS 140-2 Level 3

Certificate Manager

A service to provision, manage and deploy SSL certificates.

  • Directly integrated in CloudFront, ELB and API Gateway
  • Allow importing 3rd party certificates
  • Supports wildcard domains
  • Manages certificate renewal
  • Also supports Private Certificate Authority for internal apps

Migrations

Migration strategies

Comparison of the strategies:

Name Time and costs Opportunity to optimize Example
Re-Host (Lift and Shift) * * * Move on-premise MySQL database to an EC2 instance
Re-Platform (Lift and Reshape) * * * * * * * Move on-premise MySQL database to RDS MySQL
Re-Purchase (Drop and Shop) * * * * Abandon a legacy system and move to a 3rd party
Rearchitecture * * * * * * * * * * Replace legacy app with a serverless function
Retire Get rid of the old application
Retain * Do nothing

Cloud Adoption Framework

An alternative to The Open Group Architecture Framework (TOGAF), which was developed in 1995

Key aspects:

Business

  • Create a strong business case
  • Measure the benefits (TCO, ROI)

People

  • Reevaluate roles and structures, and the skills and processes needed to fill the gaps
  • Align motivations and career management with evolving roles
  • Training options

Governance

  • Portfolio management should help to simplify migration
  • Align KPI with newly established business capabilities

Platform

  • Standardization
  • Architectural patterns tailored to a cloud-native approach
  • Develop new skills to leverage the platform

Security

  • Change in Identity and Access management
  • Logging and audit capabilities evolve
  • Shared-responsibility model removes some facets and adds other facets

Operations

  • Monitoring
  • Measure and adjust performance
  • Disaster recovery takes new methods

Migration Hub

Storage migration

Server Migration Service

Automates migration of on-premise VMware vSphere or Microsoft Hyper-V machines to AWS (Windows and Linux VMs only)

  • Can sync Volumes and take regular AMI snapshots
  • Good for disaster recovery

Database Migration Service (DMS)

Along with Schema Conversion Tool (SCT) helps to migrate DBs to RDS or EC2

  • Informix DB is not supported

DMS

  • Used for smaller, simpler conversion
  • Supports MongoDB and DynamoDB
  • Has replication function for on-premise DB to AWS DB, Snowball or S3

SCT

  • Suited to larger, complex databases like data warehouses
  • Can convert schemas for migration on the same DB, or a different DB (from Oracle to Aurora)

Application Discovery Service

Gathers information about the on-premise data center to help in migration planning.

  • Can run agent-less in a VMware setup
  • Requires an agent otherwise
  • Collect config, usage and behaviour data

Network migration process

  • Most organisations start with a VPN connection
  • As usage grows - you switch to Direct Connect and keep VPN as a backup
  • Transition from VPN to Direct Connect using BGP weighting and static routes

Deployment and Operations

Deployment strategies

Name                     | Deployment Time | Downtime | Rollback       | Example
All at once              | *               | Yes      | Manual         | Deploy to the same instances
Rolling                  | * *             | -        | Manual         | Deploy to the same instances one by one
Rolling with extra batch | * * *           | -        | Manual         | Launch new instances with new version before removing the old one
Immutable                | * * * *         | -        | Kill new nodes | Launch a full set of instances with new version
Traffic splitting        | * * * *         | -        | Reroute DNS    | Percent of traffic routed to new “canary” instance
Blue-Green               | * * * *         | -        | Swap URL       | DNS entry is changed when a new version is fully up

Blue-green methods

  • Update Route53 record to point to a new ELB or instance
  • Swap autoscaling group behind the ELB
  • Change environment URL in Beanstalk
  • Clone stack in OpsWorks and update DNS

Blue-green contraindications (anti-patterns)

  • Data store schema is tightly coupled with the code
  • Upgrade requires special operations during the deployment
  • Third-party products might not be blue-green friendly

Beanstalk

Orchestration service to deploy applications with a single click

  • Supports Docker, PHP, Java, Node.js, etc.
  • Multiple environments within the application (QA, DEV, PROD, etc.)
  • Allows swapping Environment URLs to do blue-green deployment
  • Can be used to create Web-server or Worker environments
  • Not suited to short-lived tasks

CloudFormation

Infrastructure as Code

Main Components

  • Templates - JSON or YAML file with instructions for building the environment
  • Stacks - entire environment described by the template as a unit
  • Change Set - a summary of the proposed changes to the Stack
  • Stack policies - deletion/update protection for resources

Stack policies cannot be removed once applied and can only be updated through the CLI
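
As an illustration of what a stack policy looks like, the sketch below builds one that allows updates in general but denies every update action on a single resource. The logical ID ProductionDatabase is a hypothetical example, and the helper name is made up for these notes.

```python
import json


def protect_resource_policy(logical_id):
    """Return a stack policy that blocks all updates to one logical resource."""
    return {
        "Statement": [
            # Allow updates on everything by default.
            {
                "Effect": "Allow",
                "Action": "Update:*",
                "Principal": "*",
                "Resource": "*",
            },
            # An explicit Deny wins over the Allow for the protected resource.
            {
                "Effect": "Deny",
                "Action": "Update:*",
                "Principal": "*",
                "Resource": f"LogicalResourceId/{logical_id}",
            },
        ]
    }


policy = protect_resource_policy("ProductionDatabase")
print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and applied with `aws cloudformation set-stack-policy --stack-name <name> --stack-policy-body file://policy.json`, where the stack name and file name are placeholders.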

AWS Config

Lets you audit, assess and evaluate the configurations of AWS resources

  • Creates a baseline configuration and tracks deviations
  • Config Rules check resources for certain conditions and flag deviations as “non-compliant”
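
The core of a Config Rule is a function that looks at a resource's recorded configuration and returns a compliance verdict. The sketch below shows that idea for a "volumes must be encrypted" check. The configuration item is a heavily simplified, assumed shape, and the function is plain Python rather than a deployable rule.

```python
def evaluate_volume_encryption(configuration_item):
    """Return a compliance verdict for a single (simplified) configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        # The rule only applies to EBS volumes.
        return "NOT_APPLICABLE"
    encrypted = configuration_item.get("configuration", {}).get("encrypted", False)
    return "COMPLIANT" if encrypted else "NON_COMPLIANT"


# Hypothetical items, for illustration only.
items = [
    {"resourceType": "AWS::EC2::Volume", "resourceId": "vol-1", "configuration": {"encrypted": True}},
    {"resourceType": "AWS::EC2::Volume", "resourceId": "vol-2", "configuration": {"encrypted": False}},
    {"resourceType": "AWS::S3::Bucket", "resourceId": "logs", "configuration": {}},
]
for item in items:
    print(item["resourceId"], evaluate_volume_encryption(item))
```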

OpsWorks

A managed instance of Chef or Puppet to deploy code, automate tasks, configure instances, perform upgrades, etc.

  • Global service

OpsWorks Stacks

OpsWorks Stacks uses an embedded Chef solo client on EC2 instances to run Chef recipes

  • Stacks are collections of resources needed to support a service application
  • Layers are different components of the application
  • EC2 instances, RDS instances, Load balancers are examples of Layers
  • Stacks are regional

AWS Systems Manager

Centralized console for system management tasks

  • Designed to manage large fleets of systems
  • SSM agent supports OSs supported by AWS
  • SSM agent is available by default in most AMIs
  • Can be installed on-premise

Services

  • Inventory - Collect OS, application and instance metadata
  • State Manager - Specify groups of machines with the same configuration
  • Logging - Stream logs from instances to CloudWatch
  • Parameter Store - Shared secure storage for sensitive data
  • Resource Groups - Group resources by tagging
  • Maintenance Windows - Define schedules for instances to apply patches and install updates
  • Automation - Run routine maintenance tasks and scripts
  • Run Command - Run a one-off command on any machine
  • Patch Manager - Automates the process of applying patches

Systems Manager (SSM) Documents are JSON/YAML files that specify the tasks that SSM performs

  • Command document - Holds command to execute
  • Policy document - Defines conditions that put an instance in a given state for State Manager
  • Automation document - A list of tasks for Automation service
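
For reference, a Command document is a small JSON structure. The sketch below assembles one that runs shell commands, assuming schema version 2.2 and the aws:runShellScript action. The description, step name and commands are made-up examples.

```python
import json


def shell_command_document(description, commands):
    """Build a minimal SSM Command document that runs shell commands."""
    return {
        "schemaVersion": "2.2",
        "description": description,
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "runCommands",
                "inputs": {"runCommand": list(commands)},
            }
        ],
    }


doc = shell_command_document("Report disk usage", ["df -h", "du -sh /var/log"])
print(json.dumps(doc, indent=2))
```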

Cost optimisation

Capital Expenses (CapEx) - money spent on long-term assets
Operational Expenses (OpEx) - a variable expense that a business pays to keep running
Total Cost of Ownership (TCO) - a comprehensive look at all related expenses, both hard and soft
Return on Investment (ROI) - the amount of money we expect to receive back in a given timeframe
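
A small worked example of these terms, with made-up figures. TCO is modelled here simply as CapEx plus OpEx over the period, and ROI uses the common ratio form (gain minus cost, divided by cost). Both are simplifying assumptions rather than an official formula.

```python
def total_cost_of_ownership(capex, annual_opex, years):
    """Up-front spend plus running costs over the period."""
    return capex + annual_opex * years


def return_on_investment(gain, cost):
    """ROI as a ratio: 1.5 means 150% of the investment is earned on top."""
    return (gain - cost) / cost


# Hypothetical three-year comparison.
on_prem = total_cost_of_ownership(capex=100_000, annual_opex=20_000, years=3)
cloud = total_cost_of_ownership(capex=0, annual_opex=45_000, years=3)
savings = on_prem - cloud
migration_cost = 10_000

print(on_prem, cloud, savings)                        # 160000 135000 25000
print(return_on_investment(savings, migration_cost))  # 1.5
```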

Cost Optimisation Strategies

Appropriate provisioning

  • Provision the resources you need and nothing more
  • Consolidate for possible greater density and lower complexity
  • Use CloudWatch to monitor utilisation

Right sizing

  • Use lowest-cost resources that still meet the requirements
  • Architect for most consistent use of resources to avoid usage spikes
  • Loose coupling helps to scale components independently

Purchase options

  • Use Reserved Instances for permanent applications
  • Spot instances for temporary horizontal scaling
  • EC2 Fleets to define a target mix of RI, On-Demand and Spot instances

Geographical Selection

  • AWS pricing varies from region to region
  • Place some resources in a remote region if local access is not required
  • Route53 and CloudFront reduce potential latency issues

Managed services

  • Leverage RDS, Redshift, Fargate or EMR to drive TCO down

Optimized data transfer

  • Data going out and cross-region can be a significant cost component
  • Direct Connect is sometimes a cost-effective option depending on the volume and speed

Reserved instances

Agreement to purchase usage of EC2 instances in advance for a discount over On-Demand prices

  • Provides reserved capacity when used with a specific AZ
  • AWS Billing automatically applies the discount rates when you launch an instance that matches the RI agreement
  • Can be shared across multiple accounts within consolidated billing
  • You can try to sell a Standard RI on the RI Marketplace

RI types comparison

                       | Standard     | Convertible
Terms                  | 1 or 3 years | 1 or 3 years
Discount               | 40% - 60%    | 31% - 54%
Change AZ              | Yes          | Yes
Change Instance Size   | Yes          | Yes
Change Network Type    | Yes          | Yes
Change instance family | No           | Yes
Change OS              | No           | Yes
Change Tenancy         | No           | Yes
Change Payment option  | No           | Yes
Uses Price Reduction   | No           | Yes
Sell on RI Marketplace | Yes          | Soon
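
The discount only pays off if the instance runs for enough of the term. A rough break-even sketch follows, assuming the RI is billed for every hour whether or not it is used and ignoring upfront payment options. The hourly price and utilisation figures are hypothetical.

```python
HOURS_PER_YEAR = 8760


def ri_break_even_utilisation(discount):
    """Fraction of the term an instance must run before an RI beats On-Demand."""
    return 1.0 - discount


def cheaper_option(on_demand_hourly, discount, utilisation, hours=HOURS_PER_YEAR):
    """Compare one year of On-Demand use against a Reserved Instance."""
    # On-Demand is only paid for the hours actually used.
    on_demand_cost = on_demand_hourly * hours * utilisation
    # The RI is paid for the whole term at the discounted rate.
    reserved_cost = on_demand_hourly * (1 - discount) * hours
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


print(ri_break_even_utilisation(0.40))
print(cheaper_option(0.10, discount=0.40, utilisation=0.5))  # on-demand
print(cheaper_option(0.10, discount=0.40, utilisation=0.8))  # reserved
```

With a 40% discount the instance has to run more than roughly 60% of the time before reserving is the cheaper choice.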

Attributes

  • Instance type - CPU, Memory, Network capability
  • Platform - Linux, SUSE, RHEL, Windows, SQL Server
  • Tenancy - Default or dedicated
  • AZ (optional) - If AZ is not specified, there is no reservation created, and the discount applies to any instance in the family in any AZ in the region
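
The attribute matching described above can be sketched as a small function. This is a simplified model of the notes, not the real billing logic: the region is assumed to match, and the family-wide matching for RIs without an AZ ignores the extra conditions AWS places on instance size flexibility. The class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reservation:
    instance_type: str
    platform: str
    tenancy: str
    az: Optional[str] = None  # None means no AZ was specified


@dataclass(frozen=True)
class Instance:
    instance_type: str
    platform: str
    tenancy: str
    az: str


def family(instance_type):
    """'m5.large' -> 'm5'"""
    return instance_type.split(".")[0]


def discount_applies(ri, instance):
    """True if the running instance matches the RI attributes."""
    if ri.platform != instance.platform or ri.tenancy != instance.tenancy:
        return False
    if ri.az is not None:
        # AZ-specific RI: exact instance type in that AZ.
        return ri.az == instance.az and ri.instance_type == instance.instance_type
    # No AZ: any instance of the family, in any AZ.
    return family(ri.instance_type) == family(instance.instance_type)


def reserves_capacity(ri):
    """Capacity is only reserved when an AZ is specified."""
    return ri.az is not None


regional = Reservation("m5.large", "Linux", "default")
zonal = Reservation("m5.large", "Linux", "default", az="eu-west-1a")
running = Instance("m5.xlarge", "Linux", "default", az="eu-west-1b")
print(discount_applies(regional, running), discount_applies(zonal, running))
```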

Spot instances

Excess EC2 capacity that AWS sells on a market-exchange basis

  • The customer sets the highest price they are willing to pay for an instance
  • If capacity runs short and others are willing to pay more - your instance is terminated
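
The price mechanism can be sketched as a toy simulation: the instance keeps running while the market price stays at or below the customer's maximum. The hourly prices are invented, and real interruptions also depend on available capacity, which this sketch ignores.

```python
def running_hours(max_price, hourly_spot_prices):
    """One flag per hour: True while the spot price is within max_price."""
    return [price <= max_price for price in hourly_spot_prices]


def first_interruption(max_price, hourly_spot_prices):
    """Index of the first hour the price exceeds max_price, or None."""
    for hour, price in enumerate(hourly_spot_prices):
        if price > max_price:
            return hour
    return None


# Hypothetical prices in dollars per hour.
prices = [0.031, 0.034, 0.052, 0.033]
print(running_hours(0.04, prices))
print(first_interruption(0.04, prices))
```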

Spot instance types

  • One time only - ephemeral data on the node will be lost
  • Maintain - configurable to Terminate, Stop or Hibernate until the price point is met again
  • Duration based

Budgets

  • Allows setting predefined limits and notifications for when the budget is exceeded
  • Based on Cost, Usage, Reserved Instance Utilisation or RI Coverage
  • Useful to distribute cost and usage awareness
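
The notification idea boils down to comparing spend against percentage thresholds of the limit. A minimal sketch follows, with hypothetical threshold values and a made-up function name. It does not use the Budgets API.

```python
def triggered_alerts(budget_limit, actual_spend, thresholds_percent=(50, 80, 100)):
    """Return the thresholds (in percent of the limit) that spend has reached."""
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    used_percent = actual_spend / budget_limit * 100
    return [t for t in thresholds_percent if used_percent >= t]


print(triggered_alerts(budget_limit=1000, actual_spend=850))  # [50, 80]
```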

Consolidated billing

  • A single account with restricted access acts as the Payer
  • Aggregated usage benefits more from economies of scale

Trusted Advisor

  • Runs a series of checks on your resources and proposes improvements
  • Can help optimize scaling or reserved capacities
  • Core checks are available for everyone
  • Full list of checks is only for Business and Enterprise support plans