What is the Well-Architected Framework?

The AWS Well-Architected Framework is a comprehensive guide that provides a consistent approach for evaluating and improving cloud architectures, helping organizations design and operate reliable, secure, efficient, and cost-effective systems on Amazon Web Services (AWS).

The framework is made up of six pillars:

Pillar 1: Operational Excellence

Operational Excellence includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

Design Principles:

  • Perform operations as code - Define infrastructure and operations as code (Terraform, CloudFormation, etc.);
  • Make frequent, small, reversible changes - So that a failed change can be rolled back easily;
  • Refine operations procedures frequently - And ensure that team members are familiar with them;
  • Anticipate failure - Identify potential sources of failure ahead of time, and learn from the failures that do occur;
  • Use managed services - To reduce operational burden;
  • Implement observability for actionable insights - Performance, reliability, cost, etc.
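
As an illustration of the first two principles, operations-as-code tools typically compare a declared desired state with what is currently running and derive a small set of changes. The following is a minimal Python sketch of that idea, not how Terraform or CloudFormation are implemented; the `plan` function and the resource names are hypothetical.

```python
def plan(current: dict, desired: dict) -> list:
    """Return (action, name) steps that would move `current` to `desired`.

    Both arguments map a hypothetical resource name to its settings.
    """
    steps = []
    # Resources declared but not yet running must be created.
    for name in sorted(desired.keys() - current.keys()):
        steps.append(("create", name))
    # Resources present in both are updated only if their settings differ.
    for name in sorted(desired.keys() & current.keys()):
        if desired[name] != current[name]:
            steps.append(("update", name))
    # Resources running but no longer declared are removed.
    for name in sorted(current.keys() - desired.keys()):
        steps.append(("delete", name))
    return steps


current = {"web": {"size": "small"}, "legacy-queue": {}}
desired = {"web": {"size": "large"}, "db": {"engine": "postgres"}}
print(plan(current, desired))
```

Swapping the two arguments yields the reverse plan, which is one reason small, declarative changes tend to be easy to roll back.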

Pillar 2: Security

Security includes the ability to protect information, systems and assets while delivering business value through risk assessments and mitigation strategies.

Design Principles:

  • Implement a strong identity foundation - Apply the principle of least privilege, centralize privilege management and reduce (or even eliminate) reliance on long-term credentials;
  • Enable traceability - Integrate logs and metrics with systems that can automatically investigate and take action;
  • Apply security at all layers - Such as the edge network, VPC, subnets, load balancers, instances, operating systems and applications;
  • Protect data in transit and at rest - Encryption, tokenization and access control;
  • Keep people away from data - Reduce or eliminate the need for direct access or manual processing of data;
  • Prepare for security events - Run incident response simulations and use tools with automation to increase your speed for detection, investigation and recovery.
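
To make the principle of least privilege concrete, the sketch below builds an IAM-style JSON policy that allows only `s3:GetObject` under a single prefix, and flags statements that grant wildcard access. The bucket name, prefix and both helper functions are assumptions for illustration; the check is deliberately simple and is not a substitute for a real policy analyzer.

```python
def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM-style policy allowing reads under one S3 prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


def _as_list(value):
    # Policy fields may hold a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def overly_broad(policy: dict) -> list:
    """Return Allow statements that use a wildcard action or a '*' resource."""
    flagged = []
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        actions = _as_list(statement["Action"])
        resources = _as_list(statement["Resource"])
        wide_action = any(a == "*" or a.endswith(":*") for a in actions)
        wide_resource = "*" in resources
        if wide_action or wide_resource:
            flagged.append(statement)
    return flagged


# Hypothetical bucket and prefix, used only for the example.
policy = read_only_policy("example-bucket", "reports/")
print(overly_broad(policy))
```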

Pillar 3: Reliability

Reliability includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand and mitigate disruptions such as misconfigurations or transient network issues.

Design Principles:

  • Test recovery procedures - Use automation to simulate different failures or to recreate scenarios that led to failures before;
  • Automatically recover from failure - Monitor key performance indicators and trigger automated recovery when a threshold is breached, so that failures are anticipated and remediated early;
  • Scale horizontally to increase aggregate system availability - Distribute requests across multiple, smaller resources to ensure that they don’t share a common point of failure;
  • Stop guessing capacity - Monitor demand and use auto scaling to maintain the optimal level of resources without over- or under-provisioning;
  • Manage change in automation - Use automation to make changes to infrastructure.
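
The "stop guessing capacity" principle can be sketched as a simple proportional rule: scale the fleet so that a metric such as average CPU utilization moves back toward a target, within fixed bounds. This is a simplified illustration and not the algorithm AWS Auto Scaling actually uses; the function name and all numbers are assumptions.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Propose an instance count that brings `metric` back toward `target`.

    `metric` and `target` use the same unit, e.g. average CPU percent.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # If the metric is 1.5x the target, roughly 1.5x the capacity is needed.
    proposed = math.ceil(current * metric / target)
    # Never go below the minimum or above the maximum fleet size.
    return max(minimum, min(maximum, proposed))


# 4 instances at 90% CPU with a 60% target, bounded between 2 and 10.
print(desired_capacity(4, 90, 60, minimum=2, maximum=10))
```

The bounds matter as much as the rule itself: the minimum protects availability when traffic is low, and the maximum caps cost if the metric misbehaves.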

Pillar 4: Performance Efficiency

Performance Efficiency includes the ability to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and technologies evolve.

Design Principles:

  • Democratize advanced technologies - Advanced technologies are offered as services, so you can focus more on product development;
  • Go global in minutes - Easy deployment in multiple Regions;
  • Use serverless architectures - Avoid the burden of managing servers;
  • Experiment more often - Easy to carry out comparative testing;
  • Consider mechanical sympathy - Understand how the services you use work and choose the technology approach that best fits the goals of your workload.
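
The comparative testing mentioned under "Experiment more often" can be as simple as collecting latency samples for each candidate configuration and comparing a percentile. The sketch below uses the nearest-rank method; the candidate names and the sample values are made up for illustration.

```python
def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct% of the sample count.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def best_candidate(results: dict, pct: int = 95) -> str:
    """Return the candidate whose latency at the given percentile is lowest."""
    return min(results, key=lambda name: percentile(results[name], pct))


# Hypothetical latency samples in milliseconds for two configurations.
results = {
    "option-a": [12, 15, 11, 14, 13, 90, 12, 13, 14, 15],
    "option-b": [20, 21, 19, 22, 20, 21, 20, 19, 22, 21],
}
print(best_candidate(results, pct=95))
print(best_candidate(results, pct=50))
```

With these samples the two percentiles disagree: option-a has the lower median but a worse tail, so which metric to optimize is a decision to make before running the experiment.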

Pillar 5: Cost Optimization

Cost Optimization includes the ability to run systems to deliver business value at the lowest price point.

Design Principles:

  • Adopt a consumption model - Pay only for what you use;
  • Measure overall efficiency - Measure the business output of the workload and the cost of delivering it (for example, with CloudWatch metrics);
  • Stop spending money on data center operations - AWS handles the infrastructure, which lets customers focus on their own projects;
  • Analyze and attribute expenditure - Accurately identifying system usage and costs (for example, by tagging resources) helps measure return on investment (ROI);
  • Use managed and application-level services to reduce cost of ownership - As managed services operate at cloud scale, they can offer a lower cost per transaction or service.
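
Attributing expenditure with tags boils down to grouping cost line items by a tag value and making untagged spend visible. The following sketch assumes a hypothetical list of line items with `cost` and `tags` fields; it does not reflect the format of any real AWS billing export.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_tag(line_items: list, tag_key: str) -> dict:
    """Sum costs per value of `tag_key`; items without the tag go to 'untagged'."""
    totals = defaultdict(Decimal)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        # Decimal avoids floating-point rounding surprises with money.
        totals[owner] += Decimal(item["cost"])
    return dict(totals)


# Hypothetical line items; costs are strings so Decimal parses them exactly.
line_items = [
    {"resource": "vm-1", "cost": "12.50", "tags": {"team": "search"}},
    {"resource": "vm-2", "cost": "7.25", "tags": {"team": "search"}},
    {"resource": "db-1", "cost": "30.00", "tags": {"team": "billing"}},
    {"resource": "bucket-1", "cost": "4.10"},
]
print(cost_by_tag(line_items, "team"))
```

A large "untagged" total is usually the first thing worth fixing, since that spend cannot be attributed to anyone.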

Pillar 6: Sustainability

Sustainability includes the ability to minimize the environmental impact of running cloud workloads.

Design Principles:

  • Understand your impact - Establish performance indicators and evaluate improvements;
  • Establish sustainability goals - Set long-term goals for each workload and model return on investment (ROI);
  • Maximize utilization - Right-size each workload to maximize the energy efficiency of the underlying hardware and minimize idle resources;
  • Anticipate and adopt new, more efficient hardware and software offerings - Design for flexibility to adopt new technologies over time;
  • Use managed services - Shared services reduce the amount of infrastructure needed. Managed services also help automate sustainability best practices, such as moving infrequently accessed data to cold storage and adjusting compute capacity;
  • Reduce the downstream impact of your cloud workloads - Reduce the amount of energy or resources required to use your services and reduce the need for your customers to upgrade their devices.
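
As a rough illustration of "Maximize utilization", the sketch below flags resources whose average CPU utilization falls below a threshold, making them candidates for right-sizing. The 20% threshold, the resource names and the sample data are assumptions chosen for the example, not AWS recommendations.

```python
from statistics import mean


def rightsizing_candidates(cpu_samples: dict, threshold: float = 20.0) -> list:
    """Return names of resources whose mean CPU percent is below `threshold`."""
    return sorted(
        name
        for name, samples in cpu_samples.items()
        if samples and mean(samples) < threshold  # skip resources with no data
    )


# Hypothetical CPU utilization samples (percent) per resource.
cpu_samples = {
    "api": [55, 60, 65],
    "batch": [5, 10, 6],
    "idle-worker": [1, 2, 1],
}
print(rightsizing_candidates(cpu_samples))
```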