What is Amazon S3?

Amazon S3 (short for Amazon Simple Storage Service) is a cloud storage service that stores objects (files) in buckets (directory-like containers) and is advertised as “infinitely scaling” storage.

Here is an analogy:

Think of it as a giant, secure, online hard drive that apps and people can access over the internet.

Note: S3 buckets are created in a specific region; S3 is not a global service. However, bucket names must be globally unique, across all regions and all AWS accounts.

S3 buckets are used for:

Backup & Storage;
Disaster recovery;
Archive;
Hybrid cloud storage;
Application hosting;
Media hosting;
Data lakes & Big data analytics;
Software delivery;
Static websites.

What is inside an S3 Bucket?

All S3 buckets are made up of S3 objects. Officially, there is no concept of a directory in an S3 bucket; there are only keys and objects.

An object is a file with a key that represents its path in the bucket.

This key is composed of a prefix + the object name:

               # Prefix                 # Object name
s3://my-bucket/my_folder/another_folder/my_file.txt
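As a minimal sketch of the prefix + object name split described above, a key can be divided at its last `/`. The helper name `split_key` is hypothetical and not part of any S3 API.

```python
def split_key(key: str) -> tuple[str, str]:
    """Split an S3 object key into (prefix, object name) at the last '/'."""
    prefix, sep, name = key.rpartition("/")
    # With no '/' in the key, rpartition yields ('', '', key), so the prefix is empty.
    return prefix + sep, name


print(split_key("my_folder/another_folder/my_file.txt"))
```

The "folders" shown in the S3 console are only a visual grouping of keys that share a prefix.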

S3 Object Versioning

S3 buckets can version objects when versioning is enabled. Each time an object is updated, S3 creates a new version instead of overwriting the existing one.
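One way to turn versioning on programmatically is the `put_bucket_versioning` call of a boto3 S3 client. The sketch below assumes the caller passes in such a client; the wrapper name `enable_versioning` is hypothetical.

```python
def enable_versioning(s3_client, bucket: str) -> None:
    """Enable versioning on `bucket` through the PutBucketVersioning API.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
```

With boto3 installed and credentials configured, this could be called as `enable_versioning(boto3.client("s3"), "my-bucket")`, where `my-bucket` is a placeholder name.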

S3 Bucket Replication

An S3 bucket can be replicated to another S3 bucket.

For this to happen, versioning must be enabled in both the source and destination S3 Buckets.

The replication happens asynchronously and can be performed in the same or another region.

Note: This can be done between different AWS accounts.
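A replication setup could look roughly like the configuration below, which is the shape accepted by the `ReplicationConfiguration` argument of boto3's `put_bucket_replication`. This is a sketch for a single rule that replicates everything; the function name, the rule ID, and the role and bucket names are placeholders.

```python
def replication_config(role_arn: str, dest_bucket: str) -> dict:
    """Build a one-rule ReplicationConfiguration that covers every object."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy the objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: the rule applies to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::111122223333:role/replication-role",  # placeholder role ARN
    "my-destination-bucket",                            # placeholder bucket name
)
```

The destination bucket may live in the same region or a different one. Cross-account replication typically needs extra fields and permissions that are left out here.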

S3 Bucket Security

Several types of access control can be applied to S3 buckets:

User Based

IAM policies - Define which API calls are allowed for a specific IAM user.

Resource Based

Bucket Policies - Bucket-wide rules that can be set from the S3 console. They can grant cross-account access;
Object Access Control List (ACL) - Finer-grained (can be disabled);
Bucket Access Control List (ACL) - Less common (can be disabled).
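To make the cross-account case concrete, here is a minimal sketch that builds a bucket policy granting another account read access to every object. The function name, the bucket name, and the account ID are placeholders chosen for illustration.

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> str:
    """Return a bucket policy (as JSON) that lets another account read objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(cross_account_read_policy("my-bucket", "111122223333"))
```

The resulting JSON can be pasted into the bucket's policy editor in the console or passed to an API call that sets the bucket policy.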

Encryption

Encryption on S3 can be achieved using one of four methods:

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
Server-Side Encryption with Customer-Provided Keys (SSE-C)
Client-Side Encryption

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)

This method is enabled by default. Objects are encrypted on the server side with AES-256, using keys managed and owned by AWS.

Note: The customer never has access to these keys.

Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)

This method encrypts objects on the server side, using keys managed by AWS KMS.

When using KMS, the customer can create their own keys on AWS and audit their usage using CloudTrail.
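When uploading with boto3, the choice between SSE-S3 and SSE-KMS comes down to a couple of extra `put_object` arguments. The helper below is a sketch; its name `sse_put_args` is hypothetical, and the KMS key ID in the usage note is a placeholder.

```python
from typing import Optional


def sse_put_args(kms_key_id: Optional[str] = None) -> dict:
    """Extra put_object arguments for SSE-S3 (no key given) or SSE-KMS."""
    if kms_key_id is None:
        return {"ServerSideEncryption": "AES256"}  # SSE-S3
    return {
        "ServerSideEncryption": "aws:kms",  # SSE-KMS
        "SSEKMSKeyId": kms_key_id,
    }
```

A call could then look like `s3_client.put_object(Bucket="my-bucket", Key="my_file.txt", Body=data, **sse_put_args("1234abcd-12ab-34cd-56ef-1234567890ab"))`, assuming `s3_client` is a boto3 S3 client.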

Server-Side Encryption with Customer-Provided Keys (SSE-C)

This method encrypts objects on the server side using keys that the customer manages outside of AWS.

When using this method, the customer must create their own key and send it in the request headers, over HTTPS, so that AWS can encrypt the files.

Note: Using SSE-C, AWS does not store the provided encryption key.
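For illustration, these are the request headers that carry an SSE-C key: the algorithm, the base64-encoded key, and a base64-encoded MD5 digest of the key. The function name `sse_c_headers` is hypothetical, and the random key is a throwaway; SDKs such as boto3 normally build these headers for you.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the three request headers S3 expects for SSE-C."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        # The MD5 digest of the raw key lets S3 check that the key arrived intact.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode(),
    }


headers = sse_c_headers(os.urandom(32))  # throwaway key, for illustration only
```

Because AWS does not store the key, the same headers have to be sent again on every request that reads the object.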

Client-Side Encryption

With this method, the client encrypts all files itself before uploading them to S3.

S3 Storage Classes

Each object in Amazon S3 is stored in one of the following storage classes:

	Standard	Intelligent Tiering	Standard IA	One zone IA	Glacier Instant Retrieval	Glacier Flexible Retrieval	Glacier Deep Archive
Durability	99.999999999%	99.999999999%	99.999999999%	99.999999999%	99.999999999%	99.999999999%	99.999999999%
Availability	99.99%	99.9%	99.9%	99.5%	99.9%	99.99%	99.99%
Availability SLA	99.9%	99%	99%	99%	99%	99.9%	99.9%
Availability Zones	>= 3	>= 3	>= 3	1	>= 3	>= 3	>= 3
Min. Storage Duration Charge	None	None	30 Days	30 Days	90 Days	90 Days	180 Days
Min. Billable Object Size	None	None	128 KB	128 KB	128 KB	40 KB	40 KB
Retrieval Fee	None	None	Per GB retrieved	Per GB retrieved	Per GB retrieved	Per GB retrieved	Per GB retrieved
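The minimum charges in the table can be read as "you are billed for at least this many days and at least this object size". The sketch below applies that rule using the values from the table; the names `MINIMUMS` and `billable` are hypothetical, and real billing has more inputs than shown here.

```python
# (minimum storage days, minimum billable size in KB), copied from the table above.
# Keys are the S3 API names of the storage classes.
MINIMUMS = {
    "STANDARD": (0, 0),
    "INTELLIGENT_TIERING": (0, 0),
    "STANDARD_IA": (30, 128),
    "ONEZONE_IA": (30, 128),
    "GLACIER_IR": (90, 128),   # Glacier Instant Retrieval
    "GLACIER": (90, 40),       # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": (180, 40),
}


def billable(storage_class: str, size_kb: float, days_stored: int) -> tuple[float, int]:
    """Return (billable KB, billable days) after applying the class minimums."""
    min_days, min_kb = MINIMUMS[storage_class]
    return max(size_kb, min_kb), max(days_stored, min_days)


print(billable("STANDARD_IA", 10, 7))  # (128, 30)
```

A 10 KB object deleted after a week from Standard IA is still charged as 128 KB for 30 days, which is why small or short-lived objects usually belong in S3 Standard.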

S3 Standard - General Purpose

99.99% availability;
Used for frequently accessed data;
Low latency and high throughput;
Can sustain 2 concurrent facility failures;
Use cases: Big Data analytics, mobile & gaming applications, content distribution, etc.

S3 Infrequent Access (IA)

Used for data that is less frequently accessed but requires rapid access when needed;
Lower cost than the S3 Standard class.

Standard IA

99.9% availability;
Use cases: Disaster recovery, backups.

One Zone IA

99.5% availability;
Data is lost if the AZ is destroyed;
Use cases: Storing secondary backup copies of on-prem data, or data that can be re-created.

S3 Glacier

Low cost object storage meant for archiving/backup;
Pricing: Price for storage + object retrieval cost.

Instant Retrieval

Millisecond retrieval, great for data accessed once a quarter;
Minimum storage duration is 90 days.

Flexible Retrieval (formerly S3 Glacier)

Multiple retrieval options:
- Expedited (1-5 minutes);
- Standard (3-5 hours);
- Bulk (5-12 hours), with no retrieval charge.
Minimum storage duration is 90 days.

Deep Archive

Multiple retrieval options:
- Standard (12 hours);
- Bulk (48 hours).
Minimum storage duration is 180 days.
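Objects in Flexible Retrieval and Deep Archive have to be restored before they can be read, and the retrieval option is chosen per request. The sketch below builds the `RestoreRequest` structure accepted by boto3's `restore_object`, rejecting tiers that are not listed above for a class. The names `ALLOWED_TIERS` and `restore_request` are hypothetical.

```python
# Retrieval tiers listed above for each archive class (keys are S3 API class names).
ALLOWED_TIERS = {
    "GLACIER": {"Expedited", "Standard", "Bulk"},  # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": {"Standard", "Bulk"},
}


def restore_request(storage_class: str, tier: str, days: int) -> dict:
    """Build a RestoreRequest that keeps the restored copy available for `days`."""
    if tier not in ALLOWED_TIERS[storage_class]:
        raise ValueError(f"{tier} retrieval is not offered for {storage_class}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


print(restore_request("GLACIER", "Bulk", 7))
```

The result could be passed as `s3_client.restore_object(Bucket="my-bucket", Key="my_file.txt", RestoreRequest=...)`, assuming `s3_client` is a boto3 S3 client and the bucket and key are placeholders.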

S3 Intelligent-Tiering

No retrieval charges;
Small monthly monitoring and auto-tiering fee;
Moves objects automatically between the following access tiers based on usage:
- Frequent Access Tier: Default tier;
- Infrequent Access Tier: Objects not accessed for 30 days;
- Archive Instant Access Tier: Objects not accessed for 90 days;
- Archive Access Tier (optional): Configurable from 90 days to 700+ days;
- Deep Archive Access Tier (optional): Configurable from 180 days to 700+ days.
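The automatic part of the tiering can be sketched as a simple threshold check. This assumes AWS's documented thresholds of 30 days without access for the Infrequent Access tier and 90 days for the Archive Instant Access tier, and it ignores the two optional archive tiers. The function name `automatic_tier` is hypothetical.

```python
def automatic_tier(days_since_last_access: int) -> str:
    """Pick the automatic Intelligent-Tiering access tier for an object."""
    # Thresholds are assumptions taken from AWS's documented defaults (30 and 90 days).
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


for days in (0, 45, 120):
    print(days, automatic_tier(days))
```

Accessing an object moves it back to the Frequent Access tier, which corresponds to resetting `days_since_last_access` to zero in this model.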

S3 Performance

When it comes to performance optimization, there are two features to consider:

S3 Multi-part Upload
S3 Transfer Acceleration

S3 Multi-part Upload

S3 Multi-part Upload is an S3 feature that lets us upload a single large object as a set of smaller, independent parts.

It is designed to improve reliability, performance, and flexibility when handling large files, especially those above 100 MB, and it is required for objects larger than 5 GB.
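To see what "a set of smaller, independent parts" means in practice, the sketch below splits an object size into byte ranges, one per part. It respects two S3 limits: every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. The function name `plan_parts` and the 100 MiB default part size are assumptions made for this example.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000       # S3 accepts at most 10,000 parts per upload


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges, one per part."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


print(len(plan_parts(1024 * MIB)))  # 11
```

Each range can be uploaded independently and retried on its own if it fails, which is where the reliability and performance gains come from.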

S3 Transfer Acceleration

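S3 Transfer Acceleration speeds up transfers by routing them through the nearest edge location of the Amazon CloudFront global network, which acts as a proxy and forwards the data to the bucket over the AWS network.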
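Once acceleration is enabled on a bucket, clients reach it through a dedicated `s3-accelerate` endpoint instead of the regional one. The sketch below only builds that URL; the function name `accelerate_url` and the bucket and key are placeholders.

```python
from urllib.parse import quote


def accelerate_url(bucket: str, key: str) -> str:
    """Build the Transfer Acceleration URL for an object."""
    if "." in bucket:
        # Transfer Acceleration does not support bucket names that contain periods.
        raise ValueError("bucket name must not contain periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_url("my-bucket", "my_folder/my_file.txt"))
```

Requests sent to this URL still need to be signed or otherwise authorized like any other S3 request.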

Amazon Athena (Analytics)

Amazon Athena is a serverless query service to perform analytics against S3 objects.

It uses standard SQL to query the files and supports CSV, JSON, ORC, Avro and Parquet.
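A query can be submitted through boto3's Athena client with `start_query_execution`. The sketch below assumes the caller passes in such a client and that a table has already been defined over the S3 data; the wrapper name `start_athena_query` and the database, table, and output location in the usage note are placeholders.

```python
def start_athena_query(athena_client, sql: str, database: str, output_location: str) -> str:
    """Submit `sql` to Athena and return the query execution ID.

    `athena_client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # S3 URI for results
    )
    return response["QueryExecutionId"]
```

For example: `start_athena_query(boto3.client("athena"), "SELECT * FROM access_logs LIMIT 10", "my_database", "s3://my-bucket/athena-results/")`. The query runs asynchronously, so the returned ID is what you would use to check its status and fetch results.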

Patrício's DevOps Vault
