What is Amazon S3?

Amazon S3 (short for Amazon Simple Storage Service) is a cloud storage service that stores objects (files) in buckets (similar to directories) and is advertised as “infinitely scaling” storage.

Here is an analogy:

Think of it as a giant, secure, online hard drive that apps and people can access over the internet.

Note: S3 buckets are defined at the region level; S3 is not a global service. However, bucket names must be globally unique, across all regions and all AWS accounts.

S3 buckets are used for:

  • Backup & Storage;
  • Disaster recovery;
  • Archive;
  • Hybrid cloud storage;
  • Application hosting;
  • Media hosting;
  • Data lakes & Big data analytics;
  • Software delivery;
  • Static websites.

What is inside an S3 Bucket?

All S3 buckets are composed of S3 objects. Officially, there is no concept of a directory in an S3 bucket; there are only keys and objects.

An object is a file that has a key, which represents its path in the bucket.

This key is composed of a prefix + the object name:

              # Prefix                 # Object name
s3://my-bucket/my_folder/another_folder/my_file.txt
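
The split between prefix and object name can be sketched in a few lines of Python. The helper name `split_key` is hypothetical (it is not part of any AWS SDK); it only illustrates that the "folders" are simply part of the key string.

```python
def split_key(key: str) -> tuple[str, str]:
    """Split an S3 object key into (prefix, object_name)."""
    # Everything up to and including the last "/" is the prefix;
    # a key without any "/" has an empty prefix.
    prefix, sep, name = key.rpartition("/")
    return prefix + sep, name


print(split_key("my_folder/another_folder/my_file.txt"))
# ('my_folder/another_folder/', 'my_file.txt')
```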

S3 Object Versioning

S3 buckets can version objects when versioning is enabled. Each time an object is updated, S3 stores a new version instead of overwriting the existing one.
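
As a rough sketch, this is how versioning could be switched on and inspected from Python. The function names are hypothetical; `s3_client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), and `put_bucket_versioning` and `list_object_versions` are the boto3 calls being wrapped.

```python
def enable_versioning(s3_client, bucket: str) -> None:
    """Turn on versioning for a bucket."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def list_versions(s3_client, bucket: str, key: str) -> list[str]:
    """Return the version IDs stored for one key (first page of results only)."""
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can also return longer keys, so filter on the exact key.
    return [v["VersionId"] for v in response.get("Versions", []) if v["Key"] == key]
```

Passing the client in as an argument keeps the sketch free of credentials and makes it easy to try against a stub.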

S3 Bucket Replication

An S3 bucket can be replicated to another S3 bucket.

For this to happen, versioning must be enabled in both the source and destination S3 Buckets.

The replication happens asynchronously and can be performed in the same or another region.

Note: This can be done between different AWS accounts.
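
To make the setup more concrete, here is a minimal sketch of a replication configuration built in Python. The helper name, the rule ID, and the role ARN are illustrative assumptions; the dictionary follows the shape that boto3's `put_bucket_replication` expects for its `ReplicationConfiguration` argument.

```python
def replication_config(role_arn: str, destination_bucket: str) -> dict:
    """Build a one-rule replication configuration covering every object."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy the objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches every key
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::111122223333:role/replication-role",  # placeholder ARN
    "my-destination-bucket",
)
```

The result would then be passed as `s3_client.put_bucket_replication(Bucket="my-source-bucket", ReplicationConfiguration=config)`, once versioning is enabled on both buckets.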

S3 Bucket Security

There are a few types of security enforcements that can be applied to S3 buckets:

User Based

IAM policies - Which API calls should be allowed for a specific IAM user.

Resource Based

  • Bucket Policies - Bucket-wide rules that can be set from the S3 console. They allow cross-account access;
  • Object Access Control List (ACL) - Finer-grained (can be disabled);
  • Bucket Access Control List (ACL) - Less common (can be disabled).
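
As an illustration of a resource-based rule, the sketch below builds a bucket policy that lets another AWS account read objects. The function name, the `Sid`, and the account ID are made up for the example; the JSON layout is the standard IAM policy format.

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> str:
    """Return a bucket policy (as JSON) allowing another account to read objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(cross_account_read_policy("my-bucket", "111122223333"))
```

With boto3, the resulting string could be attached using `s3_client.put_bucket_policy(Bucket="my-bucket", Policy=...)`.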

Encryption

Encryption on S3 can be achieved using one of four methods:

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)

This method is enabled by default. Objects are encrypted on the server side with AES-256, using keys managed and owned by AWS.

Note: The customer never has access to these keys.

Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)

This method is used when encrypting objects on the server-side, using keys managed by AWS KMS.

When using KMS, the customer can create their own keys in AWS KMS and audit key usage with CloudTrail.
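
From a client's point of view, the difference between SSE-S3 and SSE-KMS is mostly a matter of request parameters. The sketch below assumes boto3's `put_object` call, whose `ServerSideEncryption` and `SSEKMSKeyId` parameters select the method; the helper name and the key ID are placeholders.

```python
from typing import Optional


def encryption_args(kms_key_id: Optional[str] = None) -> dict:
    """Return the extra put_object arguments for SSE-S3 or SSE-KMS."""
    if kms_key_id is None:
        # SSE-S3: AES-256 with keys owned and managed by AWS.
        return {"ServerSideEncryption": "AES256"}
    # SSE-KMS: encrypt with a specific KMS key.
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}


print(encryption_args())
print(encryption_args("1234abcd-12ab-34cd-56ef-1234567890ab"))  # placeholder key ID
```

A possible call would be `s3_client.put_object(Bucket="my-bucket", Key="my_file.txt", Body=b"...", **encryption_args(key_id))`.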

Server-Side Encryption with Customer-Provided Keys (SSE-C)

This method is used when encrypting objects on the server side using keys that the customer manages outside of AWS.

When using this method, the customer has to create their own key and send it in the request headers (over HTTPS) so that AWS can encrypt the files.

Note: Using SSE-C, AWS does not store the provided encryption key.
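
The headers involved can be sketched with the Python standard library alone. The function name is hypothetical; the three header names are the ones S3 uses for SSE-C, carrying the algorithm, the base64-encoded key, and a base64-encoded MD5 digest of the key that S3 uses as an integrity check.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict[str, str]:
    """Build the SSE-C request headers for a 256-bit customer key."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key).decode("ascii")
    md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
    }


# The customer generates the key and must keep it to read the object back.
headers = sse_c_headers(os.urandom(32))
```

The same headers have to be sent again on every later read of the object, since AWS does not keep the key.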

Client-Side Encryption

This method is used when the client encrypts all files before uploading them to S3.

S3 Storage Classes

Amazon S3 can have one of the following storage classes:

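|                              | Standard      | Intelligent Tiering | Standard IA      | One Zone IA      | Glacier Instant Retrieval | Glacier Flexible Retrieval | Glacier Deep Archive |
|------------------------------|---------------|---------------------|------------------|------------------|---------------------------|----------------------------|----------------------|
| Durability                   | 99.999999999% | 99.999999999%       | 99.999999999%    | 99.999999999%    | 99.999999999%             | 99.999999999%              | 99.999999999%        |
| Availability                 | 99.99%        | 99.9%               | 99.9%            | 99.5%            | 99.9%                     | 99.99%                     | 99.99%               |
| Availability SLA             | 99.9%         | 99%                 | 99%              | 99%              | 99%                       | 99.9%                      | 99.9%                |
| Availability Zones           | >= 3          | >= 3                | >= 3             | 1                | >= 3                      | >= 3                       | >= 3                 |
| Min. Storage Duration Charge | None          | None                | 30 Days          | 30 Days          | 90 Days                   | 90 Days                    | 180 Days             |
| Min. Billable Object Size    | None          | None                | 128 KB           | 128 KB           | 128 KB                    | 40 KB                      | 40 KB                |
| Retrieval Fee                | None          | None                | Per GB retrieved | Per GB retrieved | Per GB retrieved          | Per GB retrieved           | Per GB retrieved     |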
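
To show what the minimum storage duration charge means in practice, here is a small sketch. It assumes that an object deleted before the minimum duration is still billed for the remaining days; the dictionary keys are the storage class identifiers used by the S3 API, and the function name is hypothetical.

```python
# Minimum storage duration (in days) per storage class, taken from the table.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,    # Glacier Instant Retrieval
    "GLACIER": 90,       # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed for an object kept for `days_stored` days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billable_days("STANDARD_IA", 10))    # 30
print(billable_days("DEEP_ARCHIVE", 200))  # 200
```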

S3 Standard - General Purpose

  • 99.99% availability;
  • Used for frequently accessed data;
  • Low latency and high throughput;
  • Can sustain 2 concurrent facility failures;
  • Use cases: Big Data analytics, mobile & gaming applications, content distribution, etc.

S3 Infrequent Access (IA)

  • Used for data that is less frequently accessed but requires rapid access when needed;
  • Lower cost than the S3 Standard class.

Standard IA

  • 99.9% availability;
  • Use cases: Disaster recovery, backups.

One Zone IA

  • 99.5% availability;
  • Data is lost if AZ is destroyed;
  • Use cases: Storing secondary backup copies of on-prem data, or data that can be re-created.

S3 Glacier

  • Low cost object storage meant for archiving/backup;
  • Pricing: Price for storage + object retrieval cost.

Instant Retrieval

  • Millisecond retrieval, great for data accessed once a quarter;
  • Minimum storage duration is 90 days.

Flexible Retrieval (Formerly S3 Glacier)

  • Multiple retrieval options (Bulk is free of charge):
    • Expedited (1-5 minutes);
    • Standard (3-5 hours);
    • Bulk (5-12 hours).
  • Minimum storage duration is 90 days.

Deep Archive

  • Multiple retrieval options:
    • Standard (12 hours);
    • Bulk (48 hours).
  • Minimum storage duration is 180 days.

S3 Intelligent-Tiering

  • No retrieval charges;
  • Small monthly monitoring and auto-tiering fee;
  • Moves objects automatically between the following access tiers based on usage:
    • Frequent Access Tier: Default tier;
    • Infrequent Access Tier: Objects not accessed for 30 days;
    • Archive Instant Access Tier: Objects not accessed for 90 days;
    • Archive Access Tier (optional): Configurable from 90 days to 730 days;
    • Deep Archive Access Tier (optional): Configurable from 180 days to 730 days.
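
The automatic tiers can be pictured as a simple threshold function. This is only a sketch: it assumes the thresholds from the AWS documentation (30 consecutive days without access for the Infrequent Access tier, 90 for the Archive Instant Access tier) and ignores the optional archive tiers; the function name is hypothetical.

```python
def automatic_tier(days_since_last_access: int) -> str:
    """Pick the Intelligent-Tiering access tier for an object (optional tiers ignored)."""
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


for days in (5, 45, 120):
    print(days, automatic_tier(days))
```

Accessing an object again moves it back to the Frequent Access tier, which corresponds to resetting `days_since_last_access` to zero.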

S3 Performance

When it comes to performance optimization, there are two features to consider:

S3 Multi-part Upload

S3 Multi-part Upload is an S3 feature that lets us upload a single large object as a set of smaller, independent parts.

It is designed to improve reliability, performance, and flexibility when handling large files, especially those above 100 MB, and it is required for objects larger than 5 GB.
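
How an object is cut into parts can be sketched as follows. The function name and the 100 MiB default are assumptions for the example; the limits used (at least 5 MiB for every part except the last, at most 10,000 parts) are the documented S3 multipart constraints.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000       # S3 accepts at most 10,000 parts per upload


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, start_offset, length) for each part of an upload."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MiB")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)  # last part may be shorter
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

Each part can then be uploaded independently (and retried on its own if it fails), which is where the reliability and performance gains come from.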

S3 Transfer Acceleration

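S3 Transfer Acceleration speeds up long-distance transfers by using the Amazon CloudFront global network of edge locations as a proxy: the client sends data to the nearest edge location, which forwards it to the bucket over the AWS network.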
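
In practice, a client opts in by talking to a different endpoint. The sketch below builds that endpoint; the function name is hypothetical, while the `s3-accelerate.amazonaws.com` host pattern and the restriction on bucket names containing dots come from the AWS documentation. It assumes acceleration has already been enabled on the bucket.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint for a bucket."""
    if "." in bucket:
        # Accelerated buckets must have DNS-compliant names without periods.
        raise ValueError("Transfer Acceleration does not support bucket names containing dots")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("my-bucket"))
# https://my-bucket.s3-accelerate.amazonaws.com
```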

Amazon Athena (Analytics)

Amazon Athena is a serverless query service to perform analytics against S3 objects.

It uses standard SQL to query the files and supports CSV, JSON, ORC, Avro, and Parquet.
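
Submitting a query from Python could look roughly like this. The function name, the database, the table, and the results bucket are hypothetical; `athena_client` is assumed to be a boto3 Athena client (`boto3.client("athena")`), and `start_query_execution` is the boto3 call being wrapped.

```python
def run_query(athena_client, sql: str, database: str, output_location: str) -> str:
    """Submit a SQL query to Athena and return its execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes the query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical query over log files stored in S3.
EXAMPLE_SQL = "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status"
```

The call returns immediately; the query runs asynchronously, so its state would be polled afterwards (for example with `get_query_execution`) before reading the results.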
