• go-to storage service for AWS
  • global namespace (buckets have unique names) but region resilient (buckets are per-region)
  • unless configured otherwise (e.g. replication), data is stored only within the bucket's region
  • designed for multi-user access
  • also suitable for large data sets
  • Buckets (i.e. disks) contain objects (i.e. files) identified by their keys (i.e. file paths)
  • object: a file/piece of data
    • object sizes range from 0 bytes to 5TB
    • version ID
    • metadata
    • access control
    • subresources
  • bucket: container for objects
    • bucket data doesn’t leave region unless specifically configured
    • bucket name needs to be globally unique (all regions, all AWS accounts)
    • inherently flat structure, though keys with slashes emulate folders, which is why folders are referred to as prefixes
  • exam tips
    • bucket name
      • globally unique (can’t have the same bucket name as someone else’s in another region)
      • 3-63 chars, all lowercase, no underscore
      • start with a lowercase letter or number
      • IP-formatted names (e.g. 1.2.3.4) are not allowed
    • accounts can have 100 buckets by default (soft limit), with a hard limit of 1000
    • unlimited number of objects
    • key = name, value = data
  • S3 purpose
    • object storage, not file storage (e.g. file shares) or block storage (e.g. mountable disks)
    • great for offloading data sets
    • input/output target for many AWS services
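
A minimal boto3 sketch of the object model (the bucket name and key here are hypothetical, and the bucket is assumed to exist):

    import boto3

    s3 = boto3.client("s3")

    # key = name, value = data; slashes in the key emulate folders (prefixes)
    s3.put_object(Bucket="my-example-bucket", Key="reports/2024/summary.txt", Body=b"hello s3")

    # retrieve by the same key
    obj = s3.get_object(Bucket="my-example-bucket", Key="reports/2024/summary.txt")
    print(obj["Body"].read())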

Security

  • S3 buckets are private by default
  • bucket policy: a form of AWS resource policy (can grant anonymous access)
    • can control specific object access
  • ACLs: legacy (not recommended)
    • control access to object and bucket
    • inflexible and simple (e.g. one ACL can’t limit access to a group of objects)
  • Block Public Access: options that override bucket policy for public access
  • when to use what
    • IAM: if customizing different resources/users within an account
    • Bucket Policy: S3-specific settings, anonymous or cross-account settings
    • ACL: never
  • TODO: S3 lifecycle policy for versioning
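
As a rough sketch, keeping all four Block Public Access settings on looks like this with boto3 (bucket name hypothetical):

    import boto3

    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket="my-example-bucket",
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,        # reject new public ACLs
            "IgnorePublicAcls": True,       # ignore existing public ACLs
            "BlockPublicPolicy": True,      # reject public bucket policies
            "RestrictPublicBuckets": True,  # restrict access even if a public policy exists
        },
    )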

Static Website Hosting

  • normally, S3 access is through the AWS APIs, but SWH allows (public) access to objects over HTTP
  • requires setting S3 keys for the Index page and Error page
  • the Website Endpoint is generated from the bucket name and region (not customizable)
  • Important: bucket name must match domain name used; need to create a Route 53 A record “alias to S3 endpoint” for custom domain
  • useful for
    • offloading media from compute instances
    • out-of-band pages: e.g. temporary site maintenance page
  • need to turn off Block Public Access and add bucket policy to allow s3:GetObject for all keys (/*) for all principals (anonymous included)
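
A sketch of both steps with boto3, assuming a hypothetical bucket my-site-bucket and Block Public Access already turned off:

    import boto3, json

    s3 = boto3.client("s3")

    # enable website hosting with index and error keys
    s3.put_bucket_website(
        Bucket="my-site-bucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )

    # allow anonymous s3:GetObject on all keys (/*)
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-site-bucket/*",
        }],
    }
    s3.put_bucket_policy(Bucket="my-site-bucket", Policy=json.dumps(policy))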

Object Versioning

  • default disabled, can be enabled; once enabled, cannot be disabled, but can be suspended
    • state flow: disabled → enabled ↔ suspended (can never return to disabled)
  • modification produces new object version
  • by default S3 returns latest version
  • deletion just adds a delete marker, but the file is still there; to undo delete, just remove delete marker
    • to really delete an object, delete using specific object version IDs
  • MFA Delete: when enabled, changing the bucket's versioning state and deleting versions require the MFA serial number plus an MFA passcode
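
A sketch of the version lifecycle with boto3 (names hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # enable versioning (can later be set to "Suspended", never back to disabled)
    s3.put_bucket_versioning(
        Bucket="my-example-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # a plain delete only adds a delete marker; the versions remain
    s3.delete_object(Bucket="my-example-bucket", Key="data.txt")

    # to really delete, target specific version IDs
    v = s3.list_object_versions(Bucket="my-example-bucket", Prefix="data.txt")
    for version in v.get("Versions", []):
        s3.delete_object(Bucket="my-example-bucket", Key="data.txt",
                         VersionId=version["VersionId"])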

Uploading

  • limitations of single-stream upload (single PUT request)
    • maximum of 5GB of data per PUT
    • cannot recover from failure, must reupload entire file
    • often cannot saturate either end’s network capacity
  • multi-part upload (see the sketch after this list)
    • object must be at least 100MB in total to use multi-part
    • failed parts can be retried individually without restarting the whole upload
    • limits
      • 10000 parts maximum
      • part size ranges from 5MB to 5GB
  • S3 Accelerated Transfer
    • scenario: upload traffic is often not routed the best way
    • S3 transfer acceleration: upload to AWS nearest edge locations (via CloudFront) instead of S3 bucket directly
    • requirement: bucket name must not contain period and must be DNS friendly
    • benefits: lower latency and higher upload speed, with larger gains for more distant regions
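
A manual multi-part sketch with boto3 (file and bucket names hypothetical; in practice boto3's upload_file does this automatically above a size threshold):

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-example-bucket", "big/archive.bin"

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts, part_number = [], 1
    with open("archive.bin", "rb") as f:
        while chunk := f.read(100 * 1024 * 1024):  # parts must be 5MB-5GB (last part excepted)
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                                  PartNumber=part_number, Body=chunk)
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})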

Encryption

  • Only objects are encrypted; encryption is not done at the bucket level, though a default encryption key can be configured per bucket.
  • Server-Side Encryption (SSE): encrypt/decrypt objects within the server; SSE is mandatory
    • SSE-C: with customer-provided keys
    • SSE-S3: with S3-managed keys (default)
      • an S3-managed root key envelope-encrypts/decrypts each per-object key
      • not suitable for heavily regulated industries (no control over keys, no role separation)
      • AES-256
    • SSE-KMS: with KMS keys
      • same idea as SSE-S3, except the root key is a KMS key we manage, and the per-object key is just a DEK generated by KMS
      • more logging & auditing capabilities
      • role separation (allow S3 admin to configure S3 but not access data, allow service desk to access data but not configure S3/change permissions)
      • allows manual control of KMS key rotation
      • SSE-KMS without S3 Bucket Key
        • each object needs its own DEK generated from the KMS key
        • the KMS API calls that generate DEKs are rate-limited
      • SSE-KMS with S3 Bucket Key
        • KMS is used to create a short-lived bucket key, then this bucket key is used to encrypt/decrypt per-object key
        • the temporary bucket key is different for each requester, so CloudTrail logs stay requester-specific
        • Much more scalable and cost-efficient
        • CloudTrail KMS events are now per-bucket, instead of per-object.
        • replication-compatible
  • Note that each object can have a different SSE encryption type
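
A sketch of choosing the SSE type per object at upload time (names and key material hypothetical):

    import os
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-example-bucket"

    # SSE-S3: S3-managed keys (the default)
    s3.put_object(Bucket=bucket, Key="a.txt", Body=b"data",
                  ServerSideEncryption="AES256")

    # SSE-KMS, with a bucket key to reduce KMS calls (key alias hypothetical)
    s3.put_object(Bucket=bucket, Key="b.txt", Body=b"data",
                  ServerSideEncryption="aws:kms",
                  SSEKMSKeyId="alias/my-app-key", BucketKeyEnabled=True)

    # SSE-C: customer-provided 256-bit key; the same key must be sent again on every GET
    key = os.urandom(32)
    s3.put_object(Bucket=bucket, Key="c.txt", Body=b"data",
                  SSECustomerAlgorithm="AES256", SSECustomerKey=key)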

Storage Class

Storage class is per-object. Same bucket can contain objects in different storage classes.

  • S3 Standard: go-to storage class; useful for frequently accessed important data
    • object is replicated across at least 3 AZs
    • durability: 99.999999999% (storing 10 million objects, expect to lose one object every 10,000 years on average)
    • availability: 99.99%
    • CRC used to check for corruption
    • HTTP 200 signifies successful store
    • per-GB/mo fee for storage, per-GB fee for transfer OUT, and a fee per 1000 requests
      • transfer IN is free
    • millisecond first-byte latency
  • S3 Standard Infrequent Access (S3 Standard-IA): long-lived important data, infrequently used, regular sized files
    • mostly the same as Standard
    • availability: 99.9%
    • ~50% cheaper for storage compared to S3 Standard
    • new fee categories:
      • data retrieval fee per GB
      • minimum duration charge: data is billed for 30-days minimum
      • minimum storage charge: data is billed for 128KB minimum
  • S3 One Zone Infrequent Access (S3 One Zone-IA): useful for infrequently accessed data that can be easily replaced
    • mostly the same as Standard-IA
    • availability: 99.5%
    • cheaper than Standard-IA
    • data is only available in one AZ
      • durability is still 99.999999999%, until the AZ fails
  • S3 Glacier Instant Retrieval: infrequent (e.g. once per quarter or year) but instant access
    • cheaper storage
    • 90-day minimum duration charge
    • same as Standard-IA with longer minimum storage duration charge
  • S3 Glacier Flexible Retrieval
    • cheaper storage (1/6 cost of S3 Standard)
    • data is chilled
    • first-byte latency: minutes or hours
    • retrieval is paid; retrieved data is temporarily stored in Standard-IA, then removed
    • retrieval charges
      • expedited: 1-5 minutes
      • standard: 3-5 hours
      • bulk: 5-12 hours
    • bucket can’t be public
  • S3 Glacier Deep Archive
    • even cheaper
    • data is frozen
    • 180-day minimum duration charge
    • first-byte latency: hours/days
  • S3 Intelligent Tiering: good for data whose access patterns change
    • has 5 different tiers
      • frequent access: S3 Standard
      • infrequent access: S3 Standard-IA
      • archive instant access: S3 Glacier Instant Retrieval
      • archive access: S3 Glacier Flexible Retrieval
      • deep archive: S3 Glacier Deep Archive
    • objects are automatically moved to appropriate tiers
    • a per-1000-objects monitoring and automation charge applies to objects larger than 128kB (smaller objects are not auto-tiered)
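
Storage class being per-object, it can be set at upload time or changed later with a copy-in-place; a sketch (names hypothetical):

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-example-bucket"

    # pick a class at upload time
    s3.put_object(Bucket=bucket, Key="logs/app.log", Body=b"log data",
                  StorageClass="STANDARD_IA")

    # change the class of an existing object by copying it over itself
    s3.copy_object(Bucket=bucket, Key="logs/app.log",
                   CopySource={"Bucket": bucket, "Key": "logs/app.log"},
                   StorageClass="GLACIER_IR")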

Lifecycle Configuration

Lifecycle configuration: automatically transition or expire objects to save costs, etc.

  • basically a set of bucket rules (if <condition> then <action>)
  • rules can be bucket-wide or scoped to a group of objects (e.g. by prefix)
  • rules are based on time and not based on access, unlike Intelligent Tiering
  • action types
    • transition: e.g. move objects from S3 Standard to S3 Standard-IA 30 days after creation
    • expiration: setting objects to auto-delete after a set duration
  • Waterfall transition model: Storage class with more frequent access pattern (e.g. Standard) can transition into any storage class with less frequent access pattern (e.g. S3 Glacier Deep Archive).
    • Note: S3 One Zone-IA cannot transition to S3 Glacier Instant Retrieval
    • Transition can’t happen from less access to more access: Objects cannot be moved back to S3 Standard after transitioning.
    • Objects uploaded to Standard must age 30 days before a lifecycle transition to the IA tiers; uploading directly to those tiers is fine.
    • Within a single rule, another 30 days must pass after an IA transition before transitioning on to Glacier tiers.
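
A lifecycle rule sketch combining a transition chain with expiration (bucket and prefix hypothetical):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # object-group rule; use {} for bucket-wide
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},   # Flexible Retrieval
            ],
            "Expiration": {"Days": 365},
        }]},
    )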

Replication

  • Cross-Region Replication (CRR): replicate across multiple regions
  • Same-Region Replication (SRR): replicate within the same region
  • Replication configuration: attached to the source bucket
    • the destination bucket to use
    • IAM role to use for replication
      • allow S3 to assume (trust policy)
      • read source bucket, write destination bucket (permission policy)
      • for replication across different AWS accounts
        • destination bucket policy needs to allow source account role to write objects
  • Replication options
    • Can replicate all objects or use filter
    • Use the same or change storage class in destination
    • Ownership: objects are owned by the source account by default; for a destination bucket in another account this may need changing (otherwise the destination account won't be able to read the objects)
    • Replication Time Control (RTC): once configured, guarantees 15-minute replication SLA for 99.99% of objects, used only for strict business requirements
  • Considerations
    • by default replication is not retroactive, only new objects are replicated
      • one-time Batch Operations can be used to sync old objects when replication is turned on
    • for both source and destination buckets, versioning needs to be on
    • replication is one-way by default; bi-directional replication can be enabled
    • can handle unencrypted, SSE-S3, SSE-KMS (extra configuration), SSE-C
    • cannot replicate system events (e.g. lifecycle transitions in the source bucket) or objects in the Glacier / Glacier Deep Archive classes
    • deletes are not replicated by default, but can be enabled with DeleteMarkerReplication
  • Replication usage scenarios
    • SRR - log aggregation into single S3 bucket
    • SRR - production and test sync
    • SRR - achieve data resilience while maintaining sovereignty requirements (e.g. requirement that data must be in same region)
    • CRR - reduce latency for customer in different region
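
A replication configuration sketch (ARNs hypothetical; versioning must already be enabled on both buckets, and the role must be assumable by S3):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
            "Rules": [{
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # all objects; add a Prefix to filter
                "DeleteMarkerReplication": {"Status": "Disabled"},  # deletes not replicated
                "Destination": {
                    "Bucket": "arn:aws:s3:::destination-bucket",
                    "StorageClass": "STANDARD_IA",  # optionally change class at destination
                },
            }],
        },
    )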

S3 Presigned URLs

  • Give another client access to private bucket/object without having to authenticate
  • time limited
  • can be used to GET or PUT object(s)
  • Note: a presigned URL can technically be generated for objects the signer has no access to, but the URL only carries as much permission as the signer has at the moment of use (not the moment of generation!); an access denial can mean the signer never had the right, or no longer has it
  • A presigned URL can expire before its set expiration date if the temporary credentials of the role that generated it expire, so sign with a persistent IAM user rather than an assumed role.
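
A presigned-URL sketch (names hypothetical); the URL embeds the signer's credentials and an expiry:

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        ClientMethod="get_object",  # or "put_object" for uploads
        Params={"Bucket": "my-example-bucket", "Key": "private/report.pdf"},
        ExpiresIn=3600,  # seconds; also bounded by the signing credentials' lifetime
    )
    print(url)  # anyone with this URL acts with the signer's current permissions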

S3 Select + S3 Glacier Select

  • allow getting part of large object instead of whole object
  • can use SQL-like statements to prefilter object parts
  • allowed formats: CSV, JSON, Parquet, plus bzip2 compression (only for CSV and JSON)
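
A sketch of S3 Select over a CSV object (names and query hypothetical); the response arrives as an event stream:

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="my-example-bucket",
        Key="data/orders.csv",
        ExpressionType="SQL",
        Expression="SELECT s.id, s.total FROM S3Object s WHERE CAST(s.total AS FLOAT) > 100",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())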

S3 Events

  • use events in S3 to trigger SNS, SQS, or Lambda
  • event types
    • object creation
    • object deletion
    • object restoration
    • object replication
  • requires resource policy on SNS, SQS, or Lambda to allow S3 to access these resources
  • alternative: EventBridge
    • supports more events and more services
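
A notification sketch wiring object-created events to a Lambda function (ARN hypothetical; the Lambda's resource policy must already allow S3 to invoke it):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket="my-example-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-upload",
                "Events": ["s3:ObjectCreated:*"],
            }],
        },
    )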

S3 Access Log

  • log bucket or object access
  • logging is managed by the S3 Log Delivery Group set on the source bucket (delivery is best-effort and can take a few hours)
  • needs ACL on target bucket (log storage) for S3 Log Delivery Group
  • log records are newline-delimited, while attributes are space-delimited
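
An access-logging sketch (bucket names hypothetical; the target bucket needs the Log Delivery Group ACL or equivalent permissions):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_logging(
        Bucket="source-bucket",
        BucketLoggingStatus={"LoggingEnabled": {
            "TargetBucket": "log-bucket",
            "TargetPrefix": "access-logs/source-bucket/",
        }},
    )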

S3 Object Lock

  • WORM: Write-Once-Read-Many
  • Object lock requires versioning
  • Once Object Lock is applied, object versions are locked and cannot be deleted or overwritten
  • Two ways of retention (apply both, either one, or neither)
    • retention period
    • legal hold
  • a default Object Lock retention may be enabled at the bucket level
  • Object Lock modes for retention period
    • In compliance mode, no changes to the object version or the retention settings are allowed (not even by the root user) until the retention period expires
    • In governance mode, special permission can be granted to adjust retention settings
      • s3:BypassGovernanceRetention + special header to remove lock
      • useful for preventing accidental deletion or for testing before enabling compliance mode
    • can remove lock in console
  • Legal Hold
    • no deletion of or changes to the object version until the legal hold is removed; legal holds have no expiry
    • s3:PutObjectLegalHold is required to add or remove lock
    • useful for preventing deletion of critical objects
    • can remove lock in console
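
A sketch of both retention mechanisms (names hypothetical; the bucket must have Object Lock enabled at creation):

    from datetime import datetime, timezone
    import boto3

    s3 = boto3.client("s3")

    # governance-mode retention until a fixed date (bypassable with special permission)
    s3.put_object_retention(
        Bucket="locked-bucket", Key="records/tx.json",
        Retention={"Mode": "GOVERNANCE",
                   "RetainUntilDate": datetime(2027, 1, 1, tzinfo=timezone.utc)},
    )

    # legal hold: no expiry; removed later by setting Status to "OFF"
    s3.put_object_legal_hold(
        Bucket="locked-bucket", Key="records/tx.json",
        LegalHold={"Status": "ON"},
    )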

S3 Access Points

  • create one access point for each use case of an S3 bucket
  • benefits
    • simplify S3 bucket access for shared bucket
    • easier management of prefixes (directories) and split up resource policies
  • an access point can have its own policy and network access controls (e.g. restricting access to a VPC)
  • access points have their own domain name
  • Multi-Region Access Point (MRAP): allows using one access point for multiple buckets across regions
    • replication and failover supported
    • note that replication between the underlying buckets is asynchronous, so there can be considerable lag between buckets
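
An access-point sketch using the s3control API (account ID and names hypothetical); objects can then be addressed through the access point's ARN:

    import boto3

    s3control = boto3.client("s3control")
    s3control.create_access_point(
        AccountId="111122223333",
        Name="analytics-ap",
        Bucket="my-example-bucket",  # optionally add VpcConfiguration to restrict to a VPC
    )

    # the access point ARN can be used in place of a bucket name
    s3 = boto3.client("s3")
    obj = s3.get_object(
        Bucket="arn:aws:s3:us-east-1:111122223333:accesspoint/analytics-ap",
        Key="reports/summary.txt",
    )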