- go-to storage service for AWS
- global namespace (buckets have unique names) but region resilient (buckets are per-region)
- unless configured otherwise (e.g. replication), data is stored only within the bucket's region
- designed for multi-user access
- also suitable for large data sets
- Buckets (i.e. disks) contain objects (i.e. files) identified by their keys (i.e. file paths)
- object: a file/piece of data
- object size ranges from 0 bytes to 5 TB
- version ID
- metadata
- access control
- subresources
- bucket: container for objects
- bucket data doesn’t leave region unless specifically configured
- bucket name needs to be globally unique (all regions, all AWS accounts)
- inherently flat structure, though keys with slashes emulate folders, which is why folders are referred to as prefixes
- exam tips
- bucket name
- globally unique (can’t have the same bucket name as someone else’s in another region)
- 3-63 chars, all lowercase, no underscore
- start with a lowercase letter or a number
- must not be formatted as an IP address (e.g. 1.1.1.1)
- accounts can have 100 buckets (soft limit), max 1000 hard limit
- unlimited number of objects
- key = name, value = data
- S3 purpose
- object storage, not file (e.g. file share) or block (e.g. mountable disks) storage
- great for offloading data sets
- I/O for AWS products
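The bucket naming rules in the exam tips above can be sketched as a small validator. This is a simplified illustration, not the full AWS rule set: allowing hyphens and periods in the middle of the name and requiring the name to end in a letter or digit are assumptions beyond what the notes state, and `is_valid_bucket_name` is a hypothetical helper name.

```python
import re

# Rules from the notes: 3-63 chars, lowercase only, no underscores,
# starts with a lowercase letter or digit, not formatted like an IP address.
# Assumption: hyphens/periods are allowed inside, and the last char is a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the simplified naming rules above."""
    if not _NAME_RE.fullmatch(name):
        return False  # wrong length, uppercase, underscore, or bad first/last char
    if _IP_RE.fullmatch(name):
        return False  # looks like an IPv4 address
    return True
```

Global uniqueness cannot be checked locally; only the create-bucket call itself reveals whether a name is already taken.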
Security
- S3 buckets are private by default
- bucket policy ~= AWS resource policy (allow anon access)
- can control specific object access
- ACLs: legacy (not recommended)
- control access to object and bucket
- inflexible and simple (e.g. one ACL can’t limit access to a group of objects)
- Block Public Access: options that override bucket policy for public access
- when to use what
- IAM: if customizing different resources/users within an account
- Bucket Policy: S3-specific settings, anonymous or cross-account settings
- ACL: never
- to-do S3 lifecycle policy for versioning
Static Website Hosting
- normally, S3 access is through AWS API, but SWH allows (public) access of objects through HTTP
- requires setting S3 keys for the Index page and Error page
- Website Endpoint is automatically generated from the bucket name and region (not customizable)
- Important: bucket name must match domain name used; need to create a Route 53 A record “alias to S3 endpoint” for custom domain
- useful for
- offloading media from compute instances
- out-of-band pages: e.g. temporary site maintenance page
- need to turn off Block Public Access and add bucket policy to allow s3:GetObject for all keys (/*) for all principals (anonymous included)
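A minimal sketch of the public-read bucket policy described above, built as a Python dictionary and serialized to JSON. The function name `public_read_policy` and the `Sid` value are illustrative choices; the bucket name is whatever bucket hosts the site.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Build a bucket policy allowing anonymous s3:GetObject on every key."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # illustrative label
                "Effect": "Allow",
                "Principal": "*",  # all principals, anonymous included
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",  # all keys in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The policy only takes effect once Block Public Access no longer blocks public policies on the bucket.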
Object Versioning
- default disabled, can be enabled; once enabled, cannot be disabled, but can be suspended
- state transitions: disabled → enabled ↔ suspended
- modification produces new object version
- by default S3 returns latest version
- deletion just adds a delete marker, but the file is still there; to undo delete, just remove delete marker
- to really delete an object, delete using specific object version IDs
- MFA Delete: when enabled, changing object versioning and deleting versions require MFA serial number plus MFA passcode
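The delete-marker behaviour can be illustrated with a toy in-memory model of a single key in a versioning-enabled bucket. This is not the S3 API; `VersionedKey` and its version ID format are hypothetical, and the model only mirrors the rules listed above.

```python
import itertools


class VersionedKey:
    """Toy model of one object key with versioning enabled (not the S3 API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.versions = []  # oldest first; each entry is (version_id, data or None)

    def put(self, data):
        """A modification produces a new object version."""
        version_id = f"v{next(self._ids)}"
        self.versions.append((version_id, data))
        return version_id

    def delete(self, version_id=None):
        """Without a version ID, add a delete marker; with one, really delete it."""
        if version_id is None:
            marker_id = f"v{next(self._ids)}"
            self.versions.append((marker_id, None))  # None marks a delete marker
            return marker_id
        self.versions = [v for v in self.versions if v[0] != version_id]
        return version_id

    def get(self):
        """Return the latest version's data, or None if it is a delete marker."""
        if not self.versions or self.versions[-1][1] is None:
            return None
        return self.versions[-1][1]
```

Deleting the marker's own version ID is what "undoes" a delete in this model, since the previous version becomes the latest again.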
Uploading
- limit of single stream upload (single PUT request)
- can only upload 5GB of data
- cannot recover from failure, must reupload entire file
- often cannot saturate either end’s network capacity
- multi-part upload
- recommended once an object reaches about 100MB (a guideline, not a hard minimum)
- parts can be restarted
- limits
- 10000 parts maximum
- part size ranges from 5MB to 5GB (the last part can be smaller than 5MB)
- S3 Transfer Acceleration
- scenario: upload traffic is often not routed the best way
- S3 Transfer Acceleration: upload to the nearest AWS edge location (via CloudFront) instead of to the S3 bucket directly
- requirement: bucket name must not contain periods and must be DNS-compliant
- benefits: lower latency, higher upload speed (can be 1-2 times faster for distant regions)
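The multi-part limits above (at most 10,000 parts, each between 5MB and 5GB) determine which part sizes are usable for a given object. The sketch below picks a part size under those limits; treating MB/GB as binary units and the 8 MiB preferred size are assumptions, and `plan_parts` is a hypothetical helper.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB    # minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB    # maximum part size
MAX_PARTS = 10_000    # maximum number of parts


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB):
    """Return (part_size, part_count) that respects the multi-part limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that still fits the object into 10,000 parts (ceiling division).
    needed = -(-object_size // MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART, needed)
    if part_size > MAX_PART:
        raise ValueError("object too large for 10,000 parts of at most 5 GiB")
    part_count = -(-object_size // part_size)
    return part_size, part_count
```

For example, a 100 MiB object with the assumed 8 MiB preference is split into 13 parts, the last one smaller than the rest.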
Encryption
- Only objects are encrypted. Encryption is not done at bucket level, though default encryption key can be configured.
- Server-Side Encryption (SSE): encrypt/decrypt objects within the server; SSE is mandatory
- SSE-C: with customer-provided keys
- SSE-S3: with S3-managed keys (default)
- bucket has an AWS-managed key that envelope-encrypts/decrypts the per-object key
- not suitable for heavily regulated industries
- AES-256
- SSE-KMS: with KMS keys
- same idea as SSE-S3, except we get to manage the per-bucket key, and the per-object key is just a DEK
- more logging & auditing capabilities
- role separation (allow S3 admin to configure S3 but not access data, allow service desk to access data but not configure S3/change permissions)
- allow manual control of bucket key rotation
- SSE-KMS without S3 Bucket Key
- Each object needs its own DEK generated from the KMS key
- The KMS API calls that generate DEKs are rate-limited
- SSE-KMS with S3 Bucket Key
- KMS is used to create a short-lived bucket key, then this bucket key is used to encrypt/decrypt per-object key
- The temporary Bucket Key is different for each requester so that CloudTrail log is more specific.
- Much more scalable and cost-efficient
- CloudTrail KMS events are now per-bucket, instead of per-object.
- replication-compatible
- Note that each object can have a different SSE encryption type
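Because the encryption type is chosen per object, it is typically passed with each upload request. The sketch below builds the keyword arguments one might pass to boto3's `put_object` for each SSE type; the parameter names are given as I understand the boto3 API and should be checked against its documentation, and `sse_put_args` is a hypothetical helper that makes no AWS calls.

```python
from typing import Optional


def sse_put_args(mode: str, kms_key_id: Optional[str] = None,
                 customer_key: Optional[bytes] = None,
                 bucket_key: bool = True) -> dict:
    """Return extra put_object keyword arguments for the chosen SSE type."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms", "BucketKeyEnabled": bucket_key}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id  # otherwise the default KMS key is used
        return args
    if mode == "SSE-C":
        if customer_key is None:
            raise ValueError("SSE-C requires a customer-provided key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown SSE mode: {mode}")
```

A call would then look like `s3.put_object(Bucket=..., Key=..., Body=..., **sse_put_args("SSE-KMS", kms_key_id=...))`, with the client and key ID supplied by the caller.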
Storage Class
Storage class is per-object. Same bucket can contain objects in different storage classes.
- S3 Standard: go-to storage class; useful for frequently accessed important data
- object is replicated across at least 3 AZs
- durability: 99.999999999% (11 nines); with 10,000,000 objects stored, expect to lose one object every 10,000 years on average
- availability: 99.99%
- CRC used to check for corruption
- HTTP 200 signifies successful store
- per-GB/month fee for storage, per-GB fee for transfer OUT, and fee per 1000 requests
- transfer IN is free
- millisecond first-byte latency
- S3 Standard Infrequent Access (S3 Standard-IA): long-lived important data, infrequently used, regular sized files
- mostly the same as Standard
- availability: 99.9%
- ~50% cheaper for storage compared to S3 Standard
- new fee categories:
- data retrieval fee per GB
- minimum duration charge: data is billed for 30-days minimum
- minimum storage charge: data is billed for 128KB minimum
- S3 One Zone Infrequent Access (S3 One Zone-IA): useful for infrequently accessed data that can be easily replaced
- mostly the same as Standard-IA
- availability: 99.5%
- cheaper than Standard-IA
- data is only available in one AZ
- durability is still 99.999999999%, until the AZ fails
- S3 Glacier Instant Retrieval: infrequent (e.g. once per quarter or year) but instant access
- cheaper storage
- 90-day minimum duration charge
- same as Standard-IA with longer minimum storage duration charge
- S3 Glacier Flexible Retrieval
- cheaper storage (1/6 cost of S3 Standard)
- data is chilled
- first-byte latency: minutes or hours
- retrieval is paid; the retrieved copy is temporarily stored in Standard-IA, then removed
- retrieval charges
- expedited: 1-5 minutes
- standard: 3-5 hours
- bulk: 5-12 hours
- bucket can’t be public
- S3 Glacier Deep Archive
- even cheaper
- data is frozen
- 180-day minimum duration charge
- first-byte latency: hours/days
- S3 Intelligent Tiering: good for data whose access patterns change
- have 5 different tiers
- frequent access: S3 Standard
- infrequent access: S3 Standard-IA
- archive instant access: S3 Glacier Instant Retrieval
- archive access: S3 Glacier Flexible Retrieval
- deep archive: S3 Glacier Deep Archive
- objects are automatically moved to appropriate tiers
- a monitoring charge per 1000 objects applies to objects larger than 128KB
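The minimum duration and minimum size charges mean that small or short-lived objects are billed as if they were larger or stored longer. The sketch below applies only the minimums listed in these notes; the storage class identifiers are illustrative labels, no prices are included, and `billable_usage` is a hypothetical helper.

```python
KIB = 1024

# Minimums taken from the notes above; classes not listed have no minimum here.
MIN_BILLABLE_SIZE = {"STANDARD_IA": 128 * KIB}
MIN_BILLABLE_DAYS = {"STANDARD_IA": 30, "GLACIER_IR": 90, "DEEP_ARCHIVE": 180}


def billable_usage(storage_class: str, size_bytes: int, days_stored: int):
    """Return the (size, days) that would be billed after applying the minimums."""
    size = max(size_bytes, MIN_BILLABLE_SIZE.get(storage_class, 0))
    days = max(days_stored, MIN_BILLABLE_DAYS.get(storage_class, 0))
    return size, days
```

A 10 KB object kept for a week in Standard-IA is therefore billed as 128 KB for 30 days, which is why these classes suit larger, long-lived objects.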
Lifecycle Configuration
Lifecycle configuration: automatic transition or expiration of objects, e.g. to save costs
- basically a set of bucket rules (if … then …)
- rules can apply to the whole bucket or to groups of objects (e.g. by prefix or tag)
- rules are based on time and not based on access, unlike Intelligent Tiering
- action types
- transition: e.g. S3 Standard to S3 Standard-IA 30 days after creation
- expiration: setting objects to auto-delete after a set duration
- Waterfall transition model: Storage class with more frequent access pattern (e.g. Standard) can transition into any storage class with less frequent access pattern (e.g. S3 Glacier Deep Archive).
- Note: S3 One Zone-IA cannot transition to S3 Glacier Instant Retrieval
- Transition can’t happen from less access to more access: Objects cannot be moved back to S3 Standard after transitioning.
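- Objects must stay in S3 Standard for at least 30 days before a lifecycle rule can transition them to an infrequent-access class. Uploading directly to the other classes is OK.
- After transitioning objects to an infrequent-access class, a single rule needs to wait another 30 days before transitioning them to Glacier tiers.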
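A lifecycle rule set is essentially structured data. The sketch below shows one rule in the shape that boto3's `put_bucket_lifecycle_configuration` accepts, as I understand that API, plus a small checker for the two 30-day constraints described above. The rule ID, the `logs/` prefix, the day counts, and `check_rule` are all illustrative assumptions.

```python
# Illustrative rule: IA after 30 days, Glacier after 60, delete after a year.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}
GLACIER_CLASSES = {"GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"}


def check_rule(rule: dict) -> list:
    """Return a list of problems with the rule's transitions (empty if none)."""
    problems = []
    ia_day = None
    for t in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if t["StorageClass"] in IA_CLASSES:
            if t["Days"] < 30:
                problems.append("IA transition before day 30")
            ia_day = t["Days"]
        elif t["StorageClass"] in GLACIER_CLASSES and ia_day is not None:
            if t["Days"] - ia_day < 30:
                problems.append("Glacier transition less than 30 days after IA")
    return problems
```

The checker is deliberately narrow: it encodes only the timing rules from these notes, not the full set of transition constraints.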
Replication
- Cross-Region Replication (CRR): replicate across multiple regions
- Same-Region Replication (SRR): replicate within the same region
- Replication configuration: attached to the source bucket
- destination bucket used
- IAM role to use for replication
- allow S3 to assume (trust policy)
- read source bucket, write destination bucket (permission policy)
- for replication across different AWS accounts
- destination bucket policy needs to allow source account role to write objects
- Replication options
- Can replicate all objects or use filter
- Use the same or change storage class in destination
- Ownership: default is source account, but may need to change for bucket in another account (the destination account won’t be able to read objects)
- Replication Time Control (RTC): once configured, guarantees 15-minute replication SLA for 99.99% of objects, used only for strict business requirements
- Considerations
- by default replication is not retroactive, only new objects are replicated
- one-time Batch Operations can be used to sync old objects when replication is turned on
- for both source and destination buckets, versioning needs to be on
- by default one-way replication, or enable bi-directional replication
- can handle unencrypted, SSE-S3, SSE-KMS (extra configuration), SSE-C
- cannot replicate system events (e.g. won’t replicate lifecycle transition in source bucket), Glacier, or Glacier Deep Archive
- deletes are not replicated by default, but can be enabled with DeleteMarkerReplication
- Replication usage scenarios
- SRR - log aggregation into single S3 bucket
- SRR - production and test sync
- SRR - achieve data resilience while maintaining sovereignty requirements (e.g. requirement that data must be in same region)
- CRR - reduce latency for customer in different region
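The IAM role used for replication needs two documents: a trust policy that lets S3 assume the role, and a permission policy for reading the source and writing the destination. The sketch below builds both as dictionaries; the exact action list is my understanding of a typical minimal setup and should be verified against the AWS replication documentation, and `replication_role_policies` is a hypothetical helper.

```python
def replication_role_policies(source_bucket: str, destination_bucket: str):
    """Return (trust_policy, permission_policy) for a replication role."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},  # allow S3 to assume the role
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permission_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # read the replication configuration and list the source bucket
                "Effect": "Allow",
                "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{source_bucket}",
            },
            {   # read object versions from the source bucket
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {   # write replicas into the destination bucket
                "Effect": "Allow",
                "Action": ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"],
                "Resource": f"arn:aws:s3:::{destination_bucket}/*",
            },
        ],
    }
    return trust_policy, permission_policy
```

For cross-account replication, the destination bucket policy must additionally allow this role to write objects, since the role's own policy is not trusted by the other account.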
S3 Presigned URLs
- Give another client access to private bucket/object without having to authenticate
- time limited
- can be used to GET or PUT object(s)
- Note: a presigned URL can technically be generated for objects that the signer has no access to, but the URL only carries the signer's permissions at the moment of use (not the moment of generation!), so an access denial can mean that the signer never had the right or no longer has it
- A presigned URL can stop working before its set expiration if it was signed with a role's temporary credentials and those credentials expire first, so use a long-lived IAM user instead of a role to sign the URL.
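To show what "time limited" and "signed" mean concretely, here is a sketch of how a presigned GET URL is assembled with AWS Signature Version 4 query parameters, using only the standard library. It is written from my understanding of the SigV4 scheme and is for illustration only; in practice an SDK call such as boto3's `generate_presigned_url` should be used. The regional virtual-hosted host format and the function name `presign_get_url` are assumptions, and temporary (role) credentials would additionally need a session token parameter, which this sketch omits.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket, key, region, access_key, secret_key, expires_in, now):
    """Build a SigV4 presigned GET URL; `now` must be a UTC datetime."""
    host = f"{bucket}.s3.{region}.amazonaws.com"  # assumed endpoint format
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),  # lifetime of the URL in seconds
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in sorted(params.items())
    )
    canonical_uri = "/" + quote(key, safe="/")
    canonical_request = "\n".join(
        ["GET", canonical_uri, canonical_query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )

    # Derive the signing key from the secret key, date, region, and service.
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

The URL embeds who signed it (the credential), when, and for how long, which is why S3 can evaluate the signer's current permissions each time the URL is used.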
S3 Select + S3 Glacier Select
- allow getting part of large object instead of whole object
- can use SQL-like statements to prefilter object parts
- allowed formats: CSV, JSON, Parquet; bzip2 compression is supported only for CSV and JSON
S3 Events
- use events in S3 to trigger SNS, SQS, or Lambda
- event types
- object creation
- object deletion
- object restoration
- object replication
- requires resource policy on SNS, SQS, or Lambda to allow S3 to access these resources
- alternative: EventBridge
- supports more events and more services
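On the receiving side, a Lambda function triggered by an S3 event gets a JSON document with one record per event. The handler below is a minimal sketch that extracts the event name, bucket, and key from each record; the field paths reflect the S3 notification record structure as I understand it, and object keys arrive URL-encoded, so they are decoded here. The function name `handler` is a conventional but arbitrary choice.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return a list of (event_name, bucket, key) tuples from an S3 event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys are URL-encoded in the event (spaces arrive as '+').
        key = unquote_plus(s3["object"]["key"])
        results.append((record["eventName"], bucket, key))
    return results
```

A trimmed, made-up event such as `{"Records": [{"eventName": "ObjectCreated:Put", "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "photos/my+cat.jpg"}}}]}` would yield the key `photos/my cat.jpg`.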
S3 Access Log
- log bucket or object access
- logging is managed by S3 Log Delivery Group set on the source bucket (logging is done on best-effort basis, can take a few hours)
- needs ACL on target bucket (log storage) for S3 Log Delivery Group
- log records are newline-delimited, while attributes are space-delimited
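Since attributes are space-delimited but some fields contain spaces inside brackets or quotes, a plain `split()` is not enough. The sketch below tokenizes one log record and names the first ten fields; the field order follows the access log format as I recall it and should be checked against the S3 documentation, and `parse_log_line` is a hypothetical helper.

```python
import re

# A token is a [bracketed] timestamp, a "quoted" string, or a run of non-spaces.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# First ten fields only (assumed order); real records carry more attributes.
FIELDS = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester",
    "request_id", "operation", "key", "request_uri", "http_status",
]


def parse_log_line(line: str) -> dict:
    """Split one access log record into named fields."""
    tokens = [t.strip('[]"') for t in _TOKEN.findall(line)]
    return dict(zip(FIELDS, tokens))
```

A whole log object can then be processed with `[parse_log_line(l) for l in text.splitlines() if l]`, since records are newline-delimited.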
S3 Object Lock
- WORM: Write-Once-Read-Many
- Object lock requires versioning
- Once Object Lock is applied, object versions are locked and cannot be deleted or overwritten
- Two retention mechanisms (an object version can have both, either one, or neither)
- retention period
- legal hold
- default Object Lock settings can be configured on the bucket
- Object Lock modes for retention period
- In compliance mode, no changes are allowed for both the object and the retention period (not even root user) until retention period expires
- In governance mode, special permission can be granted to adjust retention settings
- s3:BypassGovernanceRetention + special header to remove lock
- useful for preventing accidental deletion or for testing before enabling compliance mode
- can remove lock in console
- Legal Hold
- No deletion or changes to object version until legal hold is removed, with no expiry
- s3:PutObjectLegalHold is required to add or remove lock
- useful for preventing deletion of critical objects
- can remove lock in console
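The interaction between retention period, retention mode, and legal hold can be summarized as a single decision function. This is a toy model of the rules listed above, not an AWS API; `can_delete_version` and its parameters are hypothetical, and `bypass_governance` stands in for holding s3:BypassGovernanceRetention and sending the bypass header.

```python
from datetime import datetime, timezone


def can_delete_version(now, mode=None, retain_until=None,
                       legal_hold=False, bypass_governance=False) -> bool:
    """Decide whether a locked object version may be deleted at time `now`."""
    if legal_hold:
        return False  # legal hold blocks deletion until removed, with no expiry
    if mode is None or retain_until is None or now >= retain_until:
        return True   # no retention period set, or it has already expired
    if mode == "GOVERNANCE":
        return bypass_governance  # only with the special permission and header
    return False      # COMPLIANCE: nobody can delete before expiry, not even root
```

Legal hold is checked first because it applies independently of any retention period.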
S3 Access Points
- create one access point for each use case of an S3 bucket
- benefits
- simplify S3 bucket access for shared bucket
- easier management of prefixes (directories) and split up resource policies
- an access point can have its own access point policy and network origin controls (e.g. restricting access to a VPC)
- access points have their own domain name
- Multi-Region Access Point (MRAP): allows using one access point for multiple buckets across regions
- replication and failover supported
- note that there can be considerable replication lag between buckets