AWS S3

Amazon provides an object storage service in the form of S3 (Simple Storage Service). The service stores data as objects, i.e. files plus their metadata. S3 itself is easy to use, but the sheer number of configuration options Amazon provides can feel a bit overwhelming at first.

Amazon S3 provides a web console (as well as a CLI, SDKs, and a REST API) through which we can manage the objects stored in it. Below are some of its notable features.

  1. Simple to use
  2. Storage management
  3. Storage monitoring
  4. Access control
  5. Integrates seamlessly with other AWS services

AWS S3 makes use of buckets to store objects. Buckets are created in a specific region; for the lowest latency it makes sense to select a region close to your users. While creating a new bucket, you need to choose a name that is globally unique across all of AWS. AWS also gives you the option to enable bucket versioning, add tags, enable default encryption for objects stored in the bucket, and enable Object Lock.
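As an illustration, here is a minimal boto3 (Python SDK) sketch of creating a bucket. The bucket name and region are hypothetical placeholders, and your AWS credentials are assumed to be configured locally:

    import boto3

    # Buckets live in a region; the name must be globally unique across AWS.
    s3 = boto3.client("s3", region_name="us-west-2")
    s3.create_bucket(
        Bucket="my-unique-bucket-name",                                 # hypothetical name
        CreateBucketConfiguration={"LocationConstraint": "us-west-2"},  # omit for us-east-1
        ObjectLockEnabledForBucket=False,                               # set True to enable Object Lock
    )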

Object versioning is disabled by default; once you enable it on a bucket, S3 keeps every version of the objects added to it. This helps us restore an older version whenever necessary and keep track of the changes made to a particular object. Along with the object data, S3 also stores metadata describing the object as key-value pairs; user-defined metadata is limited to 2 KB in size.
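A sketch of turning versioning on and attaching user-defined metadata to an object might look like this (bucket and key names are again hypothetical):

    import boto3

    s3 = boto3.client("s3")
    # Versioning is off until you explicitly enable it on the bucket.
    s3.put_bucket_versioning(
        Bucket="my-unique-bucket-name",
        VersioningConfiguration={"Status": "Enabled"},
    )
    # User-defined metadata travels with the object as key-value pairs.
    s3.put_object(
        Bucket="my-unique-bucket-name",
        Key="reports/summary.txt",
        Body=b"quarterly summary",
        Metadata={"department": "finance", "reviewed": "true"},
    )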

The storage classes that control how an object is stored in S3 are listed below; a short example of selecting a class per object follows the list.

  1. Standard – For frequently accessed data.
  2. Standard-IA (Infrequent Access) – For data that resides for a longer time and is not accessed frequently.
  3. One Zone-IA – For data that resides for a longer time, is not frequently accessed, and is not critical. The data is stored in a single Availability Zone only.
  4. Reduced Redundancy – A legacy class for frequently accessed but non-critical data.
  5. Intelligent-Tiering – When access patterns are not predictable.
  6. Glacier – Very infrequently accessed, long-term storage. Retrieval of data from Glacier takes minutes to hours.
  7. Glacier Deep Archive – Very infrequently accessed, long-term storage. Retrieval of data can take up to 12 hours.
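The storage class is chosen per object at upload time (and can be changed later with a copy or a lifecycle rule). A minimal sketch, again with placeholder names:

    import boto3

    s3 = boto3.client("s3")
    # Store an infrequently accessed object directly in Standard-IA.
    s3.put_object(
        Bucket="my-unique-bucket-name",
        Key="archive/2020-report.pdf",
        Body=b"placeholder contents",
        StorageClass="STANDARD_IA",   # or GLACIER, DEEP_ARCHIVE, INTELLIGENT_TIERING, ...
    )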

Dealing with buckets and objects requires identifiers. AWS identifies a bucket by its ARN (Amazon Resource Name). Objects within the bucket can be identified by their S3 URI, ARN, entity tag (ETag), and object URL (HTTP). All of these identifiers serve different use cases depending on the type of access being requested.
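For example, a hypothetical bucket in us-west-2 and an object inside it would be referenced roughly as follows (names are placeholders):

    Bucket ARN:  arn:aws:s3:::my-unique-bucket-name
    Object ARN:  arn:aws:s3:::my-unique-bucket-name/reports/summary.txt
    S3 URI:      s3://my-unique-bucket-name/reports/summary.txt
    Object URL:  https://my-unique-bucket-name.s3.us-west-2.amazonaws.com/reports/summary.txt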

Like most other services in AWS, S3 also supports tagging. Tags identify your resources using key-value pairs. This is especially useful when you have to run cost reports for a specific subset of resources. For example, if we tag all our S3 buckets with project or department information, project-based reports can be created by filtering on the appropriate key-value pairs.
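Tagging a bucket with project and department keys could be sketched as below (values are placeholders; note that this call replaces the bucket's entire tag set):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_tagging(
        Bucket="my-unique-bucket-name",
        Tagging={"TagSet": [
            {"Key": "project", "Value": "data-platform"},
            {"Key": "department", "Value": "analytics"},
        ]},
    )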

S3 supports encryption of the objects stored. You can enable default encryption at the bucket level, which is applied to all objects stored in the bucket, and you can still make a specific choice per object. By default, S3 server-side encryption uses Amazon S3 managed keys; encryption can also be done with a customer master key (CMK) stored in AWS KMS (Key Management Service).
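A sketch of enabling default KMS-based encryption on a bucket follows; the key ARN is a made-up placeholder:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_encryption(
        Bucket="my-unique-bucket-name",
        ServerSideEncryptionConfiguration={"Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",   # use "AES256" for S3-managed keys instead
                "KMSMasterKeyID": "arn:aws:kms:us-west-2:111122223333:key/example-key-id",
            },
        }]},
    )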

It is not every day that users access every object stored in S3. Over a period of time objects become stale and are not of much use. However, deleting them is not always feasible, especially when you have to preserve the artifacts for audit purposes.

AWS S3 offers an Intelligent-Tiering archive configuration which defines the storage tier used for rarely accessed data and, in turn, affects how quickly those objects can be retrieved. You define rules based on the time since an object was last accessed: after a certain number of days, you may choose to move objects in the bucket to the Archive Access tier or the Deep Archive Access tier. By putting such configurations in place for infrequently accessed data, you also save on cost by keeping it in cheaper tiers.
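As a sketch, an archive configuration that moves objects untouched for 90 days into Archive Access and for 180 days into Deep Archive Access could look like this (bucket and configuration names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_intelligent_tiering_configuration(
        Bucket="my-unique-bucket-name",
        Id="archive-stale-objects",
        IntelligentTieringConfiguration={
            "Id": "archive-stale-objects",
            "Status": "Enabled",
            "Tierings": [
                {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
                {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
            ],
        },
    )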

As far as monitoring is concerned, there are a couple of options that can be used. S3 itself provides server access logging, which keeps track of all access requests made against a particular bucket. S3 writes these logs into another bucket of your choice; when you select a target bucket, S3 updates its access permissions so that the log delivery group is able to write to it.
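Enabling server access logging might be sketched as follows; the target bucket is a placeholder and is assumed to already permit S3 log delivery writes:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_logging(
        Bucket="my-unique-bucket-name",
        BucketLoggingStatus={"LoggingEnabled": {
            "TargetBucket": "my-log-bucket",   # must allow S3 log delivery to write
            "TargetPrefix": "access-logs/my-unique-bucket-name/",
        }},
    )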

It is often desirable to know when an object is modified and by whom. S3 buckets can be configured to send notifications to SNS topics, Lambda functions, and SQS queues. The event notification thus triggered can be used to kick off an appropriate follow-up action. S3 sends event notifications for events such as the ones below; a sketch of wiring a bucket to an SQS queue follows the list.

  1. Creation
  2. Deletion
  3. Restoration
  4. Object loss (for reduced redundancy storage)
  5. Replication
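For instance, sending object-created and object-removed events to an SQS queue could be sketched as below; the queue ARN is a placeholder, and its access policy is assumed to already allow S3 to send messages:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket="my-unique-bucket-name",
        NotificationConfiguration={"QueueConfigurations": [{
            "QueueArn": "arn:aws:sqs:us-west-2:111122223333:object-events",
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        }]},
    )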

By default, the contents of any S3 bucket are not public. The Permissions tab provides a way to mark a bucket as public. This may not sound useful for every use case, but it can be a great help when you want to host a static website on S3, since S3 enables users to serve their static web content directly from a bucket.

To host a website, once the bucket is marked as public there are a few additional configurations needed. You enable static website hosting in the bucket's properties, which gives the bucket a website endpoint. You also need to upload all the website pages and content to the bucket. Once uploaded, specify the index page and error page before saving the configuration, and you should be good to go.
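Turning on static website hosting for an already-public bucket could be sketched like this, assuming index.html and error.html have been uploaded to the placeholder bucket:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(
        Bucket="my-unique-bucket-name",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )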

Talking about public access, access controls can also be configured on S3 buckets based on who the permissions are granted to and what kind of permissions they are. Generally, there are four types of grantees who would access S3 buckets: the owner of the bucket, anyone on the internet (public access), other users with AWS accounts, and the S3 log delivery group (for programmatic log delivery).

For each of the above groups you can set read, write, and list permissions at the object or bucket level. Apart from these, it is also possible to write bucket policies in JSON to define fine-grained access to the objects.
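As an example, a bucket policy granting public read access to all objects (as one might for the static website case above) could be sketched as follows; it assumes the placeholder bucket has Block Public Access turned off:

    import boto3, json

    s3 = boto3.client("s3")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadForWebsite",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-unique-bucket-name/*",
        }],
    }
    s3.put_bucket_policy(Bucket="my-unique-bucket-name", Policy=json.dumps(policy))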

Unlike many file storage services, AWS S3 does not sell fixed plans with caps on the total data to be stored; there is no limit on the amount of data that can be kept in S3. The pricing model instead depends on the way you store the data: how much you store and in which storage class, with requests, retrievals, and data transfer billed separately.

Based on the storage class being used, the data is charged per GB per month in tiers. For example, in the US East region, storing data in S3 Standard costs roughly $0.023 per GB for the first 50 TB, after which it is $0.022 per GB for the next 450 TB, and so on. The cheapest storage class is S3 Glacier Deep Archive, which is meant for long-term storage.
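To make the tiering concrete, a back-of-the-envelope calculation for 60 TB in S3 Standard (storage charges only, using the example rates above) might look like this:

    # 60 TB of S3 Standard storage per month, storage charges only.
    first_tier_gb = 50 * 1024             # first 50 TB billed at $0.023/GB
    second_tier_gb = 10 * 1024            # remaining 10 TB billed at $0.022/GB
    monthly_cost = first_tier_gb * 0.023 + second_tier_gb * 0.022
    print(f"~${monthly_cost:,.2f} per month")   # roughly $1,402.88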

However, these are only the storage costs. There are additional charges for requests and data transfer, and for optional capabilities such as monitoring, KMS encryption, transfer acceleration, retrievals, and replication.
