There is a reason why database software exists, and why normal flat file storage is not used to store data that is generally stored in a database. The main reason being the time required to read and write data. Databases provide high read-write performance capabilities as compared to file operations. They are broadly classified into 2 types – Relational and Non-Relational. In this post, we would discuss relational databases as a service provided by AWS.
In relational databases, the data is stored in tables. Tables have columns and every column has a data type, thus the data in that column will be stored in the same data type. For example, a table may have a column named “Serial Number” of type number, “Description” of type string, and “Date” of type date. Data is stored in the form of rows by inserting a set of values matching the corresponding columns.
Further, a relational database may contain multiple tables. To avoid repetition of information, they make use of references. A column value in a given row may refer to another row in another table. These are called database relationships, and the process of consolidating the data in this way is called normalization. The reference value is usually the primary key (foreign key for current table) of the referred table and is called the primary key for the current table.
Relational databases make use of a query language called SQL – Structured Query Language. The implementation of the language is similar across various types of Relational databases. Nowadays there are many libraries available in various languages that help formulate SQL queries automatically. To read/query the data, SELECT command is used and for creating a record INSERT command is used.
Based on the application of relational databases, they are classified into two types – OLTP and OLAP. The classification is based on how the databases are used in various scenarios. Every scenario/requirement is unique given the way it treats data when it comes to the volume and frequency of reading and writing the data.
OLTP (Online Transaction Processing) databases are suited for cases where the frequency of read-write operations is high along with high memory requirements. User-facing applications may need to have frequent and fast access to databases, thus high caching capacity for low latency is important in such scenarios.
OLAP (Online Analytic Processing) databases are suited for cases where there is a large storage demand. For example, data warehousing, where complex queries are to be executed on a huge data set. These types of databases consolidate a large number of database tables into a single table so that it becomes easier to execute complex queries. This demands high storage capacities and is thus classified as an OLAP relational database.
Amazon Relational Database Service (RDS) is a managed relational database service. In a way, it is also possible to create an EC2 instance and install a database server and client software to host a database. Doing so requires a lot of groundwork which encompasses the aspects like security, encryption, backups, recovery, monitoring, replication, etc. Amazon RDS manages all of this along with other capabilities.
To deploy an RDS instance, the very first choice that we need to make is for Database Engines. Amazon offers 6 types of database engines as below:
- MySQL – Various versions of MySQL are supported.
- MariaDB – is a drop-in binary for MySQL and is one of the most compatible with RDS.
- Oracle – is one of the widely used databases in the world. Supported editions are – SE, SE1, SE2, and EE.
- PostgreSQL – is compatible with Oracle and several versions are supported.
- Amazon Aurora – is a drop-in binary for MySQL and PostgreSQL. This provides the best performance amongst others.
- MS SQL Server – another widely used RDBMS, various versions are supported.
Some of the database engines mentioned above use GNU General Public License or are otherwise free to use. Others require a license. RDS follows 2 approaches as far as licensing is concerned – License Included and Bring Your Own License. In the case of “License Included”, the cost of the license is included in AWS costing. However, organizations wishing to use their own license can do so as well.
Determining the class of database instances is very important. It is similar to selecting the EC2 instance class, where you need to gauge what kind of performance should be supported by the database instance being provisioned. Depending on the memory requirement, network bandwidth, processing power, and disk throughput, 3 types of database classes exist as below:
- Standard – These types of database instances provide balanced performance with respect to memory, processing power, network bandwidth, and disk throughput.
- Memory-optimized – as the name suggests is optimized for faster queries, IOPS.
- Burst capable – these are smaller instances that can be used for development and testing purposes.
The storage performance of RDS databases is measured in IOPS, that is input-output operations per second. A single IO operation could either be a read or a write. Every read or write happens on a page and this page has a size associated with it. Usually, it is 16KB. Depending on the type of database engine used, it could also be 32KB. However, if it is greater than 32KB, AWS calculates it as one page per 32KB step.
If a database can perform 300 IOPS, it means it can perform 300 reads or write operations for the given page size. For example, if the requirement is to write 10MB (10240KB) of data in a second, then the performance requirement for the database is supposed to be (10240/16)=640 IOPS. This is considering the page size is 16KB. Interestingly, if the page size would have been 32KB, the performance requirement would have been 320 IOPS. Thus the two are inversely proportional.
Based on storage performance, the database storage is classified into the below 3 types. It should be noted that database instances use EBS volumes to store data. Most of the types have already been covered in the post we did for AWS EBS.
- General-Purpose SSD (gp2)
- Provisioned IOPS (io1)
- Magnetic storage
Replication is a feature provided by AWS on its RDS instance and it is one of the important considerations which also affects the pricing of RDS. Based on the urgency and criticality of data, replication mechanisms can be chosen from 2 types – read replica and multi-AZ.
Read replicas, is a way to resolve database performance issues by creating more replicas of the master database. This is also called horizontal scaling. In this case, there is a master database where all write operations are performed. Read operations are delegated to replicas in a load-balanced manner. The replicas can exist in different AZs and even in different regions. It should be considered that the replicas are kept in sync in an asynchronous manner. Up to 5 read replicas can be created and in the case of Aurora, the number goes to 15. Oracle and MS SQL Server do not support read replicas.
If you cannot take the risk of the asynchronous nature of reading replicas, instead the requirement demands High Availability of databases, you can always go for Multi-AZ deployment. Multi-AZ deployment option can opt for while creating the database or later. By doing this, database copies are deployed and kept in sync in all the AZs for failover purposes.
Since AWS RDS is a managed service, along with the above features it also provides us the functions for backup and recovery, automated snapshots, and maintenance options.