NoSQL databases are non-relational databases that take a different approach to storing data. SQL and NoSQL databases share some similarities and differ in important ways, and each has its own set of pros and cons. In this post, we discuss DynamoDB, a fully managed NoSQL database service from AWS.
Where relational databases have tables, columns, and rows, DynamoDB has tables, attributes, and items. This terminology is specific to DynamoDB. For example, MongoDB calls the similar concepts collections, fields, and documents.
DynamoDB stores records as items, and items are grouped into tables. An item is a collection of attributes, each of which is a name-value pair, much like the properties of a JSON object. DynamoDB supports the following data types:
- Scalar types – these are simple types like string, number, boolean, null, and binary.
- Document types – list and map. A map holds a nested, JSON-like object whose entries are attributes of their own, and a list holds an ordered collection of values that may be of different types.
- Set types – a collection of unique values that all share the same scalar type: string, number, or binary.
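To make these data types concrete, here is a minimal Python sketch that wraps plain Python values in the typed notation used by DynamoDB's low-level API, where every value carries a type descriptor such as S, N, BOOL, NULL, M, L, or SS. The function to_attribute_value and the sample order item are hypothetical; boto3 ships its own serializer (TypeSerializer), and this sketch only illustrates the mapping.

```python
import base64
from decimal import Decimal
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Wrap a Python value in a DynamoDB-style type descriptor (illustrative sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings in the low-level API
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}  # base64 in the JSON wire format
    if isinstance(value, dict):
        return {"M": {name: to_attribute_value(v) for name, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB does not allow empty sets")
        # Sets are unordered in DynamoDB; sorting here only makes the output deterministic.
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(base64.b64encode(v).decode("ascii") for v in value)}
        raise TypeError("all members of a set must be strings, numbers, or bytes")
    raise TypeError(f"unsupported type: {type(value).__name__}")


order = {
    "OrderId": "o-1001",
    "Total": 42,
    "Paid": True,
    "Coupon": None,
    "Shipping": {"City": "Pune", "Express": False},
    "Lines": ["sku-1", 2],
    "Tags": {"gift", "priority"},
}
item = {name: to_attribute_value(v) for name, v in order.items()}
print(item["Shipping"])  # {'M': {'City': {'S': 'Pune'}, 'Express': {'BOOL': False}}}
```

The nested map and the mixed-type list show why document types are useful: one item can carry structured data without a fixed schema.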
When creating a table in DynamoDB, we must provide a partition key. The partition key is part of the table's primary key, which identifies each item in the table uniquely. If the partition key alone is the primary key (a simple primary key), no two items can share the same partition key value, so reading an item by a valid partition key value returns exactly one item. Every item inserted into the table must include the partition key attribute.
Along with the partition key, we can also define a sort key. DynamoDB then treats the combination of the partition key and sort key as the composite primary key of the table. In this case, more than one item can have the same partition key, as long as their sort key values differ. Every item must then include a non-empty sort key value, and in return querying becomes more flexible: items that share a partition key value are stored in sort key order, so a query can retrieve a range of them using sort key conditions such as begins_with or between.
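The key rules above can be modeled with a small in-memory table. This is a toy sketch, not the DynamoDB API: the class InMemoryTable and the CustomerId/OrderDate/UserId attributes are hypothetical. It assumes that writing an existing primary key replaces the whole item (as PutItem does) and uses a simple prefix match to stand in for a begins_with condition on the sort key.

```python
from typing import Any, Dict, List, Optional, Tuple

Item = Dict[str, Any]


class InMemoryTable:
    """Toy model of DynamoDB primary-key rules; not the real service or its API."""

    def __init__(self, partition_key: str, sort_key: Optional[str] = None) -> None:
        self.partition_key = partition_key
        self.sort_key = sort_key
        self._items: Dict[Tuple[Any, ...], Item] = {}

    def _key(self, item: Item) -> Tuple[Any, ...]:
        # Simple primary key: (pk,). Composite primary key: (pk, sk).
        names = [self.partition_key] + ([self.sort_key] if self.sort_key else [])
        for name in names:
            if item.get(name) in (None, ""):
                raise ValueError(f"key attribute {name!r} is missing or empty")
        return tuple(item[name] for name in names)

    def put_item(self, item: Item) -> None:
        # As with PutItem, writing an existing primary key replaces the whole item.
        self._items[self._key(item)] = dict(item)

    def get_item(self, key: Item) -> Optional[Item]:
        return self._items.get(self._key(key))

    def query(self, pk_value: Any, sk_prefix: str = "") -> List[Item]:
        """Items sharing one partition key value, in sort key order, filtered by a sort key prefix."""
        matches = [
            item for item in self._items.values()
            if item[self.partition_key] == pk_value
            and (self.sort_key is None or str(item[self.sort_key]).startswith(sk_prefix))
        ]
        if self.sort_key:
            matches.sort(key=lambda item: item[self.sort_key])
        return matches


orders = InMemoryTable("CustomerId", "OrderDate")
orders.put_item({"CustomerId": "c-1", "OrderDate": "2024-03-02", "Total": 30})
orders.put_item({"CustomerId": "c-1", "OrderDate": "2024-01-15", "Total": 12})
orders.put_item({"CustomerId": "c-2", "OrderDate": "2024-03-05", "Total": 99})
print([o["Total"] for o in orders.query("c-1")])                       # [12, 30]
print([o["Total"] for o in orders.query("c-1", sk_prefix="2024-03")])  # [30]
```

Two items share the partition key c-1 because their sort keys differ, and storing dates as ISO strings lets a prefix on the sort key select one month.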
But why is it called a partition key and not simply a primary key? Because its value also decides where an item is stored. DynamoDB stores table data in partitions and adds more partitions as the data grows; all of this happens internally, and we have no direct access to it. To place an item, DynamoDB passes the partition key value through an internal hash function, and the resulting hash value determines the partition in which the item is stored. Items with the same partition key value therefore always land in the same partition.
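The sketch below illustrates hash-based placement under stated assumptions: each partition owns a contiguous range of hash values, and a growing partition is split by halving its range. DynamoDB's real hash function and partition management are internal and undocumented, so MD5 here is only a stand-in, and PartitionMap, hash_key, and the customer#N keys are hypothetical.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 64  # size of the illustrative hash space


def hash_key(partition_key_value: str) -> int:
    # Stand-in only: DynamoDB's real hash function is internal and not documented.
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class PartitionMap:
    """Each partition owns a contiguous range of hash values (a simplified model)."""

    def __init__(self, partitions: int) -> None:
        step = HASH_SPACE // partitions
        # Lower bounds of partitions 1..n-1; partition 0 starts at 0, the last ends at HASH_SPACE.
        self.bounds = [step * i for i in range(1, partitions)]

    def partition_for(self, partition_key_value: str) -> int:
        return bisect.bisect_right(self.bounds, hash_key(partition_key_value))

    def split(self, index: int) -> None:
        """Split one partition's hash range in half; the other ranges stay as they are."""
        low = self.bounds[index - 1] if index > 0 else 0
        high = self.bounds[index] if index < len(self.bounds) else HASH_SPACE
        bisect.insort(self.bounds, (low + high) // 2)


pmap = PartitionMap(4)
keys = [f"customer#{n}" for n in range(1000)]
before = {k: pmap.partition_for(k) for k in keys}
pmap.split(1)  # e.g. partition 1 grew too large
after = {k: pmap.partition_for(k) for k in keys}
```

The same key always hashes to the same value, so it is always found in the same partition; a split only moves keys out of the partition being split (partitions after it are merely renumbered).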
Apart from the primary key attributes, DynamoDB tables are schemaless: items can have attributes with any names, and each item in a table may have a different set of attributes besides its partition and sort keys. This is one of the fundamental differences between SQL and NoSQL databases.
When applications need to query the database by attributes other than the table's primary key, we can create secondary indexes. While creating a secondary index, we specify its partition key and an optional sort key. Every secondary index belongs to exactly one existing table, which is called its base table.
Secondary indexes are maintained automatically by DynamoDB: whenever an item is created, updated, or deleted in the base table, the change is reflected in its indexes. At a minimum, an index stores its own key attributes plus the partition and sort keys of the base table (the KEYS_ONLY projection); we can also choose to project additional attributes, or all of them, into the index. Indexes are of two types – global and local. A global secondary index can have a partition key and sort key that differ from those of the base table. A local secondary index has the same partition key as the base table but a different sort key, and it must be created together with the table.
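As a rough illustration of automatic index maintenance, here is a toy base table with one global secondary index using a KEYS_ONLY projection. The class TableWithGsi and the OrderId/Status/CreatedAt attributes are hypothetical. One simplification to note: this sketch updates the index synchronously, whereas DynamoDB propagates changes to global secondary indexes asynchronously, so reads from a GSI are eventually consistent.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]


class TableWithGsi:
    """Toy base table (simple primary key) plus one global secondary index, KEYS_ONLY projection."""

    def __init__(self, pk: str, gsi_pk: str, gsi_sk: str) -> None:
        self.pk, self.gsi_pk, self.gsi_sk = pk, gsi_pk, gsi_sk
        self.items: Dict[Any, Item] = {}
        # index partition key value -> {base table key value: projected item}
        self.index: Dict[Any, Dict[Any, Item]] = defaultdict(dict)

    def _remove_from_index(self, old: Optional[Item]) -> None:
        if old is not None and self.gsi_pk in old:
            self.index[old[self.gsi_pk]].pop(old[self.pk], None)

    def put_item(self, item: Item) -> None:
        # Drop the old index entry first, so an update that changes the index key moves the entry.
        self._remove_from_index(self.items.get(item[self.pk]))
        self.items[item[self.pk]] = dict(item)
        # Items missing either index key attribute are left out of the index (a sparse index).
        if self.gsi_pk in item and self.gsi_sk in item:
            projected = {name: item[name] for name in (self.pk, self.gsi_pk, self.gsi_sk)}
            self.index[item[self.gsi_pk]][item[self.pk]] = projected

    def delete_item(self, pk_value: Any) -> None:
        self._remove_from_index(self.items.pop(pk_value, None))

    def query_index(self, gsi_pk_value: Any) -> List[Item]:
        entries = self.index.get(gsi_pk_value, {})
        return sorted(entries.values(), key=lambda entry: entry[self.gsi_sk])


orders = TableWithGsi("OrderId", "Status", "CreatedAt")
orders.put_item({"OrderId": "o-1", "Status": "PENDING", "CreatedAt": "2024-05-02", "Total": 30})
orders.put_item({"OrderId": "o-2", "Status": "PENDING", "CreatedAt": "2024-05-01", "Total": 12})
orders.put_item({"OrderId": "o-3", "CreatedAt": "2024-05-03"})  # no Status, so not indexed
orders.put_item({"OrderId": "o-1", "Status": "SHIPPED", "CreatedAt": "2024-05-02", "Total": 30})
print([e["OrderId"] for e in orders.query_index("PENDING")])  # ['o-2']
```

Querying the index by Status answers a question the base table's OrderId key cannot, and the projected entries carry only key attributes, so Total would have to be fetched from the base table.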
When DynamoDB Streams is enabled on a table, item-level changes to that table are recorded in a stream. For example, when an item is created, updated, or deleted, the details of that change are captured in the stream, and stream records are retained for 24 hours. Reads are not recorded. Streams can trigger Lambda functions that take appropriate action when a change matches a given condition.
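Below is a sketch of a Lambda handler that reacts to stream records. It follows the record shape DynamoDB Streams delivers to Lambda (a Records list, an eventName of INSERT, MODIFY, or REMOVE, and a dynamodb section with Keys, NewImage, and OldImage in typed attribute-value format), assuming the stream view type is NEW_AND_OLD_IMAGES so both images are present. The OrderId and Status attributes and the "only status changes matter" rule are hypothetical, and the sample event is trimmed to the fields the handler reads.

```python
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> List[str]:
    """React to DynamoDB Streams records (sketch; attribute names are hypothetical)."""
    actions = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        # Keys and images use the typed attribute-value format, e.g. {"S": "o-1"}.
        order_id = change["Keys"]["OrderId"]["S"]
        if record["eventName"] == "INSERT":
            actions.append(f"new order {order_id}")
        elif record["eventName"] == "MODIFY":
            # OldImage/NewImage exist only with the NEW_AND_OLD_IMAGES stream view type.
            old = change.get("OldImage", {}).get("Status", {}).get("S")
            new = change.get("NewImage", {}).get("Status", {}).get("S")
            if old != new:  # matching condition: only status changes matter here
                actions.append(f"order {order_id}: {old} -> {new}")
        elif record["eventName"] == "REMOVE":
            actions.append(f"order {order_id} deleted")
    return actions


sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"OrderId": {"S": "o-1"}},
                                         "NewImage": {"OrderId": {"S": "o-1"}, "Status": {"S": "PENDING"}}}},
    {"eventName": "MODIFY", "dynamodb": {"Keys": {"OrderId": {"S": "o-1"}},
                                         "OldImage": {"Status": {"S": "PENDING"}},
                                         "NewImage": {"Status": {"S": "SHIPPED"}}}},
    {"eventName": "MODIFY", "dynamodb": {"Keys": {"OrderId": {"S": "o-2"}},
                                         "OldImage": {"Status": {"S": "PENDING"}},
                                         "NewImage": {"Status": {"S": "PENDING"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"OrderId": {"S": "o-3"}}}},
]}
print(lambda_handler(sample_event, None))
# ['new order o-1', 'order o-1: PENDING -> SHIPPED', 'order o-3 deleted']
```

The second MODIFY record is ignored because its status did not change, which is the kind of filtering a stream-triggered function typically performs.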
DynamoDB provides a rich set of APIs. Control plane operations resemble database administration: they let you create, describe, update, and delete tables. Data plane operations perform CRUD operations on items. For users more familiar with SQL syntax, DynamoDB also supports PartiQL, a SQL-compatible query language for selecting, inserting, updating, and deleting items.
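To show the split between control plane and data plane calls, this sketch builds request parameters in the shape of the CreateTable, PutItem, and ExecuteStatement (PartiQL) APIs. The table Orders, its CustomerId/OrderDate keys, and the helper functions are hypothetical; actually sending the requests assumes boto3 and AWS credentials, so those calls appear only as comments.

```python
from typing import Any, Dict

TABLE = "Orders"   # hypothetical table
PK = "CustomerId"  # hypothetical partition key
SK = "OrderDate"   # hypothetical sort key


def create_table_request() -> Dict[str, Any]:
    """Control plane: keyword arguments for CreateTable."""
    return {
        "TableName": TABLE,
        "AttributeDefinitions": [
            {"AttributeName": PK, "AttributeType": "S"},
            {"AttributeName": SK, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": PK, "KeyType": "HASH"},   # partition key
            {"AttributeName": SK, "KeyType": "RANGE"},  # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
    }


def put_item_request(customer_id: str, order_date: str, total: int) -> Dict[str, Any]:
    """Data plane: keyword arguments for PutItem, with typed attribute values."""
    return {
        "TableName": TABLE,
        "Item": {
            PK: {"S": customer_id},
            SK: {"S": order_date},
            "Total": {"N": str(total)},
        },
    }


def partiql_select_request(customer_id: str) -> Dict[str, Any]:
    """Data plane via PartiQL: keyword arguments for ExecuteStatement."""
    return {
        "Statement": f'SELECT * FROM "{TABLE}" WHERE {PK} = ?',
        "Parameters": [{"S": customer_id}],  # bound to the ? placeholder
    }


# With boto3 and AWS credentials, these would be sent as:
#   client = boto3.client("dynamodb")
#   client.create_table(**create_table_request())
#   client.get_waiter("table_exists").wait(TableName=TABLE)  # wait until the table is ACTIVE
#   client.put_item(**put_item_request("c-1", "2024-03-02", 30))
#   client.execute_statement(**partiql_select_request("c-1"))
print(partiql_select_request("c-1")["Statement"])  # SELECT * FROM "Orders" WHERE CustomerId = ?
```

Binding values through Parameters rather than formatting them into the statement string keeps user input out of the PartiQL text.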
A DynamoDB table is created in a single AWS Region, and within that Region DynamoDB replicates its data across multiple Availability Zones (AZs) for resiliency. DynamoDB supports eventually consistent and strongly consistent reads. With an eventually consistent read, a read issued right after an update may not reflect the updated value yet; all copies of the data usually converge within a second. A strongly consistent read, by contrast, returns the most up-to-date data, reflecting every write that succeeded before the read.
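The difference between the two read types can be shown with a toy replication model: one leader copy that receives writes and follower copies that catch up later. DynamoDB's actual replication protocol is internal, so ReplicatedItemStore and its methods are hypothetical; the consistent_read flag only mirrors the real ConsistentRead parameter of GetItem, Query, and Scan.

```python
from typing import Any, Dict, List, Optional, Tuple


class ReplicatedItemStore:
    """Toy model of one leader copy and lagging follower copies (not DynamoDB's real protocol)."""

    def __init__(self, followers: int = 2) -> None:
        self.leader: Dict[str, Any] = {}
        self.followers: List[Dict[str, Any]] = [{} for _ in range(followers)]
        self.pending: List[Tuple[str, Any]] = []  # writes not yet copied to the followers

    def put(self, key: str, value: Any) -> None:
        self.leader[key] = value  # in this toy model, a write is acknowledged once the leader has it
        self.pending.append((key, value))

    def replicate(self) -> None:
        """Copy pending writes to every follower (in DynamoDB this happens in the background)."""
        for key, value in self.pending:
            for copy in self.followers:
                copy[key] = value
        self.pending.clear()

    def get(self, key: str, consistent_read: bool = False, replica: int = 0) -> Optional[Any]:
        if consistent_read:
            return self.leader.get(key)  # strongly consistent: reflects every acknowledged write
        return self.followers[replica].get(key)  # eventually consistent: may be stale


store = ReplicatedItemStore()
store.put("order#1", "PENDING")
store.replicate()
store.put("order#1", "SHIPPED")
print(store.get("order#1"))                        # PENDING (stale copy)
print(store.get("order#1", consistent_read=True))  # SHIPPED
store.replicate()
print(store.get("order#1"))                        # SHIPPED
```

The stale read disappears once replication catches up, which is exactly the window an eventually consistent read can fall into.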
DynamoDB measures throughput in request units. One read request unit (RRU) covers one strongly consistent read, or two eventually consistent reads, of an item up to 4 KB, while a transactional read needs two RRUs. One write request unit (WRU) covers one write of an item up to 1 KB, while a transactional write needs two WRUs. Larger items consume more units: sizes are rounded up to the next 4 KB for reads and the next 1 KB for writes. Transactional writes group several writes into a single all-or-nothing operation, so either all of them succeed or none of them does.
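The per-request arithmetic above fits in two small functions. This is a sketch: the function names are hypothetical, 1 KB is taken as 1,024 bytes, and the item size is passed in directly (DynamoDB computes it from attribute names and values, which is not modeled here).

```python
import math

READ_BLOCK = 4 * 1024   # reads are measured in 4 KB steps
WRITE_BLOCK = 1 * 1024  # writes are measured in 1 KB steps


def read_request_units(item_size_bytes: int, kind: str = "strong") -> float:
    """Units for reading one item; kind is 'eventual', 'strong' or 'transactional'."""
    blocks = max(1, math.ceil(item_size_bytes / READ_BLOCK))
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[kind]
    return blocks * factor


def write_request_units(item_size_bytes: int, transactional: bool = False) -> int:
    """Units for writing one item; transactional writes cost twice as much."""
    blocks = max(1, math.ceil(item_size_bytes / WRITE_BLOCK))
    return blocks * (2 if transactional else 1)


print(read_request_units(6 * 1024))                   # 2 (6 KB rounds up to two 4 KB blocks)
print(read_request_units(6 * 1024, "eventual"))       # 1.0
print(write_request_units(2500, transactional=True))  # 6 (2500 bytes rounds up to 3 KB, doubled)
```

Because sizes are rounded up, a 4.1 KB item costs as much to read as an 8 KB one, which is one reason to keep frequently read items small.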
In provisioned mode, throughput is expressed as a rate: one read capacity unit (RCU) supports one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB, and one write capacity unit (WCU) supports one write per second of an item up to 1 KB. There are two capacity modes for DynamoDB tables:
- On-demand – DynamoDB serves reads and writes as they arrive, with no capacity thresholds to plan or maintain. The table accommodates traffic automatically as it ramps up or down, and billing is pay-per-request, based on the read and write request units consumed.
- Provisioned – the default mode when a table is created through the CreateTable API without specifying a billing mode. Here you define in advance the maximum number of RCUs and WCUs the table can use per second, based on estimates and experience of the application's traffic. Requests beyond the provisioned throughput can be throttled, and auto scaling can be enabled to adjust the provisioned capacity as demand changes.
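For provisioned mode, the per-second definitions translate into a simple sizing calculation. This sketch assumes a steady request rate and items of one size, and it ignores transactional requests, burst capacity, and auto scaling; the function names and the workload numbers are hypothetical.

```python
import math


def required_rcu(reads_per_second: int, item_size_bytes: int, eventually_consistent: bool = False) -> int:
    """Provisioned RCUs for a steady read rate of items of a given size."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    units = reads_per_second * blocks
    # One RCU serves two eventually consistent reads per second, so halve and round up.
    return math.ceil(units / 2) if eventually_consistent else units


def required_wcu(writes_per_second: int, item_size_bytes: int) -> int:
    """Provisioned WCUs for a steady write rate of items of a given size."""
    return writes_per_second * max(1, math.ceil(item_size_bytes / 1024))


# Hypothetical workload: 80 reads/s and 100 writes/s of 6 KB items.
print(required_rcu(80, 6 * 1024))        # 160
print(required_rcu(80, 6 * 1024, True))  # 80
print(required_wcu(100, 6 * 1024))       # 600
```

Switching the same workload to eventually consistent reads halves the read capacity it needs, which is often the cheapest optimization when slightly stale reads are acceptable.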
AWS provides NoSQL Workbench, a client-side application for data modeling, data visualization, and query building. It works much like a database client: it connects to DynamoDB and lets developers design, build, and maintain tables.
To secure our DynamoDB databases, all stored data is encrypted at rest using keys managed in AWS KMS. We can use an AWS owned key (the default), an AWS managed key, or a customer managed key. Data in transit is encrypted because requests to DynamoDB are sent over HTTPS. To keep traffic off the public internet, applications in a VPC can reach DynamoDB through a VPC endpoint, and on-premises networks can connect over a VPN or AWS Direct Connect. Access can be restricted further with VPC security groups, network ACLs, endpoint policies, and IAM policies.
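As an example of combining IAM policies with endpoints, this sketch builds an IAM policy that grants read access to one table and its indexes only for requests arriving through a specific VPC endpoint (via the aws:SourceVpce condition key), plus an SSESpecification for encrypting a table with a customer managed KMS key. The Region, account ID, table name, endpoint ID, and key alias are placeholders, and the helper functions are hypothetical.

```python
import json
from typing import Any, Dict


def read_only_policy(region: str, account_id: str, table: str, vpce_id: str) -> Dict[str, Any]:
    """IAM policy allowing reads on one table and its indexes, only via one VPC endpoint."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTableThroughVpcEndpoint",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:BatchGetItem"],
                "Resource": [table_arn, f"{table_arn}/index/*"],
                # This statement only matches requests that arrive through the given endpoint.
                "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


def customer_managed_sse(kms_key_alias: str) -> Dict[str, Any]:
    """SSESpecification for CreateTable/UpdateTable using a customer managed KMS key."""
    return {"Enabled": True, "SSEType": "KMS", "KMSMasterKeyId": kms_key_alias}


# Placeholder values only.
policy = read_only_policy("us-east-1", "123456789012", "Orders", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

Since IAM denies by default, a caller with only this policy cannot read the table from outside the endpoint, and it cannot write to the table at all.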
We can establish a monitoring baseline from metrics such as the number of read and write operations, consumed capacity, threshold breaches, throttled requests, and errors. Amazon CloudWatch collects these DynamoDB metrics, and CloudWatch alarms can notify us or trigger actions when a metric crosses a given threshold.
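As one concrete baseline, this sketch builds parameters in the shape of CloudWatch's PutMetricAlarm API for an alarm that fires when a table's consumed write capacity exceeds a chosen share of its provisioned WCUs. It uses the ConsumedWriteCapacityUnits metric in the AWS/DynamoDB namespace with the TableName dimension; the helper name, the Orders table, the 80% share, and the five evaluation periods are assumptions. With boto3, the dictionary would be passed as cloudwatch.put_metric_alarm(**params).

```python
from typing import Any, Dict


def write_capacity_alarm(table: str, provisioned_wcu: int, utilization_percent: int = 80,
                         period_seconds: int = 60) -> Dict[str, Any]:
    """PutMetricAlarm parameters: alarm when consumed WCUs exceed a share of provisioned WCUs."""
    # The Sum of ConsumedWriteCapacityUnits covers a whole period, so the
    # per-second limit is multiplied by the period length.
    threshold = provisioned_wcu * utilization_percent / 100 * period_seconds
    return {
        "AlarmName": f"{table}-write-capacity-{utilization_percent}pct",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ConsumedWriteCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": table}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 5,  # assumption: five consecutive breaching minutes
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = write_capacity_alarm("Orders", provisioned_wcu=100)
print(params["Threshold"])  # 4800.0
```

Alarming below 100% utilization leaves time to raise capacity (or let auto scaling react) before requests start being throttled.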