Introduction & Best Practices to DynamoDB Relational Modeling Simplified 101

By: Published: May 20, 2022

DynamoDB Relational Modeling FI

DynamoDB is an Amazon Web Services database system that enables key-valued cloud services and data structures. It provides users with auto-scaling, in-memory caching, backup, and restoration options for all of their DynamoDB-based internet-scale applications.

In this article, you will learn about the DynamoDB Relational Modeling process and the steps with examples. 

Table of Contents

What is Amazon DynamoDB?

DynamoDB Relational Modeling: dynamodb
Image Source

Amazon DynamoDB is a fully managed NoSQL database service with seamless scaling and quick and predictable performance. You don’t have to worry about hardware provisioning, setup, configuration, replication, software patching, or cluster scaling with Amazon DynamoDB because it offloads the administrative costs of running and growing a distributed database. Amazon DynamoDB also supports encryption at rest, which reduces the time and effort required to safeguard sensitive data.

You can design database tables using DynamoDB to store and retrieve any amount of data and handle any request traffic. You can increase or decrease the throughput capacity of your tables without experiencing any downtime or performance reduction. The AWS Management Console can be used to keep track of resource usage and performance data.

Key Features of Amazon DynamoDB

  • Scalability: When faced with an increase in users, traffic, or data, today’s web-based applications frequently face database scalability issues. AWS offers DynamoDB, a fully managed NoSQL database solution built for fast performance at any scale, to address database scalability challenges. Developers who wish to build scalable cloud-based applications can use Amazon DynamoDB to start with a limited capacity and then scale up the request capacity of certain tables as their service increases in popularity.
  • Pricing: Because the pricing plan for AWS DynamoDB is based on read and write units per second, DevOps can configure the read and write units in seconds. DynamoDB can be supplied with a certain number of Write units and a certain number of Read units.
  • Data Model and Indexing: All tables in AWS DynamoDB are collections of things. Each object is made up of a set of characteristics. One property should serve as the main key for each table. Scalar and multi-valued data are the two forms of data. A string, a number, or a byte can be used as a scalar data type. A string set, number set, or binary set is a multi-valued set. The main key is used to index each table.

What is Relational Modeling?

E.F. Codd proposed the Relational Model to model data in the form of relations or tables. After creating the database conceptual model using an ER diagram, you must transform it into a relational model that can be implemented using any RDBMS language such as Oracle SQL or MySQL.

Data is stored in Relational Databases using the Relational Model. A relational database stores data as relations (tables).

The relational model’s purpose is to provide a declarative method for specifying data and queries: users state directly what information the database contains and what information they want from it, and the database management system software handles describing data structures for storing the data and retrieval procedures for answering queries.

Simplify DynamoDB’s ETL Using Hevo’s No-Code Data Pipeline

Hevo Data, a Fully-managed Data Aggregation solution, can help you automate, simplify & enrich your aggregation process in a few clicks. With Hevo’s out-of-the-box connectors and blazing-fast Data Pipelines, you can extract & aggregate data from 100+ Data Sources straight into your Data Warehouse, Database, or any destination like DynamoDB. To further streamline and prepare your data for analysis, you can process and enrich Raw Granular Data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!”

GET STARTED WITH HEVO FOR FREE

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!

DynamoDB Relational Modeling Process

Data is stored in a normalized relational structure in traditional relational database management systems (RDBMS). Hierarchical data structures are reduced to a set of common elements that are stored across several tables using this structure. The schema below is an example of a general order-entry application with an accompanying HR schema that supports a hypothetical manufacturer’s operational and business support systems in DynamoDB Relational Modeling.

DynamoDB Relational Modeling: schema
Image Source

To facilitate application-layer access patterns, RDBMS platforms build or materialize views of normalized data using an ad hoc query language (usually a flavor of SQL).

For example, you could run the following query against the preceding schema to get a list of purchase order items sorted by quantity in stock at all warehouses that can ship each item.

SELECT * FROM Orders
  INNER JOIN Order_Items ON Orders.Order_ID = Order_Items.Order_ID
  INNER JOIN Products ON Products.Product_ID = Order_Items.Product_ID
  INNER JOIN Inventories ON Products.Product_ID = Inventories.Product_ID
  ORDER BY Quantity_on_Hand DESC

These kinds of one-time queries give a versatile API for obtaining data, but they take a long time to process. Querying data from many sources is common, and the results must be prepared for presentation.in DynamoDB Relational Modeling The preceding query runs sophisticated queries across multiple tables before sorting and integrating the results.

The necessity to implement an ACID-compliant transaction structure is another aspect that can slow down RDBMS systems. When stored in an RDBMS, most online transaction processing (OLTP) applications’ hierarchical data structures must be broken down and dispersed across numerous logical tables. As a result, an ACID-compliant transaction architecture is required to avoid race circumstances that may arise if an application attempts to read an object that is currently being written. A transaction architecture like this has to add a lot of overhead to the writing operation.

Traditional RDBMS platforms have these two challenges when it comes to scaling. It’s still unclear whether the new SQL group will be able to produce a distributed RDBMS solution. Even so, it’s unlikely that this would fix the two problems mentioned before. The processing costs of normalization and ACID transactions must remain considerable regardless of how the service is delivered.

As a result, when your organization demands low-latency responses to high-volume requests, using a NoSQL system makes both technical and financial sense. Amazon DynamoDB avoids the difficulties that restrict the scalability of relational systems.

Why Use DynamoDB Instead of Relational Database?

For the following reasons, a relational database system does not scale well:

  • It normalizes data and stores it in numerous tables that must be written to disc using multiple queries.
  • It has the same performance costs as an ACID-compliant transaction system.
  • It reassembles required views of query results via expensive joins.

For the following reasons, DynamoDB Relational Modeling scales well:

  • DynamoDB Relational Modeling’s schema flexibility allows it to store hierarchical data in a single item.
  • Its composite key design allows it to keep similar things on the same table together.

Queries against the data store in DynamoDB Relational Modeling become considerably more straightforward, often taking the form:

SELECT * FROM Table_X WHERE Attribute_Y = "somevalue"

When compared to the RDBMS in the previous example, DynamoDB Relational Modeling does considerably less work to return the requested data.

What makes Hevo’s ETL Process Best-In-Class

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, No-code platform empowers you with everything you need to have for a smooth data replication experience.

Check out what makes Hevo amazing:

  • Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
  • Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making. 
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (with 40+ free sources) that can help you scale your data infrastructure as required.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

Getting Started with DynamoDB Relational Modeling

To begin creating a DynamoDB relational modeling table that will scale efficiently, you must first determine the access patterns required by the Operations and Business Support Systems (OSS/BSS) that it will support:

  • Review user stories about activities and objectives for new applications. 
  • Keep track of the different use cases you find and the access patterns they require.
  • Analyze query logs for existing apps to see how people are currently utilizing the system and what the important access patterns are.

After you’ve gone through this process in DynamoDB Relational Modeling, you should get a list that looks something like this.

DynamoDB Relational Modeling: access patterns
Image Source

Your list might be much larger in an actual job application. This collection, on the other hand, shows the spectrum of query pattern complexity that you could see in a production setting.

The use of denormalization and composite key aggregation to reduce query complexity is a frequent DynamoDB Relational Modeling schema design strategy.

This includes employing composite sort keys, overloaded global secondary indexes, partitioned tables/indexes, and other design patterns in DynamoDB. These elements can be used to organize data so that an application can retrieve everything it requires for a specific access pattern with a single query on a table or index. The adjacency list pattern is the most common pattern for modeling the normalized schema depicted in DynamoDB Relational Modeling. Global secondary index writes sharding, global secondary index overloading, composite keys, and materialized aggregations are some of the other techniques that could be employed in this design.

Example of DynamoDB Relational Modeling

This example shows how to use Amazon DynamoDB to model relational data. The relational order entry schema given in Relational Modeling correlates to a DynamoDB database design. It uses the Adjacency List Design Pattern, which is a standard technique for DynamoDB Relational Modeling to describe relational data structures.

The design pattern asks you to define a set of entity types that correspond to the DynamoDB relational modeling schema’s various tables. A compound (partition and sort) primary key is then used to add entity entries to the table. The partition key of these entity objects is the property that uniquely identifies the item, and it is abbreviated as PK for all items. An attribute value for an inverted index or global secondary index can be found in the sort key attribute. It’s commonly referred to as SK.

The relational order entry model is supported by the entities listed below.

  • HR-Employee: PK: EmployeeID, SK: Employee Name
  • HR-Region: PK: RegionID, SK: Region Name
  • HR-Country: PK: CountryId, SK: Country Name
  • HR-Location: PK: LocationID, SK: Country Name
  • HR-Job – PK: JobID, SK: Job Title
  • HR-Department: PK: DepartmentID, SK: DepartmentID
  • OE-Customer: PK: CustomerID, SK: AccountRepID
  • OE-Order: PK OrderID, SK: CustomerID
  • OE-Product: PK: ProductID, SK: Product Name
  • OE-Warehouse: PK: WarehouseID, SK: Region Name

You can describe the relationships between these entity items by adding edge items to the entity item partitions after you’ve added them to the database. This stage of DynamoDB Relational Modeling is illustrated in the table below:

The Employee, Order, and Product Entity partitions on the table in this example feature additional edge items that hold pointers to other table entity items. Next step in DynamoDB Relational Modeling is to create a few Global Secondary Indexes (GSIs) to handle all of the previously established access patterns. The primary key and sort key attributes of the entity objects do not all use the same type of data. The primary key and sort key characteristics must both be present before the table can be inserted.

Because some of these entities utilize proper names as sort key values and others use other entity IDs, the same global secondary index can handle several sorts of queries in DynamoDB Relational Modeling. GSI overloading is the name for this approach. For tables with various item kinds, it effectively removes the default limit of 20 global secondary indexes. This is labeled GSI 1 in the diagram below.

GSI 2 is built to handle a typical application access pattern: getting all the elements on the table with a particular state. Unless the items are split across more than one logical partition that can be searched in parallel, this access pattern can result in a hotkey for a big database with an unequal distribution of items across available states in DynamoDB Relational Modeling. Write sharding is the name of this design pattern.

The application adds the GSI 2 primary key attribute to every Order item to accomplish this for GSI 2. Unless there is a specific reason to do otherwise, it populates it with a random number in the range of 0–N, where N can be found using the following formula:

ItemsPerRCU = 4KB / AvgItemSize

PartitionMaxReadRate = 3K * ItemsPerRCU

N = MaxRequiredIO / PartitionMaxReadRate

Suppose you’re anticipating the following:

  • There will be up to 2 million orders in the system, which will expand to 3 million in five years.
  • At any given time, up to 20% of these orders will be in an OPEN condition.
  • The average order record is roughly 100 bytes, and the order partition has three OrderItem records that are around 50 bytes each, for a total order entity size of 250 bytes.

The N factor computation for that table would look like this.

ItemsPerRCU = 4KB / 250B = 16

PartitionMaxReadRate = 3K * 16 = 48K

N = (0.2 * 3M) / 48K = 13

In this situation, you must spread all orders across at least 13 logical partitions on GSI 2 to avoid a hot partition on the physical storage layer when reading all Order items with an OPEN status. To account for irregularities in the dataset, padding this amount is a smart idea. So a model with N = 15 should be fine. As previously stated, this is accomplished by assigning a random 0–N value to the GSI 2 PK attribute of each Order and OrderItem record added to the table in DynamoDB Relational Modeling process.

This breakdown implies that the access pattern that involves aggregating all OPEN invoices occurs infrequently, allowing you to satisfy the request with burst capacity. You can use a State and Date Range Sort Key condition to get a subset or all Orders in a certain state from the following global secondary index.

DynamoDB Relational Modeling: secondary index
Image Source

The objects are spread randomly across the 15 logical partitions in this case of DynamoDB Relational Modeling. This structure works because the access pattern necessitates retrieving a huge number of elements. As a result, none of the 15 threads are likely to return empty result sets, which could indicate wasted capacity. Even if no data is returned or written, a query always utilizes one Read Capacity Unit (RCU) or one Write Capacity Unit (WCU).

Finally, you can return to the previously established access patterns of DynamoDB Relational Modeling. The following is a list of access patterns and the query conditions that will be used to handle them in the new DynamoDB Relational Modeling version of the application.

DynamoDB Relational Modeling: list
Image Source

Conclusion

In this article, you learned about relational modeling, DynamoDB and its salient features, and DynamoDB Relational Modeling Process with examples. 

However, as a Developer, extracting complex data from a diverse set of data sources like Databases, CRMs, Project management Tools, Streaming Services, and Marketing Platforms to your DynamoDB Database can seem to be quite challenging. If you are from non-technical background or are new in the game of data warehouse and analytics, Hevo Data can help!

Visit our Website to Explore Hevo

Hevo Data will automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ multiple sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

mm
Former Content Writer, Hevo Data

Sharon is a data science enthusiast with a passion for data, software architecture, and writing technical content. She has experience writing articles on diverse topics related to data integration and infrastructure.

No-Code Data Pipeline for DynamoDB