
DynamoDB Replication: The Ultimate Guide

August 23, 2023


In this article, we take a deep dive into data replication and DynamoDB, walk through the methods for replicating DynamoDB data, and discuss the technical aspects of each so that you can decide which method suits your needs.

DynamoDB is a core database service provided by Amazon Web Services (AWS). DynamoDB replication keeps copies of your data synchronized automatically across multiple DynamoDB tables and AWS Regions around the globe. Replication became important as organizations needed better fault tolerance, disaster recovery, and data availability in the face of failures. By replicating data across several locations, businesses can give their applications low latency while keeping data durable and accessible in a range of failure scenarios.

Organizations can customize their data replication strategies to meet their unique needs and objectives. DynamoDB offers several options for this, covering both cross-region and same-region replication. Whether the goal is high availability, lower latency, or trustworthy data redundancy, replication plays a critical role in building durable and responsive applications on AWS.

What is DynamoDB?

Amazon DynamoDB is a fully managed NoSQL database service from Amazon Web Services (AWS). It is designed for applications that need to scale, including across multiple regions, that require high availability and low-latency performance, and that manage structured or semi-structured data. DynamoDB uses a flexible, schema-less data model, which makes it suitable for a wide variety of scenarios, from mobile applications to high-traffic web applications.

What is Data Replication?

Data replication is the process of creating and maintaining copies of data in multiple locations, systems, or storage environments. Its primary goals are to ensure data availability, improve data reliability, and provide backups that support business needs such as disaster recovery, high availability, and load distribution.

Key points about data replication:

Redundant data backups: Replicating data across multiple locations or systems creates redundant copies. Storing replicas in geographically separate locations provides an effective disaster recovery strategy: if a major disaster affects one site, data from another site can be used to restore operations. Likewise, after a failure or other unforeseen event, the replicated data can serve as a backup to restore services quickly and minimize downtime.
High Availability: Replication helps ensure that data and services remain accessible even if a primary system or location fails, because users can be redirected to a replica, which reduces outages. Placing replicas closer to users in various regions also shortens data access times and improves the user experience.
Load Distribution: Replicated data can be used to distribute workloads across multiple systems or servers, improving performance and responsiveness by balancing the processing load. In distributed systems, data replication allows users in different locations to access the same data without introducing significant latency due to remote access.
Synchronization: Data replication involves ensuring that all copies of data are consistent and up to date. Changes made to the original data are propagated to the replicas based on predefined rules. Different replication strategies prioritize consistency, availability, and partition tolerance differently. Consistency models like strong consistency, eventual consistency, and causal consistency define how quickly changes are propagated and how conflicts are resolved.
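To make the synchronization idea concrete, here is a toy, in-memory Python sketch of asynchronous (eventually consistent) replication. It is not DynamoDB code: the `Replica` and `AsyncReplicator` classes and the region labels are hypothetical, and a real system ships changes over the network and must also handle failures and ordering. It shows the key property: a replica can briefly lag behind the source, then converges once the pending changes are applied.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data, e.g. a table in one region (hypothetical model)."""
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class AsyncReplicator:
    """Toy asynchronous replication: writes land on the source immediately
    and are queued for the replicas, which apply them later."""
    source: Replica
    replicas: list
    pending: list = field(default_factory=list)

    def write(self, key, value):
        self.source.data[key] = value        # acknowledged right away
        self.pending.append((key, value))    # shipped to replicas later

    def propagate(self):
        # Apply queued changes in write order; afterwards all copies agree.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    primary = Replica("us-west")
    secondary = Replica("us-east")
    repl = AsyncReplicator(primary, [secondary])

    repl.write("user#1", "Alice")
    print(secondary.data.get("user#1"))  # None: the replica has not caught up yet
    repl.propagate()
    print(secondary.data.get("user#1"))  # Alice: the replicas have converged
```

The window between `write` and `propagate` is exactly the period in which an eventually consistent read from the replica can return stale data.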

Data replication is essential and widely used in many contexts, including database replication (e.g., Amazon DynamoDB Global Tables), file system replication, content delivery networks, real-time analytics, and more.

While data replication offers many benefits, it also comes with a few challenges, such as keeping replicas synchronized in distributed systems, resolving conflicting updates, managing network and bandwidth costs, and enforcing security and access controls across replicas.

Overall, data replication is an important technique for ensuring data availability, reliability, and top-notch performance in modern distributed, highly available applications. It plays an important role in maintaining data integrity and supporting a variety of business and operational requirements.

Why do you need DynamoDB Replication?

Let’s start with a simple example of why DynamoDB replication is needed. Suppose an application is hosted in San Francisco, on the west coast of the United States. Users on the east side of the country start using the application, and soon the number of users grows around the globe as well. Users far from San Francisco, especially those in the rest of the world, will most likely experience higher latency. This is exactly the scenario where a DynamoDB replication architecture helps: replicas of the data are hosted in multiple regions around the world, so each user is served from a nearby copy and latency drops significantly. This is the basic foundation of a multi-region architecture. Apart from lower latency, there are a few other benefits as well:

Load can be balanced geographically, with each region serving its local customer base.
Even if one of the replicated databases goes down, the other replicas can take over its traffic and be used for data recovery, ensuring smooth business continuity.

DynamoDB's underlying architecture also lends itself to replication:

Reliability and performance are the core pillars of DynamoDB's architecture. Within each AWS Region, DynamoDB automatically replicates data across multiple Availability Zones to ensure durability and fault tolerance, so applications keep functioning even if one Availability Zone fails.
DynamoDB can scale capacity automatically based on demand. In provisioned mode, you set the minimum and maximum read and write capacity units and a target utilization, and DynamoDB auto scaling adjusts capacity to accommodate traffic fluctuations; in on-demand mode, capacity is managed entirely for you. This flexibility eliminates manual capacity planning and helps keep costs under control.
To support more complex querying, DynamoDB provides secondary indexes that allow efficient access by attributes other than the primary key. A Global Secondary Index (GSI) can use a different partition key and be queried across the entire table, while a Local Secondary Index (LSI) shares the table's partition key and uses a different sort key, so each LSI query is scoped to a single partition key value.
For querying, DynamoDB offers fast and predictable performance on simple key-value lookups. However, its query capabilities are limited compared with traditional relational databases, so developers often do additional filtering and data manipulation in application code after retrieving results.
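As an illustration of the GSI/LSI distinction, the sketch below builds the parameters for a DynamoDB CreateTable request. The table name `Orders`, its attributes, and the index names are hypothetical; the parameter names (`KeySchema`, `LocalSecondaryIndexes`, `GlobalSecondaryIndexes`, and so on) follow the CreateTable API. The script only prints the request; actually creating the table would require boto3 and AWS credentials, as noted in the comment.

```python
import json


def build_create_table_request(table_name: str) -> dict:
    """Return CreateTable parameters for a hypothetical orders table.

    Table key: customer_id (partition) + order_date (sort).
    - The LSI keeps the same partition key but sorts by order_total, so it
      is queried within one customer_id. LSIs must be defined at creation.
    - The GSI uses a different partition key (status), so it can be
      queried across all customers in the table.
    """
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: no capacity units to set
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "order_total", "AttributeType": "N"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_total",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_status",
                "KeySchema": [
                    {"AttributeName": "status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


if __name__ == "__main__":
    request = build_create_table_request("Orders")
    print(json.dumps(request, indent=2))
    # With boto3 installed and AWS credentials configured, the same dict
    # could be passed as keyword arguments:
    #   boto3.client("dynamodb").create_table(**request)
```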

Methods to set up DynamoDB Replication

Setting up replication for Amazon DynamoDB involves implementing strategies to duplicate and synchronize data across multiple regions, ensuring redundancy, availability, and improved performance. Here are several methods to achieve DynamoDB replication:

Method 1: DynamoDB Replication using Global Tables

As discussed above, DynamoDB replication is essential for distributing load and providing a smooth user experience. DynamoDB's built-in mechanism for this is Global Tables. A global table is a collection of replica tables, each in a different AWS Region, that DynamoDB keeps synchronized. Let’s walk through the steps to set up DynamoDB replication using Global Tables.

Step 1: Create Global Tables

Navigate to the AWS Management Console.
Open the DynamoDB console.
Choose "Create table", or select an existing table.
When creating a table, define its schema, such as the table name and primary key.
Under "Global Tables", select "Add region" and choose the secondary region where the replica table should be created.
Configure the settings for the secondary region.
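The console steps above also have an API equivalent. For the current version of Global Tables (2019.11.21), replicas are added to an existing table with UpdateTable and a `ReplicaUpdates` list. This sketch only builds the requests; the table name and regions are placeholders, and sending one request per region while waiting for each replica to become active is a cautious assumption of this sketch.

```python
import json


def build_add_replica_requests(table_name: str, regions: list) -> list:
    """Return one UpdateTable parameter set per replica region to add."""
    return [
        {
            "TableName": table_name,
            "ReplicaUpdates": [{"Create": {"RegionName": region}}],
        }
        for region in regions
    ]


if __name__ == "__main__":
    # Placeholder table and regions; the table itself lives in its home region.
    for request in build_add_replica_requests("Orders", ["us-east-1", "eu-west-1"]):
        print(json.dumps(request))
        # With boto3 and credentials, each request could be sent from the
        # table's home region, for example:
        #   boto3.client("dynamodb", region_name="us-west-2").update_table(**request)
        # then wait until the new replica reports ACTIVE before adding the next.
```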

Step 2: Configuring replication

Once the global table is created and configured, DynamoDB automatically handles replication and conflict resolution across the selected regions. Conflicts are resolved with a last-writer-wins mechanism: when the same item is updated concurrently in different regions, the most recent write is kept and propagated to every replica.
Data can now be read and written in any replica region. DynamoDB does not route requests for you; your application chooses which regional endpoint to call, typically the one closest to the user.
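Here is a minimal sketch of the last-writer-wins rule described above. DynamoDB applies this rule internally; the `Version` record and the tie-breaking by region name are assumptions made so the example is deterministic, not DynamoDB's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """A hypothetical item version: the value plus when and where it was written."""
    value: str
    written_at: float  # timestamp of the write, in seconds
    region: str


def last_writer_wins(versions):
    """Pick the surviving version: the one with the latest timestamp.

    Ties are broken by region name only so the result is deterministic;
    that tie-breaker is an assumption of this sketch.
    """
    return max(versions, key=lambda v: (v.written_at, v.region))


if __name__ == "__main__":
    concurrent = [
        Version("shipped", written_at=1000.2, region="us-east-1"),
        Version("cancelled", written_at=1000.5, region="eu-west-1"),
    ]
    print(last_writer_wins(concurrent).value)  # cancelled
```

Note the consequence for application design: the "shipped" update is silently discarded, so workflows that cannot tolerate lost concurrent updates should route writes for a given item to a single region.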

Step 3: Monitoring

To make sure your architecture handles regional failures gracefully, test your application's behavior in failover scenarios.
Global Tables are multi-active: every replica accepts reads and writes, so there is no primary region to promote. If one region becomes unavailable, your application can redirect its traffic to a replica in another region.
When the unavailable region comes back online, writes made to the other replicas in the meantime are replicated to it.
Amazon CloudWatch metrics and alarms, such as the ReplicationLatency metric, can be used to monitor the health and performance of your global tables.
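Because DynamoDB does not route traffic between replicas, failover is the application's job. The following sketch, assuming hypothetical latency measurements and a health-check result per region, picks the closest healthy replica and falls back to the next one during an outage. In practice the chosen region name would be used to create a regional client, for example with boto3.

```python
def pick_region(latencies_ms: dict, healthy: set) -> str:
    """Return the healthy region with the lowest measured latency.

    latencies_ms: region -> round-trip latency in ms (hypothetical client-side
    measurements). healthy: regions currently passing health checks.
    Raises RuntimeError if no replica is usable.
    """
    candidates = [region for region in latencies_ms if region in healthy]
    if not candidates:
        raise RuntimeError("no healthy replica region available")
    return min(candidates, key=lambda region: latencies_ms[region])


if __name__ == "__main__":
    latencies = {"us-west-1": 12, "us-east-1": 70, "eu-west-1": 140}
    all_up = {"us-west-1", "us-east-1", "eu-west-1"}
    print(pick_region(latencies, all_up))  # us-west-1
    # Simulated outage in us-west-1: traffic fails over to the next-closest replica.
    print(pick_region(latencies, all_up - {"us-west-1"}))  # us-east-1
    # A real client would then call boto3.client("dynamodb", region_name=region).
```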

Why might DynamoDB replication using Global Tables not be feasible?

Because replication is asynchronous and subject to network latency, DynamoDB Global Tables are not intended for real-time synchronization; they are designed for high availability and disaster recovery.
Applications must therefore be designed around eventual consistency across regions.
Someone needs to choose AWS regions based on your needs and revisit those choices over time, because each region has different pricing and latency characteristics.
If you want to replicate your data outside DynamoDB, for example to MongoDB or to a cloud data warehouse, the process can become very tedious and consume significant engineering bandwidth.

In such scenarios, you can use a low-code data replication tool like Fivetran.

Method 2: DynamoDB Replication Using Fivetran

Fivetran, a cloud-based data integration tool, simplifies the DynamoDB replication process. It helps organizations migrate and sync data seamlessly so that it is available for analysis, using a straightforward interface and automated procedures. Because it does not require manual scripting or intricate data conversions, both technical and non-technical users can work with it.

Fivetran's pre-built connectors make data extraction simple for a variety of data sources, including DynamoDB. Fivetran manages incremental updates, data format conversions, and schema changes while preserving the quality and integrity of the data during transfer. It also offers data transformation features that let users apply custom changes to the data before loading it into the data warehouse.

By adopting Fivetran for DynamoDB replication, businesses can save time and money and concentrate on data analysis and decision-making rather than on the challenges of data integration. With an automated and dependable data loading process, Fivetran gives companies the tools they need to combine data from their sources, generate insights, and implement data-driven initiatives.

With Fivetran, DynamoDB replication takes just a few steps:

Step 1: Find External ID

1. Go to the connector setup form.

2. Find the automatically-generated External ID.

3. Make a note of the External ID.

4. This ID is needed for AWS configuration with Fivetran.

Step 2: Create IAM Policy


1. Open the Create new AWS IAM policy page.

2. Go to the JSON tab.

3. Copy the following policy and paste it in the JSON editor:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:DescribeTable",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListTables",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}
```

4. Modify the policy if needed to restrict access to specific tables. If you use a customer-managed KMS key, add the necessary KMS actions to the Action list.

5. Proceed to the "Next: Tags" step if you want to add tags.

6. Review the policy, give it a name (e.g., "Fivetran-Dynamo-Access"), and add an optional description.

7. Click "Create policy."
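If you want to restrict access to specific tables, as step 4 suggests, one way is to scope each action to the resources it applies to. The sketch below builds such a policy in Python; the region, account ID, and table name are placeholders. `dynamodb:ListTables` is not tied to a table, so it keeps `"Resource": "*"`, while the stream actions are scoped to each table's stream ARN. Check the result against Fivetran's current setup guide before using it.

```python
import json


def build_scoped_policy(region: str, account_id: str, table_names: list) -> dict:
    """Build a read-only replication policy limited to the given tables."""
    table_arns = [
        f"arn:aws:dynamodb:{region}:{account_id}:table/{name}" for name in table_names
    ]
    stream_arns = [f"{arn}/stream/*" for arn in table_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing tables is account-wide and cannot be scoped to a table ARN.
                "Effect": "Allow",
                "Action": ["dynamodb:ListTables"],
                "Resource": "*",
            },
            {   # Table-level reads: metadata and the initial full scan.
                "Effect": "Allow",
                "Action": ["dynamodb:DescribeTable", "dynamodb:Scan"],
                "Resource": table_arns,
            },
            {   # Stream-level reads used for incremental changes.
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                ],
                "Resource": stream_arns,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account ID, and tables.
    policy = build_scoped_policy("us-east-1", "123456789012", ["Orders"])
    print(json.dumps(policy, indent=2))
```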

Step 3: Create IAM Role

1. Open the Create new AWS IAM role page.

2. Select "AWS account" and enter Fivetran's AWS VPC Account ID (834469178297) in the Account ID field.

3. Select "Require external ID" and input the External ID from Step 1.

4. Choose the IAM policy created in Step 2 in the "Add permissions" page.

5. Name the role (e.g., "Fivetran-Dynamo") and create the role.

6. Open the role you just created and copy the "Role ARN."

7. Paste the Role ARN in the connector setup form.
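For reference, choosing "AWS account" with "Require external ID" in the console gives the role a trust policy. The sketch below builds a policy of that general shape: it lets Fivetran's account (834469178297, from step 2) assume the role only when it presents your External ID. The external ID value is a placeholder, and the console-generated policy may differ in minor details.

```python
import json

FIVETRAN_ACCOUNT_ID = "834469178297"  # account ID given in step 2


def build_trust_policy(external_id: str, trusted_account: str = FIVETRAN_ACCOUNT_ID) -> dict:
    """Trust policy letting another AWS account assume this role only when it
    supplies the expected external ID (guards against the confused-deputy problem)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder: use the External ID from the Fivetran connector setup form.
    print(json.dumps(build_trust_policy("your-external-id"), indent=2))
```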

Step 4: Enable Streams for DynamoDB Tables

1. Open the DynamoDB service in your AWS console, select "Tables", and choose a table.

2. Go to the "Exports and streams" tab.

3. In "DynamoDB stream details," click "Turn on."

4. Select "New and old images," so each stream record contains the item both before and after the change.

5. Confirm the stream activation.
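Turning on the stream with new and old images is what makes incremental replication possible: every change produces a record that a consumer can replay. The sketch below replays a few hand-written records onto an in-memory copy. The record fields (`eventName`, `dynamodb.Keys`, `NewImage`, `OldImage`) follow the DynamoDB Streams record format, but the records themselves are made up, a real consumer would fetch them with GetShardIterator and GetRecords, and this is not a description of how Fivetran's connector works internally.

```python
import json


def key_of(keys: dict) -> tuple:
    """Turn a DynamoDB-JSON key map such as {"id": {"S": "1"}} into a hashable key."""
    return tuple(sorted((name, json.dumps(value, sort_keys=True)) for name, value in keys.items()))


def apply_stream_records(destination: dict, records: list) -> dict:
    """Replay change records onto an in-memory copy of the table.

    INSERT and MODIFY records carry the full item after the change (NewImage);
    REMOVE records identify the deleted item by its Keys.
    """
    for record in records:
        change = record["dynamodb"]
        key = key_of(change["Keys"])
        if record["eventName"] in ("INSERT", "MODIFY"):
            destination[key] = change["NewImage"]
        elif record["eventName"] == "REMOVE":
            destination.pop(key, None)
    return destination


if __name__ == "__main__":
    records = [  # hypothetical, simplified stream records
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "1"}},
            "NewImage": {"id": {"S": "1"}, "status": {"S": "new"}}}},
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"id": {"S": "1"}},
            "OldImage": {"id": {"S": "1"}, "status": {"S": "new"}},
            "NewImage": {"id": {"S": "1"}, "status": {"S": "shipped"}}}},
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "2"}},
            "NewImage": {"id": {"S": "2"}, "status": {"S": "new"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "2"}},
            "OldImage": {"id": {"S": "2"}, "status": {"S": "new"}}}},
    ]
    replica = apply_stream_records({}, records)
    print(list(replica.values()))  # [{'id': {'S': '1'}, 'status': {'S': 'shipped'}}]
```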

Step 5: Configure AWS PrivateLink (Optional - Business Critical Plan)

1. Follow the AWS PrivateLink setup guide to configure PrivateLink for your database.

Step 6: Finish Fivetran Configuration


1. Enter the destination schema name in the connector setup form.

2. Select your “AWS region.”

3. Choose your pack mode.

4. Optionally, enable PrivateLink.

5. Click "Save & Test" to let Fivetran sync data from your DynamoDB account.

You need to authenticate the DynamoDB source only once, which takes just a few minutes. After that, you can replicate your data as often as you like to destinations such as other AWS services, cloud databases, or data warehouses. For more details, see Fivetran's detailed DynamoDB setup guide.

Advantages of Using Fivetran for DynamoDB replication

A few key advantages of using Fivetran for DynamoDB replication are as follows:

Seamless Data Integration: Fivetran provides pre-built connectors for various data sources, including Amazon DynamoDB and data warehouses including Amazon Redshift, eliminating the need for manual scripting or complex configurations. This expedites and streamlines the data integration process.
Automated Workflows: Fivetran automates the data loading procedure so that data is synced reliably and frequently. It manages incremental updates, data format changes, and schema changes, minimizing manual intervention and preserving data integrity.
Data Transformation Capabilities: Users can apply custom data transformations before the data lands in the data warehouse. This enables data cleansing, normalization, and enrichment, ensuring that the data is ready for analysis.
Monitoring and Alerting: Fivetran offers monitoring and alerting options for following the progress of the data integration process. It provides visibility into data loading metrics, error handling, and notifications for any problems that arise.
Data Source Flexibility: Fivetran supports a wide variety of data sources. It can connect to various databases, cloud services, and applications, enabling organizations to integrate data from diverse sources into data warehouses such as Amazon Redshift.
Time and Resource Savings: Fivetran saves time and money by automating the data loading process and removing the need for manual intervention. This lets teams concentrate on analyzing the loaded data and drawing conclusions from it.

Conclusion

Amazon DynamoDB is a fully managed NoSQL database service for applications that require high availability, scalability, and low latency. DynamoDB replication using Global Tables adds automatic multi-region replication on top of its flexible data model and dynamic scaling, supporting scenarios ranging from region-based load distribution to applications that grow from small to very large scale.

However, DynamoDB replication using Global Tables has its limitations: replicating data to a different cloud database or data warehouse requires significant engineering bandwidth and hours. This is where you can leverage a near real-time, low-code tool like Fivetran, which automates the replication task without any coding. You can also visit the Fivetran Connector Directory to explore the other connectors Fivetran supports.
