MongoDB Database Replication

September 5, 2023

This article explores the core ideas behind MongoDB replication, its advantages, and the processes underneath it, which together make replication a vital tool for building dependable, adaptable database systems. You will also learn two methods for setting up MongoDB replication.

Ensuring data availability and reliability is crucial in modern database management. MongoDB, a flexible NoSQL database, addresses this need with a strong replication system. MongoDB replication combines data integrity, high availability, and scalability in one package. By copying and synchronizing data across multiple nodes, it protects against hardware failures and data loss, improves read scalability, and supports geographically distributed applications.

What is MongoDB

MongoDB is a widely used, source-available NoSQL database management system in the category of document-oriented databases. Unlike traditional relational databases, MongoDB uses a flexible, schema-less data model designed to handle large amounts of unstructured or semi-structured data. It is developed by MongoDB, Inc. and is used across a wide range of industries and applications.

Key features of MongoDB

Document-Oriented: MongoDB stores data as documents in a JSON-like format (BSON internally) made up of key-value pairs. Documents can have different fields and structures, so a single collection (the rough equivalent of a table) can hold data in different shapes.
Collections and Documents: In MongoDB, documents are organized into collections, which are roughly equivalent to tables in relational databases. Each document in a collection can have a different schema, which makes dynamic data easier to handle.
No Fixed Schema: Unlike relational databases, MongoDB does not require a fixed schema for its collections. Fields can be added, modified, or removed in one document without affecting the other documents in the same collection.
Scalability: MongoDB supports horizontal scaling by allowing data to be distributed across multiple servers or nodes. This is crucial for handling large amounts of data and high workloads.
Replication: MongoDB offers built-in replication through Replica Sets, enabling the creation of multiple copies of data for high availability and fault tolerance.
Sharding: Sharding partitions data across multiple servers, called shards. MongoDB's sharding feature distributes and balances data across shards, which lets it handle large data volumes and can significantly improve query performance.
Query Language: MongoDB provides a rich query language for retrieving data from collections. Queries can be written to match the hierarchical nature of documents and support various filtering, sorting, and aggregation operations.
Indexes: Indexes in MongoDB improve query performance by allowing efficient data retrieval. Single-field, compound, geospatial, and text indexes are a few of the index types MongoDB supports.
Aggregation Framework: The Aggregation Framework offers powerful data transformation and aggregation capabilities, allowing the user to perform complex queries and operations on data within MongoDB.
Ad Hoc Queries: MongoDB supports ad hoc queries, so you can query data without predefining relationships or joining tables.

MongoDB is used in many scenarios, including web applications, mobile applications, real-time analytics, and content management systems. It suits projects that need to handle changing data structures, scale horizontally, and retrieve data efficiently. Even so, choose a database system based on the specific requirements of your application.

What is MongoDB replication and how does it work

MongoDB replication is a data synchronization process that allows multiple copies of MongoDB data to be maintained across different servers or nodes. This feature is designed to improve data availability, fault tolerance, and scalability in distributed database environments. MongoDB replication is implemented using a structure called a replica set.

A replica set is a group of MongoDB instances: one primary and one or more secondaries (MongoDB recommends at least three members for production). The primary is the source of truth for data modifications and receives all write operations from clients. Secondaries replicate data from the primary so that each holds a consistent copy of the primary's data.

The replication process works as follows:

Write operations on the primary: When a user sends a write operation (such as an insert, update, or delete) to the primary node, the primary node processes the operation and records it in its oplog (operations log).
Oplog replication to secondaries: Secondaries continuously fetch new entries from the oplog of their sync source. The sync source is usually the primary, although with chained replication a secondary can sync from another secondary. The oplog is a special capped collection (local.oplog.rs) that keeps a chronological record of every operation that modified data. Each secondary applies these entries to its own data set in the same order they occurred on the primary.
Achieving data consistency: Through this oplog-based replication, secondaries catch up with the primary over time, so their data remains consistent with the primary's data.
Read operations: Only the primary accepts writes, but both the primary and the secondaries can serve reads. By default, clients read from the primary. A client can set a read preference such as secondary or secondaryPreferred to send reads to secondaries, which spreads the read load and reduces the primary's workload. Note that a secondary may return slightly outdated data because of replication lag.
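The write-then-replay flow above can be made concrete with a toy model in plain JavaScript (Node.js). This is a simplification, not MongoDB's implementation. The class names and the integer `ts` counter are assumptions chosen for illustration, and the `i`/`u`/`d` codes are only loosely modeled on oplog operation types. Also, a real secondary tails its sync source's oplog rather than filtering an array. The model shows two properties from the list: entries are replayed in primary order, and a secondary that has not caught up serves stale reads.

```javascript
// Toy model of oplog-based replication (illustration only: real oplog entries,
// timestamps and the fetch protocol are more involved than this).
class ToyMember {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.oplog = [];       // operations applied here, in order
  }

  // Apply one operation to the local data set and record it in the oplog.
  apply(entry) {
    if (entry.op === "i" || entry.op === "u") {
      this.data.set(entry._id, { ...entry.doc }); // insert or full replacement
    } else if (entry.op === "d") {
      this.data.delete(entry._id);
    }
    this.oplog.push(entry);
  }
}

class ToyPrimary extends ToyMember {
  // Only the primary accepts writes; each write gets the next timestamp.
  write(op) {
    const entry = { ...op, ts: this.oplog.length + 1 };
    this.apply(entry);
    return entry;
  }
}

class ToySecondary extends ToyMember {
  // Fetch entries newer than the last applied one and replay them in order.
  // batchSize lets us model a secondary that has not caught up yet.
  pull(source, batchSize = Infinity) {
    const lastTs = this.oplog.length ? this.oplog[this.oplog.length - 1].ts : 0;
    const pending = source.oplog.filter((e) => e.ts > lastTs).slice(0, batchSize);
    pending.forEach((e) => this.apply(e));
    return pending.length;
  }

  // Replication lag, measured here as the number of unapplied oplog entries.
  lag(source) {
    return source.oplog.length - this.oplog.length;
  }
}

const primary = new ToyPrimary("mongo1");
const secondary = new ToySecondary("mongo2");
primary.write({ op: "i", _id: 1, doc: { qty: 5 } });
primary.write({ op: "u", _id: 1, doc: { qty: 7 } });
primary.write({ op: "i", _id: 2, doc: { qty: 1 } });

secondary.pull(primary, 2);          // partial catch-up
console.log(secondary.lag(primary)); // 1 entry behind
console.log(secondary.data.get(2));  // undefined: a stale read
secondary.pull(primary);             // full catch-up
console.log(secondary.lag(primary)); // 0
```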

MongoDB replication provides several benefits

High Availability: If the primary fails, the remaining members hold an election and automatically promote a secondary to primary, provided a majority of the voting members can still reach each other. This keeps the database operational and minimizes downtime.
Fault Tolerance: Multiple replicas of data reduce the risk of data loss due to hardware failures or other issues affecting a single node.
Read Scalability: Secondary nodes can handle read queries, distributing the read workload and improving overall performance.
Data Redundancy: Having multiple replicas of data provides a level of data redundancy, helping protect against data loss.
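Automatic promotion depends on majority arithmetic, which is simple enough to sketch in JavaScript. With n voting members, a majority is floor(n/2) + 1. The set can lose n minus that number of members and still elect a primary. The function name `electionMath` is hypothetical.

```javascript
// Sketch: how many voting members must remain reachable to elect a primary,
// and how many can fail before no election can succeed.
function electionMath(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1) {
    throw new RangeError("votingMembers must be a positive integer");
  }
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}

for (const n of [1, 2, 3, 4, 5, 7]) {
  const { majority, faultTolerance } = electionMath(n);
  console.log(`${n} voting members: majority ${majority}, can lose ${faultTolerance}`);
}
```

The loop shows why a fourth voting member adds no fault tolerance over three. It also shows why an arbiter is sometimes added to give a set with two data-bearing members an odd number of votes.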

To set up and manage replication, administrators can define a Replica Set by specifying the nodes and their roles in a configuration. MongoDB's replication mechanism ensures data consistency, handles failover, and provides the tools necessary to monitor the status of the Replica Set.
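The configuration that rs.initiate() accepts is an ordinary document. It has an `_id` that matches the replSetName, plus a `members` array. The helper below is hypothetical (it is not part of mongosh) and only assembles that document. `priority` and `arbiterOnly` are standard member fields, while the host names are placeholders.

```javascript
// Hypothetical helper: builds the document that rs.initiate() accepts.
// Members can be plain "host:port" strings or objects with extra member options.
function buildReplSetConfig(setName, members) {
  if (!setName) throw new Error("setName is required and must match replSetName");
  return {
    _id: setName,
    members: members.map((m, i) => {
      const spec = typeof m === "string" ? { host: m } : { ...m };
      if (!spec.host) throw new Error(`member ${i} has no host`);
      return { _id: i, ...spec }; // member _id values just need to be unique
    }),
  };
}

const config = buildReplSetConfig("myReplSet", [
  "mongo1.example.com:27017",
  { host: "mongo2.example.com:27017", priority: 2 },   // preferred primary
  { host: "mongo3.example.com:27017", arbiterOnly: true },
]);
console.log(JSON.stringify(config, null, 2));
```

In mongosh, passing such a document to rs.initiate(config) would initiate all members in one step. This is an alternative to initiating a single member and adding the rest with rs.add().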

In summary, MongoDB replication is a critical feature that enhances data availability and reliability in distributed environments. It enables the maintenance of synchronized data copies across multiple nodes, allowing for fault tolerance and improved performance in MongoDB database systems.

Methods to setup MongoDB replication

Setting up MongoDB replication is a crucial step in creating a fault-tolerant, highly available database environment, because it keeps multiple copies of your data on different servers. Here are two methods to set up MongoDB replication:

Method 1: MongoDB replication using a replica set

Setting up MongoDB replication using a Replica Set involves several steps. Here are detailed instructions with code snippets for each step:

Step 1: Prepare MongoDB Instances

Install MongoDB on each server or virtual machine that will host a replica set member. Follow the installation instructions in the MongoDB documentation.

Step 2: Configure Network Settings

Ensure that all the servers can communicate with each other over the network. Add the hostnames and IP addresses of all replica set members to each server's /etc/hosts file or to your DNS configuration.

Step 3: Start MongoDB Instances

On each server, create a MongoDB configuration file named mongod.conf in YAML format, similar to the following:

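storage:
  dbPath: /var/lib/mongodb
  journal:              # omit on MongoDB 6.1+, where journaling is always on
    enabled: true
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  bindIp: localhost,<server_hostname_or_ip>
  port: 27017           # default MongoDB port
replication:
  replSetName: myReplSet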

Start MongoDB on each server with this configuration file:

mongod -f /path/to/mongod.conf

Step 4: Initialize the Replica Set

Connect to one of the MongoDB instances with the MongoDB Shell, mongosh (the legacy mongo shell was removed in MongoDB 6.0):

mongosh --host <hostname> --port <port>

Then initialize the replica set. The _id value must match the replSetName in mongod.conf:

rs.initiate({_id: "myReplSet", members: [{_id: 0, host: "<primary_host>:<primary_port>"}]})

Step 5: Add Secondary Members

After you initiate the replica set, the member you initiated becomes the primary. Connect to it with mongosh:

mongosh --host <primary_host> --port <primary_port>

Add a secondary member:

rs.add("<secondary_host>:<secondary_port>")

Repeat this step for each secondary member.

Step 6: Optional - Add Arbiter Node

An arbiter votes in elections but holds no data. To add one, connect to the primary with mongosh and run:

rs.addArb("<arbiter_host>:<arbiter_port>")

Step 7: Check Replica Set Status

To check the replica set status, connect to any member and run:

rs.status()
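rs.status() returns a document whose `members` array describes each member. A small function can pull out the fields you usually check first. This is a sketch, and several parts are assumptions:

- `summarizeStatus` is a hypothetical helper.
- The sample document is made up and trimmed to the fields the helper reads (`name`, `health`, `stateStr`, `optimeDate`).
- Lag is estimated as the gap between the primary's last applied operation time and each secondary's.

```javascript
// Sketch: summarise the members array returned by rs.status().
// Hypothetical helper; only name, health, stateStr and optimeDate are read.
function summarizeStatus(status) {
  const members = status.members || [];
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  // health is 1 for reachable members and 0 for unreachable ones
  const unhealthy = members.filter((m) => m.health !== 1).map((m) => m.name);
  const lagSeconds = {};
  if (primary && primary.optimeDate) {
    for (const m of members) {
      if (m.stateStr === "SECONDARY" && m.optimeDate) {
        // Gap between the primary's and this secondary's last applied op time.
        lagSeconds[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
      }
    }
  }
  return { primary: primary ? primary.name : null, unhealthy, lagSeconds };
}

// Made-up sample in the shape of rs.status() output.
const sample = {
  set: "myReplSet",
  members: [
    { name: "mongo1.example.com:27017", health: 1, stateStr: "PRIMARY",
      optimeDate: new Date("2023-09-05T10:00:10Z") },
    { name: "mongo2.example.com:27017", health: 1, stateStr: "SECONDARY",
      optimeDate: new Date("2023-09-05T10:00:07Z") },
    { name: "mongo3.example.com:27017", health: 0, stateStr: "(not reachable/healthy)" },
  ],
};
console.log(summarizeStatus(sample));
```

In mongosh you would call summarizeStatus(rs.status()). The built-in rs.printSecondaryReplicationInfo() reports similar lag figures without custom code.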

Step 8: Test Failover

To test failover, simulate a primary failure by stopping the primary's MongoDB instance. The remaining members should automatically elect a new primary, which you can confirm with rs.status(). These steps and snippets are generalized, and the actual steps may need adjustment for your environment and use case. This is where a near real-time, low-code tool like Fivetran can help: you connect MongoDB to Fivetran, and Fivetran handles the replication tasks.

Replica sets offer many benefits, but their complexity, their resource requirements, or a poor fit with a particular use case can make them less practical in some situations. Organizations should assess their requirements, infrastructure, and operational capabilities to decide whether replica sets are the right solution or whether an alternative strategy is needed. In such cases, a low-code data replication tool like Fivetran is one option.

Method 2: MongoDB replication using Fivetran

Step 1: Find Host Identifiers

To configure Fivetran, you need to identify the MongoDB replica set's host identifier. The host identifier can be in various formats:

SRV host identifier: `mongodb+srv://example.mongodb.net`
Connection string: `mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017`
Domain and port: `your.server.com:27017`
IP address and port: `1.2.3.4:27017`
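When scripting against these formats, it can help to normalize them into host/port pairs. The parser below is a rough sketch and does not fully implement the MongoDB connection string specification. It assumes there are no IPv6 literals and drops credentials, the database name, and options. It fills in the default port 27017 when none is given, except for SRV identifiers, whose DNS records supply the ports. The function name is hypothetical.

```javascript
// Sketch: normalise the host identifier formats above into host/port pairs.
// Assumptions: no IPv6 literals; credentials, database and options are dropped.
const DEFAULT_PORT = 27017;

function parseHostIdentifier(input) {
  let rest = input.trim();
  let srv = false;
  if (rest.startsWith("mongodb+srv://")) {
    srv = true;
    rest = rest.slice("mongodb+srv://".length);
  } else if (rest.startsWith("mongodb://")) {
    rest = rest.slice("mongodb://".length);
  }
  rest = rest.split(/[\/?]/)[0];                // drop "/database?options"
  rest = rest.slice(rest.lastIndexOf("@") + 1); // drop "user:password@" if present
  const hosts = rest
    .split(",")
    .filter((part) => part.length > 0)
    .map((part) => {
      const colon = part.lastIndexOf(":");
      if (colon === -1) {
        // SRV records carry their own ports; otherwise assume the default.
        return { host: part, port: srv ? null : DEFAULT_PORT };
      }
      return { host: part.slice(0, colon), port: Number(part.slice(colon + 1)) };
    });
  return { srv, hosts };
}

for (const id of [
  "mongodb+srv://example.mongodb.net",
  "mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017",
  "your.server.com:27017",
  "1.2.3.4:27017",
]) {
  console.log(id, "->", JSON.stringify(parseHostIdentifier(id)));
}
```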

You can also optionally use Analytics nodes or specify read preferences based on your needs. You can find your host identifiers either using MongoDB Atlas or the MongoDB shell.

Using MongoDB Atlas:

Log in to the MongoDB Atlas dashboard.
In the Cluster Overview tab, click Connect.
Select Connect your application.
Copy the SRV host identifier.

Using MongoDB shell:

Connect to your replica set or primary node using the MongoDB shell.
Execute the `db.adminCommand({ replSetGetStatus : 1 }).members` command.
Copy the host identifier and alternative host identifiers if needed.
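The full replSetGetStatus result also includes the set name in its `set` field. Together with the members' `name` values, that is enough to assemble the connection string format listed in Step 1. The hypothetical `toConnectionString` helper below skips arbiters, because they hold no data. It can also add a `readPreference` option such as `secondaryPreferred`. Note that it takes the whole command result, not just `.members`.

```javascript
// Sketch: build a "mongodb://" connection string from replSetGetStatus output.
// Hypothetical helper; arbiters are skipped because they hold no data.
function toConnectionString(status, readPreference) {
  const hosts = status.members
    .filter((m) => m.stateStr !== "ARBITER")
    .map((m) => m.name);
  let query = "replicaSet=" + encodeURIComponent(status.set);
  if (readPreference) {
    // e.g. "secondaryPreferred" to let reads go to secondaries
    query += "&readPreference=" + encodeURIComponent(readPreference);
  }
  return "mongodb://" + hosts.join(",") + "/?" + query;
}

// Made-up sample trimmed to the fields the helper reads.
const statusSample = {
  set: "myReplSet",
  members: [
    { name: "mongo1.example.com:27017", stateStr: "PRIMARY" },
    { name: "mongo2.example.com:27017", stateStr: "SECONDARY" },
    { name: "arbiter.example.com:27017", stateStr: "ARBITER" },
  ],
};
console.log(toConnectionString(statusSample, "secondaryPreferred"));
```

In mongosh, the input would come from toConnectionString(db.adminCommand({ replSetGetStatus: 1 })).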

Step 2: Allow Database Access

Create a database user for Fivetran using MongoDB Atlas or the MongoDB shell.

Using MongoDB Atlas:

Log in to the MongoDB Atlas dashboard.
Go to Security > Database Access.
Create a new database user with specific privileges including `readAnyDatabase` and `read` on the `local` database.

Using MongoDB shell:

Connect to your replica set or primary node using the MongoDB shell.
Create a user for Fivetran with db.createUser(), granting the same roles: `readAnyDatabase` and `read` on the `local` database.
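For the shell route, the user can be created with the createUser database command. The sketch below only builds the command document with the two roles from the Atlas step. The user name `fivetran` is a hypothetical example, and the exact roles should be checked against Fivetran's current setup guide.

```javascript
// Sketch: createUser command document for a read-only Fivetran user.
// Hypothetical helper; role list taken from the Atlas step above.
function fivetranUserCommand(userName, password) {
  return {
    createUser: userName,
    pwd: password,
    roles: [
      { role: "readAnyDatabase", db: "admin" }, // read all databases except local and config
      { role: "read", db: "local" },            // read the oplog in local.oplog.rs
    ],
  };
}

const cmd = fivetranUserCommand("fivetran", "<password>");
console.log(JSON.stringify(cmd.roles));
```

In mongosh you could run db.getSiblingDB("admin").runCommand(fivetranUserCommand("fivetran", passwordPrompt())). This prompts for the password instead of leaving it in your shell history.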

Step 3: Choose Connection Method

Decide how you want to connect Fivetran to your MongoDB cluster: directly, using an SSH tunnel, or through a private link.

Connect Directly (TLS required): Configure firewall and access control to allow incoming connections from Fivetran's IPs.

Using MongoDB Atlas:

Note your MongoDB cluster's cloud service provider and region.
Go to Security > Network Access.
Add Fivetran's IP to the access list.

Using MongoDB shell:

Follow MongoDB's Security Considerations documentation to safelist Fivetran's IPs.

Connect using SSH (TLS optional): Configure firewall to allow connections to your MongoDB port from your SSH tunnel server's IP.

Connect using PrivateLink (Optional): If you have a Business Critical plan, you can use AWS PrivateLink, Azure Private Link, or Google Cloud Private Service Connect to connect Fivetran to your MongoDB Atlas database.

Step 4: Set Oplog Size (Optional)

Make the oplog large enough to retain at least 24 hours of changes, and preferably seven days, so that entries are not overwritten before Fivetran reads them. You can adjust the oplog size using either MongoDB Atlas or the MongoDB shell:

MongoDB Atlas: Follow MongoDB Atlas' Set Oplog Size tutorial.
MongoDB shell: Follow MongoDB's Change the Size of the Oplog tutorial.
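Sizing the oplog comes down to arithmetic: multiply how fast the oplog grows by the retention window you want. The sketch assumes you already know the growth rate in MB per hour. rs.printReplicationInfo() shows the configured size and the time span the oplog currently covers, which gives a rough rate once the oplog is full. The sketch clamps the result to 990 MB, the smallest size replSetResizeOplog accepts, and builds that command's document. The function names are hypothetical.

```javascript
// Sketch: size the oplog to hold `hours` of writes at a measured growth rate.
// Hypothetical helpers; the growth rate (MB/hour) is an input you measure.
const MIN_OPLOG_MB = 990; // smallest size replSetResizeOplog accepts

function oplogSizeFor(mbPerHour, hours) {
  if (mbPerHour < 0 || hours <= 0) throw new RangeError("invalid rate or window");
  return Math.max(MIN_OPLOG_MB, Math.ceil(mbPerHour * hours));
}

// Command document for db.adminCommand(); size is in megabytes.
function resizeCommand(mbPerHour, hours) {
  return { replSetResizeOplog: 1, size: oplogSizeFor(mbPerHour, hours) };
}

const rate = 50; // example: oplog grows by about 50 MB per hour
console.log("24 hours:", oplogSizeFor(rate, 24), "MB");     // 1200 MB
console.log("7 days:  ", oplogSizeFor(rate, 24 * 7), "MB"); // 8400 MB
```

On a self-managed deployment, replSetResizeOplog changes only the member it runs on. You would therefore run db.adminCommand(resizeCommand(rate, 168)) on each member. On Atlas, use the Atlas oplog setting instead.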

Step 5: Choose Pack Mode

Choose between packed mode and unpacked mode based on your needs.
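As we understand the two modes, packed mode loads each document as a single JSON value, while unpacked mode turns top-level fields into separate columns. The sketch below only illustrates that difference on one document. The `data` column name and the choice to keep nested values as JSON text are assumptions. Check Fivetran's MongoDB documentation for the actual schema and type handling.

```javascript
// Illustration only: the "data" column name and the JSON-text handling of
// nested values are assumptions, not Fivetran's documented schema.
function packedRow(doc) {
  const { _id, ...rest } = doc;               // keep the key separate
  return { _id, data: JSON.stringify(rest) }; // everything else in one JSON column
}

function unpackedRow(doc) {
  const row = {};
  for (const [field, value] of Object.entries(doc)) {
    // Each top-level field becomes its own column; nested objects and arrays
    // are kept as JSON text in this sketch.
    row[field] = value !== null && typeof value === "object" ? JSON.stringify(value) : value;
  }
  return row;
}

const doc = { _id: 1, name: "Ada", address: { city: "London" } };
console.log(packedRow(doc));
console.log(unpackedRow(doc));
```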

Step 6: Finish Fivetran Configuration

Enter Destination schema prefix.
Enter Host and ports.
Provide Fivetran-specific User and Password.
Choose Connection Method.
If you enabled SSL/TLS on your database, configure it accordingly.
Click Save & Test to validate the connection.

You only need to authenticate the MongoDB instance with Fivetran once, which takes a few minutes. After a successful setup, you can start syncing data with Fivetran and replicate it to destinations such as AWS services, cloud databases, or data warehouses. For more details, see Fivetran's detailed MongoDB setup guide.

Advantages of using Fivetran for MongoDB replication

The following are some major benefits of utilizing Fivetran for MongoDB replication:

Seamless Data Integration: Fivetran offers pre-built connectors for data sources such as MongoDB and for data warehouses such as Amazon Redshift, removing the need for manual scripting or complex configuration. This speeds up and simplifies data integration.
Automated Workflows: Fivetran automates data loading so that data is synchronized consistently and frequently. It handles incremental updates, data format changes, and schema changes with minimal manual involvement while maintaining data integrity.
Data Transformation Capabilities: Once data is loaded, users can run customized transformations in the destination to clean, normalize, and enrich it so that it is ready for analysis.
Monitoring and Alerting: Fivetran provides monitoring and alerting so you can track how data integration is going, with visibility into data loading metrics, errors, and alerts for emerging issues.
Flexibility of Data Sources: Fivetran supports a wide range of data sources. By connecting to different databases, cloud services, and apps, businesses can combine data from many sources into data warehouses such as BigQuery and Redshift.
Saving Time and Resources: By automating data loading and eliminating manual intervention, Fivetran saves time and resources, so teams can focus on analyzing the loaded data and drawing conclusions from it.

Conclusion

In conclusion, MongoDB replication is a fundamental and powerful feature of modern database architectures. By maintaining redundant copies of data across multiple nodes, it provides high availability, fault tolerance, and data durability. Applications become more reliable because they can handle failover smoothly, keeping crucial data accessible and business operations running. Replication can also be configured for different use cases, such as read scalability and geographic distribution. As businesses continue to look for reliable and flexible data solutions, replication remains one of MongoDB's key strengths.

Carrying out MongoDB replication tasks can also be tedious because environments vary. This is where a near real-time, low-code tool like Fivetran helps: it automates your replication tasks without custom code. You can also visit the Fivetran Connector Directory to explore other connectors that Fivetran supports.