MongoDB Replication With SymmetricDS

MongoDB has emerged as one of the leading data stores in the cloud today. SymmetricDS now allows bi-directional data replication between MongoDB in the cloud and a variety of other common database platforms.

Overview

As moving data to the cloud for analytics has grown more popular, MongoDB has emerged as one of the market leaders. A variety of tools and mechanisms exist today to load data into MongoDB. Moving an active database that constantly incurs real-time changes, however, is a different task. This is where replication through SymmetricDS can provide a solution. SymmetricDS has long been used for near real-time replication on a variety of other platforms. With the 3.13 release, MongoDB is now fully supported for bi-directional replication.

Load Only 

If you only need to write data into MongoDB, a load-only setup is sufficient. A load-only node in SymmetricDS uses an underlying H2 database, created and maintained by SymmetricDS, for internal processing and storage. This provides full support for writing changes into MongoDB from one or more other sources. The bulk loader is also available in this configuration.

Log Based

Using the change streams API provided by MongoDB, SymmetricDS can now monitor the MongoDB platform for changes, which then flow through the normal SymmetricDS replication workflow. MongoDB also provides a resume token, which SymmetricDS maintains as changes are read and processed. If the reading of changes is interrupted, this resume token is used to pick up where it left off so that no data is lost.
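
The resume-token idea can be illustrated with a minimal Python sketch. This is not how SymmetricDS is implemented internally; the function names and the JSON checkpoint file are assumptions made for illustration. The only MongoDB fact relied on is that each change event carries its resume token in its _id field.

```python
import json
import os
import tempfile


def load_token(path):
    """Return the last saved resume token, or None on first start."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)


def save_token(path, token):
    """Write the token to a temp file, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(token, fh)
    os.replace(tmp, path)


def consume(events, apply_change, token_path):
    """Apply each change event, then checkpoint its resume token.

    `events` is any iterable of change events (dicts); the `_id` of
    each event is treated as the resume token.
    """
    for event in events:
        apply_change(event)
        save_token(token_path, event["_id"])
```

With the PyMongo driver, the events would typically come from something like collection.watch(resume_after=load_token(path)). Checkpointing after the change is applied means a crash between the two steps replays one event rather than skipping it.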

The _id field 

Every document stored in MongoDB has an _id attribute that uniquely identifies it. For deletes and updates, the change stream API does not include the old data (as it existed before the event). As a result, a delete event provides only the _id. Similarly, an update event provides only the current values, not the old ones.

Delete issues with _id field 

Because only the _id value is provided through the change stream API, delete events will only replicate to targets that also have this _id field as their primary key. To enable this, set the following parameter in SymmetricDS and ensure that your target tables have the _id field defined as the primary key.

mongodb.use.mongo.ids=true
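
To see why the target needs _id as its primary key, consider the following minimal sketch. It assumes the standard change event layout (operationType, ns, documentKey) and a target table named after the collection. The delete_statement helper is hypothetical and is not part of SymmetricDS.

```python
def delete_statement(event):
    """Build a parameterized DELETE from a change stream delete event."""
    if event["operationType"] != "delete":
        raise ValueError("not a delete event")
    table = event["ns"]["coll"]         # assumption: table is named after the collection
    key = event["documentKey"]["_id"]   # the only row data a delete event carries
    return f'DELETE FROM "{table}" WHERE "_id" = ?', (str(key),)


example = {
    "_id": {"_data": "01"},             # resume token, not the document key
    "operationType": "delete",
    "ns": {"db": "shop", "coll": "orders"},
    "documentKey": {"_id": "abc123"},
}
sql, params = delete_statement(example)
```

Without an _id column on the target, there is nothing in the event that the WHERE clause could match on.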

Updates

If the _id field is used on both the source and the target, updates will work without any additional configuration changes. If the _id field is not used, there are two possible SymmetricDS configurations for replicating MongoDB updates to a target database.

1. Target table has primary keys: Set the following parameter.

dataloader.use.primary.keys.from.source=false

2. Target table does not have primary keys: Use the sync key names on the table trigger configuration to specify which columns to use as the primary key while loading data to the target.
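
The effect of choosing key columns other than _id can be sketched as follows. This is an illustration only, not the SQL that SymmetricDS generates. The update_statement helper and the column names in the example are hypothetical.

```python
def update_statement(table, document, key_columns):
    """Build a parameterized UPDATE keyed on the chosen columns.

    `document` holds the current values of the changed document;
    `key_columns` plays the role of the configured sync keys.
    """
    missing = [k for k in key_columns if k not in document]
    if missing:
        raise KeyError(f"document lacks key columns: {missing}")
    # Everything that is neither a key nor the Mongo _id goes in SET.
    set_cols = [c for c in document if c not in key_columns and c != "_id"]
    if not set_cols:
        raise ValueError("no columns left to update")
    set_clause = ", ".join(f'"{c}" = ?' for c in set_cols)
    where_clause = " AND ".join(f'"{k}" = ?' for k in key_columns)
    params = [document[c] for c in set_cols] + [document[k] for k in key_columns]
    return f'UPDATE "{table}" SET {set_clause} WHERE {where_clause}', tuple(params)


doc = {"_id": "x", "order_no": 7, "status": "shipped", "total": 12.5}
sql, params = update_statement("orders", doc, ["order_no"])
```

Because only current values are available, the WHERE clause is built from the new values of the key columns. An update that changes a key column itself would therefore not find the existing row in this sketch.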

Bulk Loading

When setting up a MongoDB node in SymmetricDS, the advanced setup screen provides an option to turn on bulk loading. Bulk loading uses the bulkWrite() method of the collection object in the MongoDB API. It applies only to loads, not to change data capture (change streams).
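
The general shape of a batched load can be sketched in Python. The write_batch callable is a hypothetical stand-in for the driver call, and the batch size is an arbitrary assumption. With PyMongo, write_batch could wrap collection.bulk_write([InsertOne(r) for r in chunk]).

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_load(rows, write_batch, size=1000):
    """Send rows to `write_batch` one chunk at a time; return the row count."""
    total = 0
    for chunk in batches(rows, size):
        write_batch(chunk)   # one round trip per chunk instead of per row
        total += len(chunk)
    return total
```

Grouping rows this way trades a little memory for far fewer round trips, which is why it suits initial loads better than a trickle of individual change events.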