Best Practices using SymmetricDS

SymmetricDS has been designed around its users, shaped by numerous feature requests from our clients and the open source community. As a result, many different factors play a role in how you use it within your project. At a high level, though, here are some best practices we find should be considered for nearly all implementations.

Installation

SymmetricDS performs best when there is a fast and reliable connection between the machine that SymmetricDS runs on and the database. In use cases that involve replicating a few databases located on the same local network, a single installation is usually sufficient (a multi-homed setup). In a wide area network setup, however, it is recommended to install an instance of SymmetricDS near each database.

Table Creation

While SymmetricDS can create tables for you on a target database, it is good practice to review these tables, especially when you have different database platforms on the source and target. The DBExport feature can generate a SQL script of the target tables so that it can be reviewed prior to applying it to the target system. This allows a user to adjust any syntax that might not translate from one platform to another. Sometimes there are references to functions or default values that are only available on the source system.
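As a sketch of that workflow, the dbexport utility in the SymmetricDS bin directory can produce a reviewable script. The engine name, table names, and output file below are hypothetical, and the options should be checked against your version's documentation:

```shell
# Hypothetical example: export the definitions of two tables from the
# "corp-000" engine as SQL compatible with PostgreSQL, with no row data,
# so the script can be reviewed and edited before it is run on the target.
bin/dbexport --engine corp-000 --format SQL --compatible postgres \
    --no-data item item_selling_price > target-tables.sql
```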

Scheduling

Jobs are a critical piece of the replication workflow in SymmetricDS and run on a schedule that is configurable by the user. The most commonly adjusted job schedules are for the push, pull, routing, and purge jobs. These all have default values, but often people want to know the pros and cons of adjusting them.

  • Push/Pull – These jobs default to 60 seconds, and I often see sync scenarios that are running fewer nodes (25 or less) adjust them down to 10 seconds or sometimes faster. The trade-off here is the extra HTTP traffic generated by checking for changes between nodes, which is not noticeable with a small set of nodes. On environments with hundreds or thousands of nodes, however, this should be considered before adjusting these values too low.
  • Routing – This job defaults to 10 seconds and as of 3.8 has been enhanced to run much more efficiently and faster. I generally do not recommend that clients turn this down much lower, as it has a bit more overhead while reading the sym_data table, which is the change capture table. Whether to adjust this schedule depends on the volume of changes the database incurs during each cycle.
  • Purge – This job defaults to every night at midnight. The purge job is responsible for cleaning up the SymmetricDS runtime tables that contain data that has already been replicated. For configurations that take on a great deal of changes every day, this might be adjusted to every 12 hours or even every 6 hours. The goal, though, is to run this job when the system is least busy, as it is cleaning up the tables that are responsible for change capture. Running it at a busy time could cause timeouts or cause other jobs, such as routing, to run longer. See also the parameter: purge.retention.minutes.
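The schedules above correspond to engine properties. A sketch, assuming the parameter names from the 3.8-era documentation (verify against your version before relying on them):

```properties
# Push and pull default to 60000 ms; a small deployment might lower them.
job.push.period.time.ms=10000
job.pull.period.time.ms=10000

# Routing defaults to 10000 ms.
job.routing.period.time.ms=10000

# How long already-replicated capture rows are retained, in minutes,
# before the purge job removes them.
purge.retention.minutes=7200
```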

Loading Data

An initial load is essential to get the source and target into a known state before changes begin to replicate. Below are a few things to consider in order to speed up an initial load.

  • Truncate – Truncate tables prior to loading data. This is an option in the Professional load data wizard as an action before loading.
  • Data Loader Type – On the reload channel (or other channels you may have set up for reloads) you can specify a data loader type of “bulk”. SymmetricDS will detect the target platform type and attempt to use a bulk loader, which is much faster than the default loader. If the bulk loader fails on a batch, it will automatically fall back to the default loader, and the next batch will again try the bulk loader.
  • Channels and Queues – If your target tables can be grouped by their dependencies, you can assign different tables to different channels and channel threads. By default, all tables in a reload go through the reload channel (on the default thread), which means that if 10 tables generate 1,000 batches, they will run in series on a single thread. If the 10 tables were not dependent on each other, however, you could create 10 different reload channels, each with a different queue name. This would allow all 10 to extract, transfer, and load in parallel.
  • Batch Size – Each channel (the reload channel by default) has a maximum batch size. By adjusting it you may find better or worse load times on the target. A batch is committed to the target database as a whole, so a batch size of 50k results in a 50k row commit. Some databases may load faster with commits at 2k or 10k. Determining the right size takes a few iterations in a test environment so that you can extrapolate the numbers to match a production system.
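The channel settings above live in the sym_channel table. A minimal sketch, assuming hypothetical channel and queue names (column names per the 3.8-era schema):

```sql
-- A dedicated reload channel on its own queue so it can extract, transfer,
-- and load in parallel with other reload channels; "bulk" asks SymmetricDS
-- to try the platform's bulk loader first.
insert into sym_channel
  (channel_id, queue, processing_order, max_batch_size, enabled,
   data_loader_type, description)
values
  ('reload_item', 'item', 1, 10000, 1, 'bulk',
   'Reload channel for item tables');

-- Experiment with smaller commits on the target by lowering the batch size.
update sym_channel set max_batch_size = 2000 where channel_id = 'reload_item';
```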

Groups

Groups are the root of all SymmetricDS configuration; all configuration ties back to a group. Setting up your groups up front can take some planning and design, but it is critical, because changing them later could cause rework of your entire configuration. Essentially, a group is defined as a piece of shared configuration. You can then have as many nodes (databases) participate in that group as needed, and they will all follow the same set of instructions (configuration). So before you start building a complicated configuration, it may be time well spent to plan the groups needed to support variations in your configuration. Variations might be different schemas or table names, or the direction the data will be moving. For example, you may have designed for one group because the tables are all the same, but in reality you may need some tables in databases A, B, and C to transform data as well. That would call for two groups to represent these databases: one with transforms and one without.
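As a sketch of the two-group example above (group names are hypothetical, and a central 'corp' group is assumed to exist already):

```sql
-- One group for databases that replicate tables as-is, and a second for
-- databases that also transform data.
insert into sym_node_group (node_group_id, description)
  values ('store', 'Stores that replicate tables as-is');
insert into sym_node_group (node_group_id, description)
  values ('store_transform', 'Stores that also transform data');

-- Links define the direction data moves: 'P' = push, 'W' = wait for pull.
insert into sym_node_group_link
  (source_node_group_id, target_node_group_id, data_event_action)
  values ('store', 'corp', 'P');
insert into sym_node_group_link
  (source_node_group_id, target_node_group_id, data_event_action)
  values ('store_transform', 'corp', 'P');
```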

Channels

By default, all change capture goes through the “default” channel and all loading of data goes through the “reload” channel. This may be sufficient for most simple solutions. However, additional channels can be used to increase performance by separating independent tables so that large syncs on one channel do not delay changes on another.
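A minimal sketch of carving out a separate change-capture channel (channel and table names are hypothetical):

```sql
-- Move a high-volume table off the "default" channel so its large batches
-- do not delay small, frequent changes on other tables.
insert into sym_channel
  (channel_id, processing_order, max_batch_size, enabled, description)
  values ('sale_transaction', 1, 10000, 1, 'High-volume sales data');

-- Point the table's trigger at the new channel instead of "default".
update sym_trigger
  set channel_id = 'sale_transaction'
  where source_table_name = 'sale_transaction';
```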

Batches in Error

There can be several root causes for a batch in error, but knowing where to look may help in the debugging process. The first step is to determine the source and target nodes involved in the batch that is in error. This allows you to use the Manage->Outgoing and Manage->Incoming batch screens effectively on the source or target.

For example, assume we have groups named server and client and the batch in error is moving from the server to the client. We could look at the server’s web console and find the batch on the Manage->Outgoing Batches screen (you may need to adjust the filters). We could also look at the client’s web console and find the batch on the Manage->Incoming Batches screen if the batch was sent to the client.

We may find different information on each side. If the batch had a problem extracting from the source database, we would see more detailed information on the outgoing batches screen. Viewing the batch using the magnifying glass on the outgoing batch screen allows us to look closer at the batch contents and the errors that were reported. There is a View CSV button here that lets us see the actual contents of the batch that would be transmitted over HTTP/S to the target.

If the outgoing side does not show enough information, we can check the incoming batch details on the other side. Again, find the batch on the incoming batch screen and look at the details using the magnifying glass icon.
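The same details are also available in the runtime tables when the console is not handy; a sketch using the error columns on the batch tables:

```sql
-- On the source node: outgoing batches in error, with the captured message.
select batch_id, node_id, channel_id, status, sql_message
  from sym_outgoing_batch
 where error_flag = 1;

-- On the target node: the matching incoming view.
select batch_id, node_id, channel_id, status, sql_message
  from sym_incoming_batch
 where error_flag = 1;
```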

Common Batch Errors

  • Foreign key constraints – This might be the result of a child table and parent table on different channels, which could allow a child row to load before its parent. It might also occur if the parent table is not set up for replication. If this occurs on a reload of data, it's possible that the source is extremely active and rows were removed from or added to the child table after the load extracted the parent information. In this case the load would need to be cancelled and retried.
  • Table missing – This might occur if the target table is not present or is in a different schema. Creating the table will allow the batch to automatically process and complete successfully. This may also require a transform to ensure the batch is sent to the proper target catalog and schema.
  • Network issue – This could occur if the target node is offline or if the connection is disrupted during the transmission of the data. These errors usually work themselves out once the network connection is re-established on the next push/pull.
  • Disk space – This could occur if a large number of batches are being written to disk and disk space is full. You could either run the purge job to free up space from batches that have already processed or increase disk space.

Summary

In summary, SymmetricDS offers many configuration options to support a variety of use cases. Implementations can also involve a great deal of infrastructure, network resources, and hardware. These best practices will naturally vary from design to design and are meant only as starting points and recommendations based on use cases we have experienced in the field.