Pausing Replication In SymmetricDS

There are several ways to pause replication in SymmetricDS, each suited to a different use case. This blog identifies the differences between the approaches so that you can make the best decision for your situation.

Overview

Pausing replication is not needed in most replication scenarios. However, if it becomes necessary to stop the flow of changes, here are some options to consider. Each option affects the system differently, both while replication is paused and when it is started again.

Channel Enabled/Disabled

A quick and simple way to stop replication at the channel level (or for all channels if needed) is to toggle the channel's enabled flag. It can be found on the Configure->Channels screen under “show advanced options”. It can also be scripted using update sym_channel set enabled=0 where channel_id = ?.
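As a concrete instance of the scripted form, here is a minimal sketch that disables a single channel and then confirms the flag. The channel id 'sale_transaction' is a hypothetical name used only for illustration; substitute one of your own channel ids.

```sql
-- Pause one channel. 'sale_transaction' is a hypothetical channel id.
update sym_channel
   set enabled = 0
 where channel_id = 'sale_transaction';

-- Confirm which channels are currently disabled.
select channel_id, enabled
  from sym_channel
 where enabled = 0;
```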

When paused

This takes effect during routing: all disabled channels are skipped by the routing process. Changes continue to be captured but remain unrouted until the channel is enabled again. The purge process also bypasses these changes, since unrouted data is not eligible for purging, so the data is safe and ready to sync once the channel is enabled. Be sure not to leave a channel disabled for too long, though, or the sym_data table could grow excessively until the changes are allowed to replicate again (and ultimately be purged).

When started again

Once the channel is enabled, the next run of the routing job will pick it up again. It will route as much data as the channel allows (see max data to route on the channel) and fill batches up where possible (see max batch size on the channel). A larger max batch size lets the channel catch up more quickly, because the backlog is packed into larger batches.
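Before turning a channel back on, it can be useful to look at the two settings that govern how fast it catches up. The sketch below assumes they are stored in the max_data_to_route and max_batch_size columns of sym_channel; 'sale_transaction' is again a hypothetical channel id.

```sql
-- Review the catch-up settings for the paused channel.
-- Column names are assumed from the settings named in the text.
select channel_id, enabled, max_data_to_route, max_batch_size
  from sym_channel
 where channel_id = 'sale_transaction';

-- Resume the channel; the next routing run picks it up.
update sym_channel
   set enabled = 1
 where channel_id = 'sale_transaction';
```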

Node Channel Control Table

To control replication for a specific node and channel combination, the sym_node_channel_ctl table can be used. Each row pairs a node with a channel and determines how replication is controlled for that pair. The table has two flags, one to suspend and one to ignore batches that match the node and channel combination.

When paused

Routing continues to process changes as normal, but when a batch is created and its node and channel combination matches a row in the sym_node_channel_ctl table, the flags determine how the batch is handled. If the ignore flag is on, the batch is marked for purging. If the suspend flag is on, the batch remains paused indefinitely and is not purged.
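A minimal sketch of suspending one channel for one node might look like the following. It assumes the two flags are columns named suspend_enabled and ignore_enabled, and it uses a hypothetical node id '001' and channel id 'sale_transaction'. Check the column names against the data model for your version before running it.

```sql
-- Suspend (but do not ignore) batches on one channel for one node.
-- '001' and 'sale_transaction' are hypothetical identifiers.
insert into sym_node_channel_ctl
       (node_id, channel_id, suspend_enabled, ignore_enabled)
values ('001', 'sale_transaction', 1, 0);
```

Setting the ignore flag instead would mark matching batches for purging, so those changes would not be delivered later. Suspend is the option to use when the data must still arrive.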

When started again

If the row that is holding back suspended batches is removed from this table, or its suspend flag is turned off, the next push or pull job that reaches the batch will process it under normal operation again.
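Either way of resuming can be scripted. The sketch below assumes the suspend flag is the suspend_enabled column and reuses the hypothetical node id '001' and channel id 'sale_transaction'. Only one of the two statements is needed.

```sql
-- Option 1: keep the row but turn the suspend flag off.
update sym_node_channel_ctl
   set suspend_enabled = 0
 where node_id = '001'
   and channel_id = 'sale_transaction';

-- Option 2: remove the control row entirely.
delete from sym_node_channel_ctl
 where node_id = '001'
   and channel_id = 'sale_transaction';
```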

Node Group Channel Window Table

To control replication for a node group and channel combination over a specific time period, the sym_node_group_channel_wnd table can be used. Each row is set up with a node group, channel, start time, and end time to control when replication is active. This might be used if replication should not occur during off hours or during an outage window.

When paused

Routing continues to process changes as normal, but when a batch is created for a node in a node group and a channel that match a row in the sym_node_group_channel_wnd table, the start and end times are checked to determine whether the batch can be processed. All unprocessed batches remain paused for as long as the window prevents them from processing, and they are not eligible for purging.
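A window row could be created along these lines. This is a sketch under assumptions: the columns are taken to be node_group_id, channel_id, start_time, end_time, and enabled, the times are taken to be time-of-day values, and the node group 'store', channel 'sale_transaction', and the hours shown are hypothetical. Time literal syntax varies by database, so adjust the quoting as needed.

```sql
-- Define a time window for one node group and channel.
-- All identifiers and times below are hypothetical.
insert into sym_node_group_channel_wnd
       (node_group_id, channel_id, start_time, end_time, enabled)
values ('store', 'sale_transaction', '08:00:00', '18:00:00', 1);
```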

When started again

If the entry in this table is removed, or the current time moves outside the start and end time range, the batches will be processed again under normal operation on the next push or pull job execution.
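Removing an entry early, rather than waiting for the clock, can be sketched as a plain delete. The node group 'store' and channel 'sale_transaction' are hypothetical, and the statement removes every window defined for that pair.

```sql
-- Remove all windows for one node group and channel so that
-- the next push or pull job processes the waiting batches.
delete from sym_node_group_channel_wnd
 where node_group_id = 'store'
   and channel_id = 'sale_transaction';
```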

Offline Node

Taking a node offline, either intentionally or unintentionally, pauses all replication in and out of the node indefinitely. Batches remain idle and are not processed while the node is offline. They continue to back up, which can affect database storage and disk space if the node stays offline for a long time. If a node remains offline too long, it might be faster to unregister it and register it again with an initial load. In some extreme cases the initial load might be smaller than all the changes that accumulated while the node was offline.
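To judge how large the backlog for an offline node has become, one option is to count the batches still waiting for it. The query below is a sketch that assumes sym_outgoing_batch carries node_id, channel_id, and status columns and that a status of 'OK' marks a delivered batch; the node id '001' is hypothetical.

```sql
-- Count undelivered outgoing batches for one node, per channel.
-- Run this on the node that is sending to the offline node.
select channel_id, status, count(*) as batch_count
  from sym_outgoing_batch
 where node_id = '001'
   and status <> 'OK'
 group by channel_id, status
 order by channel_id, status;
```

A count that keeps climbing across all channels is a hint that a fresh registration with an initial load may be worth comparing against waiting for the backlog to drain.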

Trigger Router Enabled/Disabled

Replication can also be paused by disabling table routing (sym_trigger_router). Setting the enabled flag to false here stops change capture for the affected tables. This is more invasive than the options above because the triggers are physically removed while the trigger router is disabled, so changes are no longer detected. Changes made during that time are lost, and the only way to get back in sync is through an initial load, so this approach should be used with caution.
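Scripted, the change might look like the sketch below. The trigger id 'item' and router id 'corp_2_store' are hypothetical. Updating last_update_time is an assumption about how the configuration change gets noticed when triggers are next synchronized, so verify that against the documentation for your version.

```sql
-- Disable one trigger/router link. Changes made to the table
-- while this is disabled are NOT captured.
-- 'item' and 'corp_2_store' are hypothetical identifiers.
update sym_trigger_router
   set enabled = 0,
       last_update_time = current_timestamp
 where trigger_id = 'item'
   and router_id = 'corp_2_store';
```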

Trigger or Router Insert, Update, Delete Toggled

If you need to control which DML types are replicated, or to pause all three types (insert, update, and delete), you can do so at either the trigger or the router level.

Trigger

At the trigger level, this prevents the creation of, or removes, the database triggers for whichever DML types are turned off. For example, if you turn off deletes, the delete trigger will be removed and only inserts and updates will be replicated. If you disable (uncheck) all three, no changes will be replicated for the given table.
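The delete example above could be scripted roughly as follows. The sketch assumes the per-DML flags on sym_trigger are named sync_on_insert, sync_on_update, and sync_on_delete, and it uses a hypothetical trigger id 'item'. As with other trigger changes, touching last_update_time is assumed to be what causes the change to be noticed.

```sql
-- Stop capturing deletes for one table; inserts and updates
-- continue to replicate. 'item' is a hypothetical trigger id.
update sym_trigger
   set sync_on_delete = 0,
       last_update_time = current_timestamp
 where trigger_id = 'item';
```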

Router

At the router level, these changes are put into batches that are marked as unrouted (node id of -1). Changes continue to be captured, but they are immediately set up for purging and are not sent.
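The router-level equivalent might be sketched like this, assuming sym_router has the same sync_on_insert, sync_on_update, and sync_on_delete flags. The router id 'corp_2_store' is hypothetical.

```sql
-- Pause all three DML types at the router. Changes are still
-- captured but land in unrouted batches and are never sent.
update sym_router
   set sync_on_insert = 0,
       sync_on_update = 0,
       sync_on_delete = 0,
       last_update_time = current_timestamp
 where router_id = 'corp_2_store';
```

Because captured changes are purged rather than held, turning the flags back on later resumes replication from that point forward only.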