Read Only Replication (Time-Based Replication)

Do you need to move data from a read only source as it changes to a new target?   Read only or time-based replication can be used to achieve this use case.   There are positives and negatives but sometimes this is your only choice.   This blog will review these advantages and how it can be achieved.

TODO image

When is read only / time-based replication needed?

  • Ownership is by a third party and the ability to obtain more access to capture changes through triggers or logs are not possible.
  • Permissions  are reduced to select/read only access on the source data.
  • Table structures already designed with a last updated column or only inserts occur on a table.

How are changes captured?

In a read only / time-based capture scenario there are three options to stream data from the source to the target.

  1.  Timestamp Column
  2. Numeric Primary Key
  3. Data Refresh

1. Timestamp column

If your table(s) have a timestamp column that represents a last updated value for the row this is the most effective approach.   Each table can be configured with a specific timestamp column to be used and does not have to be named the same across all tables.

2. Numeric Primary Key

If your table(s) do not have an effective timestamp column that can be used for all changes to the row and the table only has data inserted into it, a numeric primary key may be used.   If this numeric key is incremental and updates do not occur on the table or are not needed to be captured the numeric PK approach would be sufficient.

2. Data Refresh

As a final approach for any tables that do not have a timestamp or numeric PK to drive changes it can be setup for a full data refresh periodically.   This approach may not be ideal for large tables but might be the only option for a read only source without any other hooks to obtain changes.

TODO : GRID for pro/con all 3

Scheduling?

Both the timestamp and numeric PK tables that have been configured can run on the same log miner scheduling.   By default this will run every 10s and be configured for any schedule that meets your needs.   This will keep internal pointers for all of these tables either based on date or numeric PK values on what to retrieve with each run.

The data refresh tables have a dedicated job (called Data Refresh Job) that can be scheduled independently of the timestamp and numeric.    These tables are generally not fully refreshed as often since they send the entire data set for the table rather than incremental changes.

NOTE: The data refresh job will use the new compare and repair component if licensed as a more effective approach to send only changes rather than the full data set.  If this is not available a full table reload will be sent.

 

Setup Time-Based Replication With SymmetricDS