Clienteling isn’t a new theory. It can be traced back hundreds of years when shopkeepers …
While setting up replication between a variety of databases there are some blocking scenarios that should be considered upfront in order to create a successful replication scenario. If these considerations are not taken into account the replication is likely to fail or result in the databases involved being out of sync.
If the target database contains constraints that are not present on the source the replication could fail to load with the wrong dataset. The obvious restrictions might be foreign keys that are present on the target but not the source. However, there are several others that should be analyzed
Below are some examples that if present on the target but not on the source would result in replication issues that can not be resolved without a manipulation of data or the table structures.
Add matching constraint to source
Remove constraint from target
Use transformations to convert the data
Dismiss or ignore the change
Most replication tools will only replicate tables that are configured by the user for replication and no more. So if not prevented during setup a child table might be setup for replication without its dependent parent table resulting in a replication issue
Add all dependent tables to the replication configuration
Remove tables from replication that do not also have dependent tables setup for replication
Remove target constraints
If the target database table(s) do not have primary keys they are susceptible of data duplication. While this may not be considered a blocker it can have some undesired results during replication. Even if the source tables also does not have primary keys it is possible through data load processes to duplicate data if the same load is requested multiple times over a period of time.
Add primary key constraints to the target database
Determine if your replication tool provides a “virtual pk” configuration (SymmetricDS does)
Most databases provide a compatible set of date ranges that are acceptable in their systems. Typically ranges are set to allow dates between January 1, 1000, and December 31, 9999. However MySQL for example can also be configured to allow a zero based date in place of using nulls (0000-00-00 00:00:00). This would work when replicating between only MySQL databases but can cause issues if replicating to a database that does not accept this value.
SQL Server also has data types that do not fall in the standard ranges like smalldatetime and datetime which have a smaller range. Again would not be an issue amongst other SQL Server databases replicating but could cause issues in a cross platform.
Another complication around replicating date and time values arises around “fractional” or “micro” seconds. For example Oracle has the ability to record up to 9 values in the “fractional” seconds space but MySQL only permits 6 values.
In the example below the following two values in Oracle would be unique however when replicated to MySQL they would have the same value. This might not be an issue in most cases but if the column is part of the primary key it would result in only one row in MySQL while two rows in Oracle.
|2022-06-01 12:00:00.000000111||2022-06-01 12:00:00.000000|
|2022-06-01 12:00:00.000000222||2022-06-01 12:00:00.000000|
Transform or filter dates out of range to a new value so they can be loaded
Clean all data in the source that would be out of range or invalid in the target
Usually when setting up a replication project the target system is new or you have control over it to make any adjustments regarding the data structures. However there are cases where the databases being replicated may already be active and do not fully line up with the same data types and/or sizes. This can result in data loss or truncation if the replication is not configured properly or understood.
An example might be as simple as text based column where the source has a limit of 10 characters but the target a limit of 8. What does replication do in this situation? Should it automatically truncate and move on or should it throw an error and require manual data correction?
|Source – Column Length 12||Target – Column Length 8|
Adjust columns at either source or target to match on each side
Transform data to automatically truncate
Clean data on the source and prevent new data from expanding beyond the target limits
Some replication use cases only require replication in one direction. This is common for warehousing, analytics, or reporting setups. However if there is a time that the target ever becomes active with changes again then the database will become out of sync. Even worse the changes at the target may fail replication coming in from the source.
For example if a child and its parent data are removed from the target but replication is only in one direction these changes are not replicated to the source. Then if the source updates that same child or inserts another child under the parent the replication will fail as the dependent parent data is no longer at the source.
Setup bi-directional replication if the target becomes active with changes
Utilize a replication tool that can auto resolve dependent data from the source if missing at the target (SymmetricDS does)
Often tables are designed to utilize auto increment columns as the primary key. This works well to uniquely create an identifier for each new row inserted into a database. However how these identifiers are replicated is sometimes overlooked and can cause issues if not done properly.
It is important to choose a replication tool that can perform auto increments at the target while using the same identifier on the source and not creating a new one. Creating a new identifier not only gets the systems out of sync but also prevents proper updates and deletes from being replicated as the source identifier is no longer the same at the target.
Sometimes the target will include a second auto increment column and preserve the original identifier in another column that is not the primary key on the target. This solution can work well for replicating inserts but usually requires additional configuration to replicate a delete or update on the source to ensure it uses the same identifier column that was used during insertion.
Choose a replication tool that can load into an auto increment column directly with a value rather than creating a new one
When using an additional column on the target that also uses auto increment be sure replication is configured to use the proper columns