SymmetricDS has for years been a robust, reliable solution for synchronizing data between multiple databases and multiple database platforms. With SymmetricDS 2.4, SymmetricDS has expanded to also prove data transformation capabilities while synchronizing. The data transformation is performed via configuration settings in new SymmetricDS configuration tables, in a manner consistent with and familiar to existing SymmetricDS users.


This article will provide an overview of the configuration and settings for data transformation and will demo a few simple examples of data transformation in action. There is also additional detail available in the latest SymmetricDS User's guide.

Overview

The data transformation feature was first added to open-source SymmetricDS in release 2.4 as a result of requests from the user's community for this functionality. The SymmetricDS Pro version has also been updated to provide simple GUI configuration of transformations. Data transformation still relies on SymmetricDS configuration to capture a change, of course. If you haven't configured a SymmetricDS trigger / router to capture the change on the source, the transformation will obviously not occur. The source trigger creates the synchronization data, while the transformation configuration decides what to do with the synchronization data as it is being extracted from the source or loaded into the target.


Some examples of transformations that can now be accomplished through configuration include:

  • Copy a column from a source table to two (or more) target table columns,
  • Merge columns from two or more source tables into a single row in a target table,
  • Insert constants in columns in target tables based on source data synchronizations,
  • Insert multiple rows of data into a single target table based on one change in a source table,
  • Apply a Bean Shell script to achieve a custom transform when loading into the target database.

Many more complex transformation scenarios can also be performed as a result of the flexible configuration settings.


Flexible Configuration

The data transformation engine offers a great deal of flexibility in terms of defining when and how a transformation occurs. Configuration starts by defining a source table and target table for the transformation. At this level, you can then set a number of options on how the transformation is to occur. For example, you can:

  • define whether the data transformation logic will run at data extract time or data load time,
  • define the order in which this particular transformation occurs relative to the other defined transformations that might run for a given data change,
  • define the behavior that occurs in the case of a Delete DML type on the source,
  • define whether Insert DML types should attempt to perform an update first

As eluded to in the above list, transformation behavior is based on the original DML type that resulted in the data capture originally. In other words, if a data change was captured that was caused by an Update, SymmetricDS will attempt to perform the transformation by creating one (or more) update statements at the target. So, generally speaking, updates on the source resulted in updates on the target; inserts result in inserts, and deletes result in, well, it depends. In the case of Delete, you have the flexibility to (a) do nothing, (b) delete a corresponding target row, or (c) set one or more columns in the target to a predefined value. In the last case, consider the following example: you might want a delete of an employee row in a source database to cause an "active/inactive" flag to be set to inactive in the target instead of deleting the row in the target.


For each source table / target table pair, you can then define transformation behavior at the column level, specifying both the source column name and target column name. You can even vary the column-level transform behavior based on the DML type itself, allowing you to specify different sets of columns in the case of an insert versus an update, for example. You can also specify the relative order that column-level transforms occur, along with a "transform type" specifying how the data is actually altered, if needed (a copy of the column data is the default operation).


One additional note before we do some examples. Keep in mind that all transformations occur "in line"; that is, as SymmetricDS is processing the change data that was captured. Each of your transformations are performed as independent operations in sequence and must be "complete" from a SQL perspective. In other words, you must define columns for the transformation that are sufficient to fill in any primary key or other required data in the target table if the source operation was an Insert, for example.

Transformation Examples

With that, let's code a simple real-world example. Assume for the moment you run a succesful dentistry practice that has expanded to have more than one location. As such, you use SymmetricDS to synchronize a "corporate" central office database with a database at each of your dental clinics. In this way, you can keep patient data, schedules, etc, synchronized between the central office and each dental location. Unfortunately, the software you use in the central office doesn't have quite the same database schema as that present in your dentistry office locations.


In particular, let's say that, in the central office, all patient information is kept in a single, denormalized table called "customer". Each patient has a row, with a unique id, first and last names, email address, etc. However, in the dental office locations, patient data is kept in two tables: A master table (called 'patient') contains the id, first name, and last name, but email addresses are kept in a separate 'patient_contact' table, containing an id, contact type (a fixed string of 'email', or 'phone', etc.) and a value to hold the information. Create statements for these sample tables are as follows (my source and target databases for this example are H2):

 
 create table customer (
  id integer not null,
  first_name varchar2(100),
  last_name varchar2(100),
  email varchar2(100),
   primary key (id));
 
 create table patient (
  id integer not null,
  first_name varchar2(100),
  last_name varchar2(100),
   primary key (id));
 
 create table patient_contact (
  id integer not null,
  contact_type  varchar2(10) not null,
  contact_value varchar2(100),
   primary key (id, contact_type));
 

Inserts into the central office should create two insert statements in the dental office database, and updates would update the appropriate columns and rows. For deletion of patients in the central office, the delete of the patient row in the central office should result in a delete of the corresponding rows in both tables at the dental office.


As far as SymmetricDS basic setup is concerned, this is a pretty typical two-tier node network. We'll use a node group of "server" for the corporate central office server, and "client" for the dental office location databases, with the clients pushing data to the server and the clients pulling data from the server. We'll also configure two routers, one for each direction (although our example will focus on server to client (central office to dental office locations) synchronizations only).


Before we configuration the data transformation, we need a trigger on the customer data that sends the data to the client. I've used SymmetricDS Pro's ability to auto-create triggers based on database metadata and configured a trigger for our customer data for sending data to the client. If you are using the open source version, this corresponds to creating a row in sym_trigger, sym_router, and sym_trigger_router.



So, how would we configure the transformations to achieve these goals? For the data transformation, we need three columns from customer to map to patient and want a delete action to delete the corresponding row. Set-up of this transformation using SymmetricDS Pro is quite easy. We defined a single transform id, customer2patient, mapping us from customer to patient tables, and fill in three source/target columns, being careful to define id as the primary key. We use the default transformation type of "copy", since we wish to copy the values to the target unchanged. If you are using the open-source version of SymmetricDS, this is equivalent to creating one entry in sym_transform_table and three entries in sym_transform_column tables. Note that if your column data applies to all DML types (as in this case), you can use an "*" in the include_on column to represent all types.



For the customer to patient_contact mapping, things are almost as straightforward. We define a transform mapping between the two tables, called customer2patientcontact. We map two columns (id to id and email to contact_value) but we also need to map a third target column, contact_type, which we need to be a constant value "email". So, we use a transform type of "const" (constant) and supply the constant value in the transform expression. We mark both id and email as part of the primary key so that the deletes and updates will operate only on the appropriate rows.


Adding new dynamic data

Let's extend our example slightly. Let's say, for example, that the 'patient' data table has a last_updated column that we wish to fill in whenever a transformation occurs. This column doesn't exist in the central office customer table, so we need to add the data dynamically as part of the transformation. We'll add the new column using the following SQL:

 alter table patient add column LAST_UPDATED datetime;
 

Next, we simply add an additional column to the customer2patient transform to supply new data for the target, similar to our use of the "constant" transform type for email. This time, we'll take advantage of the "variable" transform type, which provides a few dynamically computed variables that can be placed into a target column. In this case, we'll specify "system_timestamp" to fill in the current data and time at the moment the transformation occurs.



Summary

The few quick examples above are really just a basic sample of the kinds of transformations you can configure. More complex examples, where the columns vary based on DML type, for example, are possible and easy to configure. For more details and a complete list of configuration options, please consult the SymmetricDS User's Guide, available at www.symmetricds.org.

Mark Hanes
Author: Mark Hanes

Mark has software development experience in a variety of fields, including retail, telecommunications, and research institutions. A graduate of The Ohio State University with a Masters Degree in Electrical Engineering, Mark now focuses on software development and development management for JumpMind. Prior to joining JumpMind, Mark performed software architecture and development work for many years for a variety of companies both large and small. He has also taught Computer Science and Electrical Engineering courses both full- and part-time at the Bachelor's level for a number of years.