data transformation

  • Bridging Databases with Different Dependencies

    Bridge data between databases with data synchronization by using channels to re-order data when the dependencies in each database are different. Each application can model its data differently, so a parent row in one database might map to a child row in another one. Instead of using data transformation to lookup and add dependent data, the order of captured data can be changed to match the constraints of the target database.

    Let's look at two applications that persist customer and address information in their databases. A website database uses the model of "a Customer has an Address", while a call center application uses "a Premise has a Contact". We'd like to synchronize the data between the two databases with Customer to Contact and Address to Premise, but the dependencies don't match on each side. As shown below, you can see the dependency of data is reversed between the two applications. On the website, an Address is created first, followed by a Customer. On the call center, the reverse order must be performed: a Contact is created first, followed by a Premise.

    Bridge data between two different data models

    Mapping the tables for data synchronization is difficult when the system loads data the same way it was captured and the database enforces constraints. If Customer is mapped to Contact, and Address is mapped to Premise, the problem is that the data will arrive out of order for the target database. The website will save an Address and a Customer which are sent to call center. When the Address row is mapped and loaded into Premise on the call center, the database will throw a foreign key constraint because the dependent data for Contact hasn't been loaded yet.

    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`callcenter`.`premise`, CONSTRAINT `premise_ibfk_1` FOREIGN KEY (`contact_id`) REFERENCES `contact` (`contact_id`))

    Using Channels to Satisfy Dependencies

    We can solve this problem by assigning the tables to channels with different priorities. Channels allow data that would normally be in the same batch together to be split into separate batches and loaded independently. Data is loaded from channels in the order specified by priority, so high priority channels load first. For example, we could create a channel named "parent" with a priority of 10 and a channel named "child" with a priority of 50.

    How using channels changes batching

    If we assign the Address table to the "child" channel and the Customer table to the "parent" channel, the rows are placed in separate batches, with the Customer data loading first. With the data in the correct order for the target database, now we can map columns and transform data to load Contact and Premise.

    Conclusion

    SymmetricDS can integrate data between two applications through data synchronization and transformation. We looked at how dependencies like foreign key constraints can make it challenging to map tables and columns between different databases. Channels provide a way to separate transactional data into batches that can be ordered and loaded independently, making the next step of transformation easier.

  • Choosing The Right Replication Approach

    There are three primary methodologies used in database replication: log based, trigger capture, and time based. Determining which of these approaches is best for your use case can be difficult without understanding the limitations of each type. This article will help identify the differences, pros, and cons of each approach.

    Log Based

    Log based replication is probably the most common and popular approach that many tools support in order to capture changes on a source database. This approach is often viewed as the least invasive approach since the database remains unchanged and only the log files provided by the database are used to determine changes. In general, this is true, however, there is some potential overhead if the database logging must be turned on or set up. Turning on or setting up logging on an active production environment might lead to unexpected changes in performance. If your production database is already configured to enable transaction logging then there would be no additional performance impact on the applications using the database already.

    Some other factors should be considered while setting up a log based replication model. The first is the turnover of the log files. Determining how frequently the logs fill up and are removed so that new logs can be created is important to understand. If your log replication tool is not running or is not able to keep up with the load at peak times before logs are rolled over, changes could be lost during these intervals.

    Another thing to consider in a log based approach is the number of tables that would contribute to the logging versus how many tables are being configured for replication. For example, if there are 500 tables in the database that are potentially taking on changes but only 10 of these tables need to be replicated, there might be a great deal of overhead in processing the logs for a small set of tables compared to the total set of tables contributing to the logging.   

    Pros 

    • Potentially low or no overhead on the existing database.
    • No need for alterations to an existing database structure.

    Cons

    • If logging is not turned on already there may be a performance impact once activated
    • If logs are filled and removed to clear space for new logs to be written there is the potential for loss of data
    • If the amount of tables involved in replication is significantly less than the total table set the replication tool performance in the replication tool might be affected.

     

    Trigger Based

    Trigger based replication is built on the concept of utilizing database triggers to capture each change as part of the transaction that the change occurs on. Often this is viewed as more invasive than log based because the source tables need to be altered to include the triggers. It is important to understand in this approach a replication tool will be responsible for maintaining these triggers in the event they are not present or the source table is changed.   

    One of the key values in this approach is that the capturing of a change is directly tied to the user acting on the system. This is because the same transaction that is being used to commit a user change is also ensuring the trigger fires successfully. This ensures all changes are captured or they would roll back with the user change if there was an issue.   

    Another factor to consider with trigger based solutions is around the data storage of all these changes. Most trigger based approaches will write directly to a set of tables within the database that are created and maintained by the replication tool. As long as the database is sized to scale, these changes will remain present in the database itself and will not have the threat of data loss like the log based approach might have in the event of rolling over the log files.

    Pros

    • Transactional guarantee that data is committed to replication table once committed to application tables.
    • Less threat of data loss as long as the database is scaled to grow as more data is collected 

    Cons

    • Involves altering existing application tables to support replication
    • Performance could vary based on each replication tool's implementation of the change capture triggers.

     

    Time based

    Time based replication consists of selecting changes out of the source directly through queries. Although this requires the least permissions (select access) on the source database, there are some assumptions that need to be qualified in order to work properly. First, there needs to be a column that represents the last time a row was changed (usually a "last updated" column). If there is not a way to construct a query against each table to determine what has changed, this approach will not work properly.   

    If all the necessary tables for replication provide data within them regarding when they were last modified the next thing to identify is whether or not deletes need to be captured. In most straightforward time based replication, deletes are not supported. The source database would need to perform what is known as "soft deletes" where a deleted column is toggled on or off if the record is to be considered in use. Then the process of updating this soft delete column from a false to true and still utilizing the last updated column the change could be processed through time based replication. Otherwise, a true delete on the source will not be present when selecting changes as the record is no longer present to be retrieved.

     

    Pros

    • Only select access is required on the source database.
    • No additional tables or alterations to existing tables are necessary.

    Cons

    • Deletes are not supported unless the time based process reads in all tables on source and target and performs a full check. This is probably not an option on an active system or if you need constant replication that is not done just nightly.
    • All tables participating in the time based replication must have a column that can be used to determine the last change. 

  • Data Normalization using Transformations

    When integrating data from one database to another, sometimes the schemas are not the same. JumpMind has come across a number of projects where there was a need to transform a very denormalized database into more of a canonical or normal form.

    Sometimes normalization is needed as part of a migration path to a newer and better data model.

    Sometimes normalization is required just to feed another system.

    Whatever the reason, SymmetricDS can:

    • Capture the data change at the source database
    • Transform it
    • Transport it (across the across the web or just across your data center)
    • Transform it more (if needed)
    • Load it into your target database

    We will take a look at a simple example to demonstrate how this can be accomplished. We will transform an AppUser table that contains name, role and password information into three tables at the target: user, user_role, and user_password.

    For the purposes of this article, we will use embedded H2 databases that will be created automatically for you.

    In SymmetricDS, each database that is participating in synchronization is represented by a SymmetricDS node. Nodes belong to groups that are linked. If you are not already familiar with SymmetricDS it might make sense to read an overview of SymmetricDS.

    Do the following to get SymmetricDS installed and ready to go:

    Each properties file represent a node. In this case both nodes are hosted in the same SymmetricDS installation. This doesn't have to be the case. For example, in a cloud deployment you might have a SymmetricDS installation in your datacenter that communicates over HTTP with a SymmetricDS installation hosted in the cloud.

    Open the web console at http://localhost:31415/app. You are viewing the source node. You can switch the context to the target node by selecting it in the dropdown at the upper right corner.

    The tables structure we discussed has been created for you automatically. If you select the Explore tab, you will be able to view the tables we will be working with in the database explorer. The source tables have been created in the TEST schema. The target tables have been created in the USERS schema.

    Now go the Configure > Transforms screen. Select “Auto Create”. The “Implied” option means that all columns from the original table will be passed through. We want to explicitly specify the column mapping ourselves, so select “Specified”.

    Now Edit the transform that was created. Set the “Target Schema” to “Users” because this is where our target tables reside. Set the “Target Table” to “User”. Save the transform.

    Now go ahead and “Auto Create” the mapping from “AppUser” to “User_Password” and “User_Role”. Note, in order for the tables to show up in the “Auto Create” dialog you will need to select “Show tables with transforms”.

    Next we need to setup the column mappings.

    Select the first transform and press “Edit Columns”. You will need to add three columns. The user_id to id transform can be a copy transform. Be sure to mark it as the PK.

    The next two transforms will be bsh transforms. We will map the source name column to a first_name and last_name column in the target table. We will look for the first space in the name column. Everything before the space is the first_name. Everything after the space is the last_name.

    This is the bsh script for first_name:

    if (NAME.contains(" ")) { return NAME.substring(0, NAME.indexOf(" ")); } else { return NAME; }

    This is the bsh script for last_name:

    if (NAME.contains(" ")) { return NAME.substring(NAME.indexOf(" ")); } else { return NAME; }

    Now configure the column mappings for “User_Password” and “User_Role”.

    Make sure you have saved and closed all of your edit tabs. It's not time to test the transform. Insert the following SQL statements at the source. Inspect the target tables and experience the normalization of your data!

    insert into test.AppUser values(1, 'Billy Smith', 'user', '$%^#$@#$%^', null); insert into test.AppUser values(2, 'Willy Smith', 'user', '$%^#$@#$%^', null); insert into test.AppUser values(3, 'Wilma Smith', 'user', '$%^#$@#$%^', null);

     

  • Easier Data Transformation with SymmetricDS 2.0

    SymmetricDS has always provided hooks for data transformation, but in 2.x we have made it far easier to add custom data transformations with just a bit of XML configuration and simple transformation scripts.

  • Setting Up Time Based Replication With SymmetricDS

    Time Based Replication in SymmetricDS

    If the source data to be replicated resides in a read-only (select only) database, common replication approaches like log based or trigger based are not an option. Through some additional configuration and a few prerequisites on the source database SymmetricDS can still be used for replication in a time based approach.

     

    What is Time Based Replication?

    Time based replication is when data is selected out of the source directly and used to replicate to the target(s). Unlike log based or trigger based where there is a precise capture of each change. Time based will require some assumptions or additional overhead to determine the changes. If for example, the source table(s) have a column that indicates the last update times this can be used to obtain changes (although not deletes, discussed below). Another approach might be to select all data out of the source and compare it to the target. Although this seems more precise it might not be feasible on larger data sets or databases that incur a lot of consistent changes.   

    If you are limited though to read-only access to the source data then the time based approach might be your best and only option to replicate changes to a target database.

     

    Limitations

    First, we need to identify the limitations of time based replication to determine if this approach will work with your source database.

     

    Does it contain a last updated column?  

    Do the tables to be replicated contain a column that has an accurate date-time of the last time a row was modified? If not it might not be possible to determine the changes on this source database without a full data comparison against source and target. For this article, we will only focus on the cases where such a last updated column exists on the source tables.

     

    Do deletes matter?

    Deletes will not be captured in this approach unless it is a soft delete where a column (maybe named “deleted”) indicates the row is no longer active. In this case, it is just an update to a column to represent a delete and will replicate. If you need to fully replicate deletes again this would be outside the scope of this article and a full comparison would need to be done against the source and target.

     

    SymmetricDS Configuration

    Prerequisites

    • Setup the primary/registration node as the target node
    • Setup the select only source node as an “extract only node”

    Setting up tables

    *There will be two table triggers configured in SymmetricDS for each table. One set is to be used for a full initial load to send all data in the source tables. The second will use an initial load select to only select data based on a Time based interval.

    • Create a new channel named “time-based” with default settings.
    • Create a new channel named “time-based-reload” with default settings AND check the reload channel box.

    • Use the “Auto Create” wizard on the configure tab and the table triggers screen. -
    • Select tables for Time based replication
    • Select the “time-based” channel
    • Provide a suffix “-time-based”

    • On the configure tab and the table triggers screen type in the filter “-time-based” to filter only the triggers just created.
    • Use the select all to perform a bulk edit.

    • While bulk editing, turn off the insert, update, and delete on these triggers.
    • While bulk editing, set the reload channel for all these tables to “reload-time based”

     

    • On the configure tab and the table routing screen type in the filter “-time-based”.
    • Use the select all to perform a bulk edit.

    • While bulk editing, provide an initial load select based on last update column and frequency.

    Table 1 - Below is a list of examples how to go back 1 minute using a where clause on several common dialects

    Oracle last_updated_time >= (sysdate-(1/1440))
    MSSQL last_updated_time > DATEADD(MINUTE, -1, GETDATE())
    MySQL last_updated_time >= DATE_SUB(NOW(),INTERVAL 1 MINUTE); 
    Postgres last_updated_time >= (NOW() - INTERVAL '1 minute' )
    H2 last_updated_time > DATEADD('MINUTE',-1, CURRENT_DATE)

     

      

    Custom Job

    Create a custom job that is configured and scheduled to pull over updates from the source periodically.  

    • Choose the source nodes group
    • Choose BSH as the job type
    • Schedule the job to run based on the interval you chose above while setting up the table routing (every 1 minute for example).
    • On the edit script section of the job provide the following. Replace the source and target node values accordingly.import org.jumpmind.symmetric.model.TableReloadRequest;

    TableReloadRequest reloadRequest = new TableReloadRequest();
    reloadRequest.setChannelId("time-based-reload");
    reloadRequest.setSourceNodeId("YOUR SOURCE NODE ID");
    reloadRequest.setTargetNodeId("YOUR TARGET NODE ID");
    reloadRequest.setRouterId("ALL");
    reloadRequest.setTriggerId("ALL");
    reloadRequest.setCreateTime(new Date());

    engine.getDataService().insertTableReloadRequest(reloadRequest);



  • SymmetricDS 2.4 Data Transformation Overview and Examples

    SymmetricDS has for years been a robust, reliable solution for synchronizing data between multiple databases and multiple database platforms. With SymmetricDS 2.4, SymmetricDS has expanded to also prove data transformation capabilities while synchronizing. The data transformation is performed via configuration settings in new SymmetricDS configuration tables, in a manner consistent with and familiar to existing SymmetricDS users.


    This article will provide an overview of the configuration and settings for data transformation and will demo a few simple examples of data transformation in action. There is also additional detail available in the latest SymmetricDS User's guide.

    Overview

    The data transformation feature was first added to open-source SymmetricDS in release 2.4 as a result of requests from the user's community for this functionality. The SymmetricDS Pro version has also been updated to provide simple GUI configuration of transformations. Data transformation still relies on SymmetricDS configuration to capture a change, of course. If you haven't configured a SymmetricDS trigger / router to capture the change on the source, the transformation will obviously not occur. The source trigger creates the synchronization data, while the transformation configuration decides what to do with the synchronization data as it is being extracted from the source or loaded into the target.


    Some examples of transformations that can now be accomplished through configuration include:

    • Copy a column from a source table to two (or more) target table columns,
    • Merge columns from two or more source tables into a single row in a target table,
    • Insert constants in columns in target tables based on source data synchronizations,
    • Insert multiple rows of data into a single target table based on one change in a source table,
    • Apply a Bean Shell script to achieve a custom transform when loading into the target database.

    Many more complex transformation scenarios can also be performed as a result of the flexible configuration settings.


    Flexible Configuration

    The data transformation engine offers a great deal of flexibility in terms of defining when and how a transformation occurs. Configuration starts by defining a source table and target table for the transformation. At this level, you can then set a number of options on how the transformation is to occur. For example, you can:

    • define whether the data transformation logic will run at data extract time or data load time,
    • define the order in which this particular transformation occurs relative to the other defined transformations that might run for a given data change,
    • define the behavior that occurs in the case of a Delete DML type on the source,
    • define whether Insert DML types should attempt to perform an update first

    As eluded to in the above list, transformation behavior is based on the original DML type that resulted in the data capture originally. In other words, if a data change was captured that was caused by an Update, SymmetricDS will attempt to perform the transformation by creating one (or more) update statements at the target. So, generally speaking, updates on the source resulted in updates on the target; inserts result in inserts, and deletes result in, well, it depends. In the case of Delete, you have the flexibility to (a) do nothing, (b) delete a corresponding target row, or (c) set one or more columns in the target to a predefined value. In the last case, consider the following example: you might want a delete of an employee row in a source database to cause an "active/inactive" flag to be set to inactive in the target instead of deleting the row in the target.


    For each source table / target table pair, you can then define transformation behavior at the column level, specifying both the source column name and target column name. You can even vary the column-level transform behavior based on the DML type itself, allowing you to specify different sets of columns in the case of an insert versus an update, for example. You can also specify the relative order that column-level transforms occur, along with a "transform type" specifying how the data is actually altered, if needed (a copy of the column data is the default operation).


    One additional note before we do some examples. Keep in mind that all transformations occur "in line"; that is, as SymmetricDS is processing the change data that was captured. Each of your transformations are performed as independent operations in sequence and must be "complete" from a SQL perspective. In other words, you must define columns for the transformation that are sufficient to fill in any primary key or other required data in the target table if the source operation was an Insert, for example.

    Transformation Examples

    With that, let's code a simple real-world example. Assume for the moment you run a succesful dentistry practice that has expanded to have more than one location. As such, you use SymmetricDS to synchronize a "corporate" central office database with a database at each of your dental clinics. In this way, you can keep patient data, schedules, etc, synchronized between the central office and each dental location. Unfortunately, the software you use in the central office doesn't have quite the same database schema as that present in your dentistry office locations.


    In particular, let's say that, in the central office, all patient information is kept in a single, denormalized table called "customer". Each patient has a row, with a unique id, first and last names, email address, etc. However, in the dental office locations, patient data is kept in two tables: A master table (called 'patient') contains the id, first name, and last name, but email addresses are kept in a separate 'patient_contact' table, containing an id, contact type (a fixed string of 'email', or 'phone', etc.) and a value to hold the information. Create statements for these sample tables are as follows (my source and target databases for this example are H2):

     
     create table customer (
      id integer not null,
      first_name varchar2(100),
      last_name varchar2(100),
      email varchar2(100),
       primary key (id));
     
     create table patient (
      id integer not null,
      first_name varchar2(100),
      last_name varchar2(100),
       primary key (id));
     
     create table patient_contact (
      id integer not null,
      contact_type  varchar2(10) not null,
      contact_value varchar2(100),
       primary key (id, contact_type));
    

    Inserts into the central office should create two insert statements in the dental office database, and updates would update the appropriate columns and rows. For deletion of patients in the central office, the delete of the patient row in the central office should result in a delete of the corresponding rows in both tables at the dental office.


    As far as SymmetricDS basic setup is concerned, this is a pretty typical two-tier node network. We'll use a node group of "server" for the corporate central office server, and "client" for the dental office location databases, with the clients pushing data to the server and the clients pulling data from the server. We'll also configure two routers, one for each direction (although our example will focus on server to client (central office to dental office locations) synchronizations only).


    Before we configuration the data transformation, we need a trigger on the customer data that sends the data to the client. I've used SymmetricDS Pro's ability to auto-create triggers based on database metadata and configured a trigger for our customer data for sending data to the client. If you are using the open source version, this corresponds to creating a row in sym_trigger, sym_router, and sym_trigger_router.



    So, how would we configure the transformations to achieve these goals? For the data transformation, we need three columns from customer to map to patient and want a delete action to delete the corresponding row. Set-up of this transformation using SymmetricDS Pro is quite easy. We defined a single transform id, customer2patient, mapping us from customer to patient tables, and fill in three source/target columns, being careful to define id as the primary key. We use the default transformation type of "copy", since we wish to copy the values to the target unchanged. If you are using the open-source version of SymmetricDS, this is equivalent to creating one entry in sym_transform_table and three entries in sym_transform_column tables. Note that if your column data applies to all DML types (as in this case), you can use an "*" in the include_on column to represent all types.



    For the customer to patient_contact mapping, things are almost as straightforward. We define a transform mapping between the two tables, called customer2patientcontact. We map two columns (id to id and email to contact_value) but we also need to map a third target column, contact_type, which we need to be a constant value "email". So, we use a transform type of "const" (constant) and supply the constant value in the transform expression. We mark both id and email as part of the primary key so that the deletes and updates will operate only on the appropriate rows.


    Adding new dynamic data

    Let's extend our example slightly. Let's say, for example, that the 'patient' data table has a last_updated column that we wish to fill in whenever a transformation occurs. This column doesn't exist in the central office customer table, so we need to add the data dynamically as part of the transformation. We'll add the new column using the following SQL:

     alter table patient add column LAST_UPDATED datetime;
    

    Next, we simply add an additional column to the customer2patient transform to supply new data for the target, similar to our use of the "constant" transform type for email. This time, we'll take advantage of the "variable" transform type, which provides a few dynamically computed variables that can be placed into a target column. In this case, we'll specify "system_timestamp" to fill in the current data and time at the moment the transformation occurs.



    Summary

    The few quick examples above are really just a basic sample of the kinds of transformations you can configure. More complex examples, where the columns vary based on DML type, for example, are possible and easy to configure. For more details and a complete list of configuration options, please consult the SymmetricDS User's Guide, available at www.symmetricds.org.

  • SymmetricDS 2.4.0 Released

    The open source SymmetricDS 2.4.0 database replication engine is released for general use by the community. While this is a minor revision, it delivers some cool new features!

  • SymmetricDS Transformation Pipeline

    Often when setting up data replication it is necessary to redirect or transform the source tables and columns into a different set of target tables and columns.   SymmetricDS offers this transformation feature as part of the real time data replication workflow.   Below we will review the places within the workflow this transformation can take place and how to configure it.

    Overview

    Many ETL based tools on the market are great a providing a solution around data manipulation and transformation as part of a scheduled process.   However if you need to perform near real time transformations on changes as they occur SymmetricDS can be a powerful asset to continuously perform these transformations as part of the replication process without any additional nightly schedule of batch jobs.

    Pipeline

    First we need to understand the components that make up the replication pipeline and when they are executed.   The image below diagrams all the components that can be potentially used as part of the replication pipeline.   In order for these to execute properly though the catalog, schema, and table information configured must match as the batch works through the pipeline.  If any of these configuration values (catalog, schema, table) do not line up with the step prior the transforms or filters will not execute.

     

    Routing

    Routing is the process that checks for newly captured changes and will determine where they need to go by assigning them to a batch.  Target catalog, schema, and table information can be set at the router level.   This is often the simplest way to change the target information for multiple tables.  Setting the target catalog and schema at the router will effect all tables that flow through this router (table routing) and allows a single configuration point.

     

    TIP: Setting target information on the router works well when you just need to change the catalog and/or schema but not the table names.   If you need to also transform table names it is better done through transforms.

      

    Extract Transforms

    These transforms fire on the source if the source catalog, schema, and table match the values that came out of routing.   There are 3 ways to hook up an extract transform.

    • Routing values are blank for target catalog, schema, and table.   Then the extract transform would also have blank entries for source catalog and schema.  Source table on the transform could contain a table name or a wildcard (*) to hit all tables.

                  

    • Routing has the "use source catalog and schema" set.   Then the extract transform source catalog, schema, and table would need to match the values in the trigger (source catalog, schema, and table)

                 

    • Routing directly sets any of the values for catalog, schema, or table.   Then the extract transform source catalog and schema would need to match these values set on the router. 

                 

    Load Transforms 

    These transforms fire on the target side if there is a match.   There are 2 ways to hook up a load transform

    • When an extract transform is present.   Then the extract transform target values need to match the load transforms source values.

            

    • No extract transform present.   Then the load transform would hook in just like and extract transform following one of the 3 router combinations explained above for the extract transforms. 

    Load Filters

    Load filters must provide a target catalog, schema, and table combination that matches the upstream settings in the pipeline in order to execute.   Load filters can provide a variety of customizations on the data once executed.   Below is an example of how you could set the target catalog and schema on the target side rather than through a router or transforms.   This might be useful if only the target knows where their data should be loaded into (remember routing occurs on the source).   Through the use of a wildcard (*) as the target table a single load filter can support all tables rather than setting up a transform for each table.

     

    Example: Beanshell load filter "Before Write" to set the catalog and schema.

    // Check that the table is not a SYM_* runtime table 
    if (!table.getName().startsWith(engine.getTablePrefix())) { table.setCatalog("catalog1"
    );
    table.setSchema("dbo"
    );
    }
    return true;


    Example : Similar to one above but the values now are dynamic and could be set in each targets engine properties file (target.catalog and target.schema).

    // Check that the table is not a SYM_* runtime table 
    if (!table.getName().startsWith(engine.getTablePrefix())) { table.setCatalog(
    engine.getParameterService().getString("target.catalog"));
    table.setSchema(
    engine.getParameterService().getString("target.schema"));
    }
    return true;

  • Synchronizing Android Applications - Part 2

    This is the second part of a series on synchronizing Android SQLite databases using SymmetricDS. The last article talked about how to embed the SymmetricDS Android client into an Android application. This article will provide an overview of the central server and how it can be configured to synchronize the sample Notepad database.

    Android Robot

    Lets go ahead and setup the SymmetricDS server. Download the latest version of SymmetricDS Pro from the SymmetricDS downloads page.

    The download is a jar installer. In most cases double clicking on the jar file will launch the installer. If the jar file type is not associated with Java on your system then you might need to launch the installer using the following command:

    java -jar symmetric-pro-X.X.X-setup.jar

    Walk through the install process. At the end of the setup, you'll have SymmetricDS installed on your system. Choose to open up the SymmetricDS Pro Control Center.

    To run the server, if it is not already running, click the Start Server button. Once it is running, click Open Web Console.

    Lets now set up the Master Node in SymmetricDS through the Node Setup screen. Choose your database and enter the username and password. For this example, I will use an H2 database. Choose the Standard 2 Tier Configuration and click Next. For this example, the default Sync Url should be sufficient.

    Now we need to create our NOTES table in SymmetricDS. Navigate to the Explore tab and paste the following SQL:

    CREATE TABLE NOTES ( _ID VARCHAR(50) PRIMARY KEY, TITLE TEXT, NOTE TEXT, CREATED VARCHAR(50), MODIFIED VARCHAR(50), );

    Run this as a script.

    Now it is time to set up your trigger. Navigate to Configure->Table Triggers. Click Auto Create... Choose the default Channel and select the NOTES table we created earlier. Highlight the new trigger and click Link. Choose the direction(s) you wish to link SymmetricDS and your Android Device. For this example, I will link it both ways to enable synchronization in both directions. Save your choice(s).

    We need to create Transforms for our CREATED and MODIFIED Columns. This is because these fields need to be converted into SQL timestamps. First, lets set up our NOTES_INBOUND transform. Copy the information from the picture below.

    Double click the NOTES_INBOUND transform. Create Columns for _ID, TITLE, and NOTE like the picture below.

    Create Columns for CREATED and MODIFIED. Both the CREATED and MODIFIED transforms require a bsh script as their Type.

    For CREATED, enter the following script as your Expression:

    return new java.sql.Timestamp(java.lang.Long.parseLong(CREATED)).toString();

    For MODIFIED, enter the following script as your Expression:

    return new java.sql.Timestamp(java.lang.Long.parseLong(MODIFIED)).toString();

    Next, lets set up our NOTES_OUTBOUND transform. Again, copy the information from the picture below.

    Double click the NOTES_OUTBOUND transform. Create Columns for _ID, TITLE, and NOTE like the picture below.

    Create Columns for CREATED and MODIFIED. Both the CREATED and MODIFIED transforms require a bsh script as their Type.

    For CREATED, enter the following script as your Expression:

    return java.lang.Long.toString(new java.text.SimpleDateFormat("yyyy-MM-dd:hh:mm:ss.S").parse(CREATED).getTime());

    For MODIFIED, enter the following script as your Expression:

    return java.lang.Long.toString(new java.text.SimpleDateFormat("yyyy-MM-dd:hh:mm:ss.S").parse(MODIFIED).getTime());

    Go back to Eclipse. We need to set up an Android Emulator. This can be done by opening the Android Virtual Device Manager. Click New and follow the steps. The higher the Emulator's API, the better.

    Run your NotePad project by pressing Run on NotePadProvider.java in Eclipse. When prompted, select the emulator you just created. Monitor the Console in Eclipse. Let the NotePad.apk install on the emulator. Now watch the LogCat and wait as it attempts to register with your SymmetricDS Master Node.

    Go back to SymmetricDS and navigate to Manage->Nodes. Right-click the android-simulator registration request and click Allow. Check the Send Initial Load box and Apply.

    Go back to your Android Emulator and create a new note. When you are done, save the note.

    Go back to SymmetricDS and navigate to the Explore tab. Enter the following SQL:

    SELECT * FROM NOTES;

    Run this as sql under cursor. The note you created in your Android Emulator should now show up in SymmetricDS as the result.

    Additional Tips: To support multiple NotePad clients, add column to the NOTES table by entering the following SQL:

    ALTER TABLE NOTES ADD SOURCE_ID INTEGER;

    Now, go back to your transforms and double-click on NOTES_INBOUND. Create a Column like the one below.

    Save your changes.

    Congratulations! You have now synchronized your Android Device with SymmetricDS.