Data Integration

  • Bridging Databases with Different Dependencies

    Bridge data between databases with data synchronization by using channels to re-order data when the dependencies in each database are different. Each application can model its data differently, so a parent row in one database might map to a child row in another. Instead of using data transformation to look up and add dependent data, the order of the captured data can be changed to match the constraints of the target database.

    Let's look at two applications that persist customer and address information in their databases. A website database uses the model of "a Customer has an Address", while a call center application uses "a Premise has a Contact". We'd like to synchronize the data between the two databases, mapping Customer to Contact and Address to Premise, but the dependencies don't match on each side. As shown below, the dependency of the data is reversed between the two applications. On the website, an Address is created first, followed by a Customer. In the call center application, the reverse order must be used: a Contact is created first, followed by a Premise.

    Bridge data between two different data models
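
    To make the reversed dependency concrete, here is a minimal sketch of the two schemas (table and column names beyond those in the error message below are illustrative):

    create table website.address (
        address_id integer primary key,
        street varchar(100)
    );

    create table website.customer (
        customer_id integer primary key,
        name varchar(100),
        address_id integer not null references website.address (address_id)
    );

    create table callcenter.contact (
        contact_id integer primary key,
        name varchar(100)
    );

    create table callcenter.premise (
        premise_id integer primary key,
        street varchar(100),
        contact_id integer not null references callcenter.contact (contact_id)
    );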

    Mapping the tables for data synchronization is difficult when the system loads data the same way it was captured and the database enforces constraints. If Customer is mapped to Contact, and Address is mapped to Premise, the problem is that the data will arrive out of order for the target database. The website will save an Address and a Customer, which are sent to the call center. When the Address row is mapped and loaded into Premise on the call center, the database will throw a foreign key constraint violation because the dependent data for Contact hasn't been loaded yet.

    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`callcenter`.`premise`, CONSTRAINT `premise_ibfk_1` FOREIGN KEY (`contact_id`) REFERENCES `contact` (`contact_id`))

    Using Channels to Satisfy Dependencies

    We can solve this problem by assigning the tables to channels with different priorities. Channels allow data that would normally be in the same batch together to be split into separate batches and loaded independently. Data is loaded from channels in the order specified by priority, so high priority channels load first. For example, we could create a channel named "parent" with a priority of 10 and a channel named "child" with a priority of 50.
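
    In SymmetricDS, a channel is a row in the sym_channel table, and its processing_order column acts as the priority. A minimal sketch of that configuration (other sym_channel columns are left to their defaults; verify the column list against the version you run):

    insert into sym_channel (channel_id, processing_order, max_batch_size, enabled, description)
    values ('parent', 10, 1000, 1, 'Parent rows that must load first');

    insert into sym_channel (channel_id, processing_order, max_batch_size, enabled, description)
    values ('child', 50, 1000, 1, 'Child rows that load after parent rows');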

    How using channels changes batching

    If we assign the Address table to the "child" channel and the Customer table to the "parent" channel, the rows are placed in separate batches, with the Customer data loading first. With the data in the correct order for the target database, now we can map columns and transform data to load Contact and Premise.
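
    The table-to-channel assignment lives on the trigger that captures changes for each table. A sketch, assuming triggers named after the source tables:

    -- Capture customer changes into the high priority "parent" channel
    insert into sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
    values ('customer', 'customer', 'parent', current_timestamp, current_timestamp);

    -- Capture address changes into the lower priority "child" channel
    insert into sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
    values ('address', 'address', 'child', current_timestamp, current_timestamp);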

    Conclusion

    SymmetricDS can integrate data between two applications through data synchronization and transformation. We looked at how dependencies like foreign key constraints can make it challenging to map tables and columns between different databases. Channels provide a way to separate transactional data into batches that can be ordered and loaded independently, making the next step of transformation easier.

  • Creating Custom Metl Components

    Metl includes a prepackaged component library for everyday use, but some users may require unique functionality. This can easily be achieved with custom components. In this article I will walk through the steps required to create your own custom Metl component.

    1.0 Overview

    Metl is an open source, lightweight, web-based integration tool that supports the creation of custom components. Metl comes with a large set of prepackaged components; however, if a unique need arises, custom components can be created to implement specialized needs. Building custom components is very straightforward. The basic steps are as follows:

    • Create a Java Project – The Java project will contain the source code and related files for the custom component.
    • Create the Component Definition – The component definition defines the basic aspects of the component including the name, description, type, source file, message and resource types, settings, etc.
    • Create the Component User Interface Definition – The user interface definition defines aspects of the user interface including icons and, if needed, custom UI Java classes.
    • Package the Component – Files in the Java project are packaged into .jar files or .zip files for deployment.
    • Deploy the Component – Provide the packaged component to Metl by uploading it directly to the running Metl instance or by loading it into a Maven repository accessible to that Metl instance.

    This article describes the steps required to create a custom component that uses Pushover, a cross-platform notification application capable of varying levels of priority. The goal of this component was to integrate with our support center platform to send an emergency notification in the event of a missed support call. The flow utilized several standard Metl components, but this article focuses on the development steps for the custom Pushover component.

    2.0 Creating the Java Project

    The Metl engine and its prepackaged projects are built in Java. To create the custom component, first create a new Java project to hold the source code and related files. The project structure for this custom component is shown below. The project needs a dependency on the metl-core library, which is available from the JumpMind Maven repository.
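
    If the project is built with Maven, the dependency declaration looks something like the following (the group and artifact ids follow the Metl project's naming; the version is a placeholder, so use the one matching your Metl instance):

    <dependency>
        <groupId>org.jumpmind.metl</groupId>
        <artifactId>metl-core</artifactId>
        <version>1.5.0</version>
    </dependency>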

    The project requires the following parts:

    • plugin.xml – Used to create the component definition.
    • ui.xml – Used to create the UI definition.
    • One or more Java classes – Used to implement the component methods.

    3.0 Creating the Component Definition

    The component definition is created by writing a file called plugin.xml. This XML file must conform to the components.xsd schema that is part of the Metl project. The file determines parameters such as the component name and component settings. The basic component definition is defined as follows:

    <component category='PROCESSOR'
                 id='Pushover'
                 inputMessageType='any'
                 inputOutputModelsMatch='false'
                 outputMessageType='any'
                 supportsMultipleThreads='false'
                 resourceCategory='none'>
        <name>Pushover Message</name>
        <className>org.jumpmind.metl.core.runtime.component.Pushover</className>
        <keywords>message,notification</keywords>
    <description></description>
    <!-- the settings element (shown below) also lives inside the component element -->
    </component>

    The id defines the component name. The input and output message types can be set to one of four values: one of the three message types, or any. In Metl, data flows between components through a series of messages that are generated and consumed by each component. The three message types are as follows:

    • Model Based Message
    • Text Based Message
    • Binary Based Message

    In this component, we chose to allow any input and output message type to keep the component generic and flexible.

    Settings for the custom component are also defined in the plugin.xml file. A small sample of some of the settings configured are shown below:

    <settings>
        <setting id='run.when'
                 required='false'
                 type='choice'>
            <name>Run When</name>
            <defaultValue>PER UNIT OF WORK</defaultValue>
            <choices>
                <choice>PER UNIT OF WORK</choice>
                <choice>PER MESSAGE</choice>
            </choices>
        </setting>
        <setting id='token'
                 required='true'
                 type='text'>
            <name>Token</name>
        </setting>
        <setting id='user'
                 required='true'
                 type='text'>
            <name>User Key</name>
        </setting>
        <setting id='message'
                 required='false'
                 type='multiline_text'>
            <name>Message</name>
            <defaultValue>Message</defaultValue>
        </setting>
        <setting id='device'
                 required='false'
                 type='text'>
            <name>Device</name>
        </setting>
        <setting id='title'
                 required='false'
                 type='text'>
            <name>Title</name>
        </setting>
    </settings>

    Each setting consists of an id, a required flag, and a type. Setting required to true in the plugin file makes that setting mandatory in order to run the component. The three setting types used in this component are choice, text, and multiline_text. The type of setting determines how that setting takes input in the Metl interface.

    The choice type utilizes a drop-down menu to allow users to choose from a limited set of values defined in the plugin.xml file.

    In this case, the choice is a sound, selected from the set list provided by Pushover, to use when delivering a message.
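
    The sound setting itself is not included in the sample above, but it follows the same pattern. A sketch, using a few of the sound names Pushover documents:

    <setting id='sound'
             required='false'
             type='choice'>
        <name>Sound</name>
        <defaultValue>pushover</defaultValue>
        <choices>
            <choice>pushover</choice>
            <choice>bike</choice>
            <choice>siren</choice>
            <choice>none</choice>
        </choices>
    </setting>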

    The other two setting types used were text and multiline_text. Both settings are represented in Metl by an editable text field. The choice between the two depends on the use case.

    In this component both Token and User Key are represented as text fields since they are both relatively short strings generated by Pushover. Message is a multiline text field allowing for longer inputs. The red asterisk next to a setting indicates that it is a required field.

    4.0 Creating the Component UI Definition

    The UI definition is created in a file called ui.xml. In this simple component, we chose to use the standard Metl component UI, but a custom UI can also be created.

    <ui>
        <component-ui id="Pushover Message" componentId="Pushover">
            <iconImage>org/jumpmind/metl/core/runtime/component/Pushover.png</iconImage>
        </component-ui>
    </ui>
    

    5.0 Implement the Component

    The next step in creating the component is implementing the required methods in the component's Java class. In this example, the class implemented the IComponentRuntime interface and extended the AbstractComponentRuntime class.

    The complete list of methods is defined by the IComponentRuntime interface, but for this simple component we only implemented two:

    • start() – The method executed when the flow is started.
    • handle() – The method executed when the component is given an input message.

    The start method reads the settings defined in the plugin.xml file and initializes member variables based on values entered by the user in the Metl console.

    @Override
    public void start() {
        Component component = getComponent();
        runWhen = properties.get(RUN_WHEN, PER_MESSAGE);
        token = component.get(TOKEN);
        user = component.get(USER);
        message = component.get(MESSAGE);
        device = component.get(DEVICE);
        title = component.get(TITLE);
        url = component.get(URL);
        urlTitle = component.get(URL_TITLE);
        priority = component.get(PRIORITY);
        // Retry and expire only apply to emergency priority (2) notifications
        if (priority.equals("2")) {
            retry = component.get(RETRY);
            expire = component.get(EXPIRE);
        }
        // Sounds do not apply to the lowest priorities (-2 and -1), which are delivered quietly
        if (!priority.equals("-2") && !priority.equals("-1")) {
            sound = component.get(SOUND);
        }
    }
    

    The next step is implementing the handle method. The handle method implements the component's core functionality and deals with the different types of input messages that Metl can deliver.

    @Override
    public void handle(Message inputMessage, ISendMessageCallback callback, boolean unitOfWorkBoundaryReached) {
        if (inputMessage instanceof ControlMessage && PER_UNIT_OF_WORK.equals(runWhen)) {
            if (message.isEmpty()) {
                throw new MisconfiguredException("A message must be provided for the pushover notification");
            }
            pushoverNotification();
            if (outboundPayload.size() > 0) {
                callback.sendTextMessage(inputMessage.getHeader(), outboundPayload);
            }
        } else if (inputMessage instanceof TextMessage && PER_MESSAGE.equals(runWhen)) {
            TextMessage msg = (TextMessage) inputMessage;
            if (msg.getPayload().size() > 0) {
                for (int i = 0; i < msg.getPayload().size(); i++) {
                    message = msg.getPayload().get(i);
                    pushoverNotification();
                }
                if (outboundPayload.size() > 0) {
                    callback.sendTextMessage(inputMessage.getHeader(), outboundPayload);
                }
            }
        }
    }

    For this component, we wanted the ability to send a message created or retrieved by other Metl components so we utilized the TextMessage input message type, setting the message variable created in the start method to the value of the input message if applicable.
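
    The pushoverNotification() method referenced above is not shown in the original listing. A minimal sketch of what it might look like (an assumption, not the actual implementation) posts the settings as form parameters to Pushover's documented REST endpoint and collects the HTTP status into the outboundPayload list used by handle():

    private void pushoverNotification() {
        try {
            // Build the form-encoded body from the settings read in start()
            StringBuilder body = new StringBuilder();
            body.append("token=").append(java.net.URLEncoder.encode(token, "UTF-8"));
            body.append("&user=").append(java.net.URLEncoder.encode(user, "UTF-8"));
            body.append("&message=").append(java.net.URLEncoder.encode(message, "UTF-8"));

            java.net.HttpURLConnection conn = (java.net.HttpURLConnection)
                    new java.net.URL("https://api.pushover.net/1/messages.json").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (java.io.OutputStream os = conn.getOutputStream()) {
                os.write(body.toString().getBytes("UTF-8"));
            }

            // Record the HTTP status so handle() can forward it as a text message
            outboundPayload.add("Pushover response: " + conn.getResponseCode());
        } catch (Exception e) {
            throw new RuntimeException("Pushover notification failed", e);
        }
    }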

    6.0 Package and Deploy the Component

    Once the component is ready for deployment, it's time to upload it to your Metl instance. Since this component was packaged into a single .jar file, we will upload it directly into a running Metl instance for simplicity. To upload the component, navigate to the Admin tab, select Plugins, and then click the Add button.

    In the popup window, click on the Upload tab and fill in the required fields with the Group, Component Name, and Software Version.

    Click the Upload button and navigate to the component's packaged file. Once the file has been uploaded, click Add to finish installing the custom component in the Metl instance.

    With your custom component now deployed in Metl, you can add it to flows and begin benefitting from its unique functionality.

  • Data Normalization using Transformations

    When integrating data from one database to another, sometimes the schemas are not the same. JumpMind has come across a number of projects where there was a need to transform a very denormalized database into more of a canonical or normal form.

    Sometimes normalization is needed as part of a migration path to a newer and better data model.

    Sometimes normalization is required just to feed another system.

    Whatever the reason, SymmetricDS can:

    • Capture the data change at the source database
    • Transform it
    • Transport it (across the web or just across your data center)
    • Transform it more (if needed)
    • Load it into your target database

    We will take a look at a simple example to demonstrate how this can be accomplished. We will transform an AppUser table that contains name, role and password information into three tables at the target: user, user_role, and user_password.
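
    To make the shapes concrete, here is a sketch of the source and target tables (the demo schemas are created for you automatically; column names beyond those mentioned in the article are illustrative):

    create table test.AppUser (
        user_id integer primary key,
        name varchar(100),      -- e.g. 'Billy Smith', split into first and last name
        role varchar(20),
        password varchar(100),
        last_login timestamp    -- illustrative; inserted as null in the test data below
    );

    create table users.user (
        id integer primary key,
        first_name varchar(50),
        last_name varchar(50)
    );

    create table users.user_role (
        user_id integer,
        role varchar(20)
    );

    create table users.user_password (
        user_id integer,
        password varchar(100)
    );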

    For the purposes of this article, we will use embedded H2 databases that will be created automatically for you.

    In SymmetricDS, each database that is participating in synchronization is represented by a SymmetricDS node. Nodes belong to groups that are linked. If you are not already familiar with SymmetricDS, it might make sense to read an overview of SymmetricDS first.

    Do the following to get SymmetricDS installed and ready to go:

    Each properties file represents a node. In this case, both nodes are hosted in the same SymmetricDS installation. This doesn't have to be the case. For example, in a cloud deployment you might have a SymmetricDS installation in your datacenter that communicates over HTTP with a SymmetricDS installation hosted in the cloud.
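
    As a sketch, an engine properties file for the source node might look like this (the H2 connection settings and ids are illustrative):

    engine.name=source
    group.id=source
    external.id=000
    db.driver=org.h2.Driver
    db.url=jdbc:h2:file:demo-source;AUTO_SERVER=TRUE
    db.user=sa
    db.password=
    # the master node leaves registration.url blank; other nodes point it at the master
    registration.url=
    sync.url=http://localhost:31415/sync/source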

    Open the web console at http://localhost:31415/app. You are viewing the source node. You can switch the context to the target node by selecting it in the dropdown at the upper right corner.

    The table structure we discussed has been created for you automatically. If you select the Explore tab, you will be able to view the tables we will be working with in the database explorer. The source tables have been created in the TEST schema. The target tables have been created in the USERS schema.

    Now go to the Configure > Transforms screen. Select “Auto Create”. The “Implied” option means that all columns from the original table will be passed through. We want to explicitly specify the column mapping ourselves, so select “Specified”.

    Now Edit the transform that was created. Set the “Target Schema” to “Users” because this is where our target tables reside. Set the “Target Table” to “User”. Save the transform.

    Now go ahead and “Auto Create” the mapping from “AppUser” to “User_Password” and “User_Role”. Note, in order for the tables to show up in the “Auto Create” dialog you will need to select “Show tables with transforms”.

    Next we need to setup the column mappings.

    Select the first transform and press “Edit Columns”. You will need to add three columns. The user_id to id transform can be a copy transform. Be sure to mark it as the PK.

    The next two transforms will be bsh transforms. We will map the source name column to a first_name and last_name column in the target table. We will look for the first space in the name column. Everything before the space is the first_name. Everything after the space is the last_name.

    This is the bsh script for first_name:

    if (NAME.contains(" ")) { return NAME.substring(0, NAME.indexOf(" ")); } else { return NAME; }

    This is the bsh script for last_name:

    if (NAME.contains(" ")) { return NAME.substring(NAME.indexOf(" ") + 1); } else { return NAME; }

    Now configure the column mappings for “User_Password” and “User_Role”.

    Make sure you have saved and closed all of your edit tabs. Now it's time to test the transform. Insert the following SQL statements at the source. Inspect the target tables and experience the normalization of your data!

    insert into test.AppUser values (1, 'Billy Smith', 'user', '$%^#$@#$%^', null);
    insert into test.AppUser values (2, 'Willy Smith', 'user', '$%^#$@#$%^', null);
    insert into test.AppUser values (3, 'Wilma Smith', 'user', '$%^#$@#$%^', null);


  • Designing Your Synchronization Scenario

    Over the last several years, we've designed, implemented, tested, and supported hundreds of synchronization scenarios. We've recently updated our design document templates that we use to get projects started the right way.

    The complexity of synchronization scenarios we've encountered varies drastically depending on many factors including:

    • Number of Databases / File systems (Nodes) Being Synchronized
    • Transaction Volumes
    • Uni-Directional or Bi-Directional Synchronization Requirements
    • Conflict Detection and Resolution Requirements
    • Schema Differences and Data Transformation Requirements
    • Dynamic Data Routing Requirements
    • Initial and Reverse Initial Loading Requirements
    • And Many Other Factors

    The good news is, it's all very doable. We've implemented solutions from 2 Nodes to 50,000 Nodes, from small transaction volumes to huge transaction volumes, from identical schemas to complete transformations between normalized and denormalized schemas. Two critical pieces to a smooth implementation are the right tool, and a good design, done upfront.

    The tough part of a good design is that unless you've "been there, and done that", it can be hard to know what the critical design elements are, and what questions you should be asking. That's where talking with experts in the field and utilizing design document templates can certainly help. We've recently updated our SymmetricDS Synchronization Design Document template that we use as part of all of our JumpStart consulting engagements. It's a great template to help customers think through their scenario, rate its complexity, and begin implementation on solid footing.

    If you are starting on a synchronization project, we'd be happy to send you the design templates we work with in order to get you started. Just send us a quick note from the Contact Us form with a little background on your synchronization scenario and we'll email it your way.


    "Luck is the residue of design."

    - Branch Rickey 


  • Publishing Messages from Data Synchronization

    JumpMind continues to enhance SymmetricDS with new data integration features that leverage its Change Data Capture capabilities for replication, synchronization, and transformation of data from one system to another. In this article, we will cover how to use the data replication server to map data into a JMS message destination, which could be part of an enterprise strategy for SOA data integration and messaging.

    Within the next month we will be releasing a simple and flexible data mapping and transformation feature in 2.4. Later, towards the end of the summer, we will make a foray into the mobile world with a new 3.0 client that runs on Android devices.

    While we are hard at work expanding the feature set of SymmetricDS, one of the hardened mainstays that has been very handy for integrating with other systems is our messaging extension point.

    Out of the box, an extension point can be configured in XML to publish an XML representation of changed data to any number of messaging providers. A JMS (Java Message Service) connector can be used via configuration. Other messaging subsystems can be published to by implementing a very simple interface that receives the message to be published. If you have a messaging system that you would like to integrate SymmetricDS with, contact us and we might just add support.

    Messages can be published from two different points in the SymmetricDS architecture.

    1 - Messages can be published from the source node, after data is captured in the process of routing the data. You might want to use this scenario if you want to use SymmetricDS solely as a tool to capture and publish changes.

    2 - Messages can also be published at the target node during the process of data loading. You might want to use this scenario if you are trickling data from a multitude of remote nodes into one central repository and you need to integrate data to other systems.
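
    As a sketch of the source-side variant, the publisher is wired up as a Spring bean in SymmetricDS's extension XML. The class names below come from the org.jumpmind.symmetric.integrate package; treat the exact property wiring as an assumption and confirm it against the users guide for your version:

    <bean id="jmsPublisher" class="org.jumpmind.symmetric.integrate.SimpleJmsPublisher">
        <!-- myJmsTemplate is a Spring JmsTemplate you define for your JMS provider -->
        <property name="jmsTemplate" ref="myJmsTemplate"/>
    </bean>

    <bean id="xmlPublisherRouter" class="org.jumpmind.symmetric.integrate.XmlPublisherDataRouter">
        <property name="publisher" ref="jmsPublisher"/>
    </bean>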

    More information about messages using JMS can be found in the SymmetricDS Users Guide.

  • Quick Start Guide to Using Metl

    Are you looking for a simple, web-based integration platform that supports several different styles of data integration, including messaging, file-based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services, and that is easy to use and quick to set up?

    In this blog I'm going to walk through the steps to install and run your first flow in Metl in a Windows environment.

    Installation

    For this example, we will use the simplest way to run Metl, which is to run it stand-alone using the internal Jetty container.

    1. Download the metl.war file and make sure a Java 8 runtime environment is installed and on the path.

    2. In order to run Metl, simply run the following from a command prompt:

    java -jar metl.war

    3. Once Metl is started, open a web browser and navigate to:

    http://localhost:42000/metl/app

    Now you can begin to build and execute Metl flows.

    Run a Flow

    There are several sample flows already created and ready to run. Here I'm going to run the Flat File to Relational Database flow.

    1. Navigate to the Design tab and expand the project Samples followed by the Flows folder.

    2. Double click the name of the flow 'Flat File to Relational Database'.

    Your screen should look similar to the image below.

    This screen shows the flow components and the order of processing. The lower window displays the properties of a selected component. Some components have additional options that are found by double clicking the component.

    3. Click the 'Run' button and your screen will look like the following:

    This screen shows run-time details and various logging outputs. As the flow executes you will see the status change on each step. Clicking on a Component Name row in the upper right window will display the log results (if applicable) in the lower window.

    That's it, you have installed and run your first Metl flow.


    For additional documentation on Metl see:

    https://github.com/JumpMind/metl/wiki


  • When Should You Use Real-Time Data Integration?

    Real-time data integration is being increasingly used by companies to improve their business and better serve their customers. Traditional batch systems are being converted to real-time systems that process transactions as they occur. The advantages include better insight into how the business is performing and the ability to offer services that outperform competitors. An architecture with real-time information and on-demand processing can have a huge payback for certain business scenarios.

    Batch Processing

    In batch processing, a large group of transactions is collected and processed by a program during a single run. With the large volume of work to do, the process must run when resources are less busy, usually overnight. Programmers have traditionally written programs for batch processing, so it's well understood and easy to implement. But it postpones access to data, requires close supervision, and risks downtime. You miss out on knowledge of your business until after it's happened. Because any problem can delay the entire process, you need support personnel to monitor it closely while it runs. And if there's a problem, you risk even further delay of data, and some systems may not operate correctly for users.

    Real-Time Processing

    In real-time processing, small groups of transactions are processed on demand. Instead of a single run, there is data that arrives continuously to be processed by a program. The load of processing data is spread across the day when transactions occur. The advantage of real-time processing is that it increases access to data, runs with fewer resources, and improves uptime. With continuous integration of data, you have knowledge of your business as transactions occur. With small groups of transactions to process, it can be run periodically during the day without impacting interactive users. If an error occurs, it can be handled immediately by staff who are already on site, and the system has a larger window of time to recover. It is more difficult to design real-time processing and it includes some additional overhead, so not every system should be architected as real-time.

    When to Use Real-Time

    While systems with real-time processing require more effort to design and implement, the benefit to your business can be huge. Let's look at the areas where real-time processing has the biggest impact.

    Core to Business

    Data that is critical to business operations should be real-time because it is your most valuable data. You want to gain every advantage over competitors at your core. For instance, a retailer should have immediate access at the central office to sales data from its stores. Once the core data is real-time, other systems can make use of it and new systems become possible. For example, the retailer runs a promotion and watches sales performance as customers respond to the incentive. The response can be analyzed in real-time and the promotion can be adjusted instantly to maximize sales. Inventory systems can be updated in real-time, making it possible to check inventory by location from the call center or website. Loyalty customers can have receipts emailed to them immediately after purchase. New aspects of your business open up that weren't there before.

    Research shows the trend is towards real-time data. In a study of 21 large companies across a range of industries, including banking, finance, and retail, research firm Market Clarity found that most of the companies had moved to delivering real-time information feeds. The report found that applications requiring real-time access to data were mainly core to their business.

    Supporting Customers Online

    Customer self-service and ecommerce websites should be real-time because it's the best way to service customers who've come to expect it. Gone are the days of waiting to check your balance or see a transaction the next day. Now, you get an alert when a spouse spends too much at the department store. You place an order and receive email updates within minutes as the status changes from received to payment authorized to shipped. A website with real-time data has an advantage that attracts customers in search of instant gratification. Batch processing doesn't make as much sense on the web where access is available all the time from anywhere.

    Enabling Business

    We've seen that real-time integration keeps your business competitive and enables new kinds of services. Most companies are incorporating real-time integration into their core systems where it has the most impact. On the web, customers expect self-service and ecommerce sites to have interaction with real-time data. As IT managers look at strategies for real-time data integration, they will consider data replication, messaging, web services, and event-driven architectures. As competition increases and customers expect more, we'll see these strategies deployed more often.