As companies implement data integration across the enterprise, they often run into a data tragedy: complexity, high cost, outdated information, or even lost data. Let’s take a look at some common data tragedies and how data replication can help you avoid them.

Sending All Data Everywhere

Some businesses must deal with scaling out and consolidating data from multiple endpoints. Think retail chain stores, medical devices at hospitals, kiosks at amusement parks, and laptops for field representatives. Common problems are outdated data from using batch processes and compliance issues with regulations like PCI-DSS and HIPAA that require data privacy and protection.

But at this scale, the easiest mistake to make is sending all data everywhere. It’s easy to turn on data replication for all tables in the database and send it to all endpoints, but this approach is inefficient and leads to problems. First, it increases cost because of the extra bandwidth being used. If the business is lucky enough to use a local network, then the cost may just be in additional network equipment. But a business with remote devices usually pays for the amount of data transferred over the wide area network through leased lines, cable, cellular, or satellite. Second, the solution may work fine when the system is in light use, but a surge in business or a network outage can bring the system to a crawl as it struggles to catch up on the backlog.

By taking some time to understand the data, replication can be configured more efficiently, resulting in a faster system that saves money. While some master data must be sent to all endpoints, other data can often be sent as subsets. For example, retail stores can be sent only the items, prices, and tax rules that apply to their location.

There are also tables or columns that should not be replicated at all. For example, some software packages use a table to store a heartbeat from each workstation. Replicating this table across all offices generates traffic that has nothing to do with the core data the application needs. In other cases, certain columns should be excluded from replication. Point of sale software often provides an import of master items that updates every item in the database regardless of whether it actually changed. So, every night, all items are loaded and sent to all store locations, which creates high bandwidth costs and stress on the system. By excluding the “last modified” column from replication for that table, only the items that really changed are replicated.
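The “last modified” idea can be sketched in a few lines: if a nightly import touches only an excluded column, change detection sees no replicable difference and skips the row. This is a minimal illustration, not the implementation of any specific product; the row structures and names are made up for the example.

```python
# Decide whether a changed row should be replicated, ignoring columns
# (like "last_modified") that are excluded from replication.
EXCLUDED_COLUMNS = {"last_modified"}

def has_replicable_change(old_row: dict, new_row: dict) -> bool:
    """Return True only if a column we actually replicate changed."""
    for column in old_row.keys() | new_row.keys():
        if column in EXCLUDED_COLUMNS:
            continue
        if old_row.get(column) != new_row.get(column):
            return True
    return False

# A nightly import that touches only the timestamp: nothing to replicate.
old = {"item_id": 42, "price": 9.99, "last_modified": "2024-01-01"}
new = {"item_id": 42, "price": 9.99, "last_modified": "2024-01-02"}
print(has_replicable_change(old, new))  # False

# A real price change still replicates.
new_price = {"item_id": 42, "price": 8.99, "last_modified": "2024-01-02"}
print(has_replicable_change(old, new_price))  # True
```

With this filter in place, the nightly import of thousands of unchanged items generates no replication traffic at all.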

Downtime for Data Migration

There are times when a data migration is needed, whether it’s moving from an on-premises database to a cloud-based one, migrating to a new database platform, or simply upgrading systems. Common migration problems are cross-platform compatibility issues, like data type and character set issues, and performance of the migration, which can involve a large amount of data. But given how critical information is to many businesses, the biggest data tragedy is the downtime.

A typical game plan involves pulling an all-nighter, kicking all users out of production, watching data migrate, then pointing applications over to the new database after it’s verified. The downtime followed by a cut-over to the new system is risky. Not only is the system unavailable during migration, but any problems that arise could delay access further. Once users can access the new system and change data, there’s no going back to the old system, which could represent serious risk.

By leveraging some advanced features of data replication, a data migration can eliminate downtime using a direct conversion, parallel adoption, or phased roll-out method. A direct conversion keeps the old production system active during migration, then cuts over when the new system becomes active in a “big bang” adoption. By using snapshot replication of the initial data set, followed by change data capture, the system can have zero downtime.
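The snapshot-plus-capture sequence above can be sketched as follows. This is an illustrative model using in-memory dictionaries in place of real databases; the key point is the ordering: change data capture starts before the snapshot copies, so writes made while the copy runs are queued and applied afterward.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the old and new databases.
old_db = {1: "widget", 2: "gadget"}
new_db = {}
change_log = deque()  # changes captured by CDC while the old system runs

def capture(key, value):
    """Change data capture: record a write made on the live old system."""
    old_db[key] = value
    change_log.append((key, value))

def migrate():
    # 1. CDC is already running, so no write during the copy is lost.
    # 2. Snapshot replication: bulk-copy the initial data set while
    #    the old system stays online.
    for key, value in list(old_db.items()):
        new_db[key] = value
    # 3. Drain the captured backlog until the new system catches up;
    #    applications can then cut over with no downtime window.
    while change_log:
        key, value = change_log.popleft()
        new_db[key] = value

capture(3, "gizmo")       # a write arrives while migration is underway
migrate()
print(new_db == old_db)   # True: the new system is fully in sync
```

Because users keep working on the old system the whole time, the cut-over becomes a configuration change rather than an outage.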

During one system migration, a customer discovered that large object (LOB) data was truncated at 4,000 bytes. To remedy the problem, the data replication software was used to send a partial load that filled in the missing data. Another customer had to roll back their migration and make a second attempt, which increased their downtime.

A better method to reduce risk is parallel adoption, which keeps both old and new systems running. By configuring bi-directional data replication, changes made on each system are sent to the other. Sometimes a customer will run the systems in parallel to take more time for verification and gain confidence in stability and performance. For a system involving multiple applications, a phased roll-out builds on parallel adoption by letting each application cut over to the new system when it’s ready.
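A detail worth noting about bi-directional replication is loop prevention: each change must carry its origin so a system does not replicate a change back to the system it came from. Here is a minimal sketch of that idea, with node names and structures invented for the example:

```python
# Two systems running in parallel with bi-directional replication.
# Each change is tagged with its origin so a change is never echoed
# back to the system it came from (avoiding a "ping-pong" loop).

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None

    def write(self, key, value, origin=None):
        origin = origin or self.name
        self.data[key] = value
        # Forward to the peer only if the change did not come from it.
        if self.peer and origin != self.peer.name:
            self.peer.write(key, value, origin=origin)

old_system, new_system = Node("old"), Node("new")
old_system.peer, new_system.peer = new_system, old_system

old_system.write("sku-1", 9.99)   # change on old flows to new
new_system.write("sku-2", 4.50)   # change on new flows to old
print(old_system.data == new_system.data)  # True
```

Real replication products also handle conflicts when the same row changes on both sides, but the origin tag is the core mechanism that keeps the two systems converging instead of looping.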

Missed Insights Without Timely Data

In this use case, the business needs access to information for decision making, so reports are built against operational data. Common problems are accessibility of data that is spread across different databases and additional contention that slows down applications. But the biggest data tragedy is missing out on the business insights that timely data can provide.

Reporting solutions often run on a schedule overnight and deliver reports to an executive’s inbox for the next morning. Because the most valuable reports need to combine data in meaningful ways from multiple sources, they usually require a data integration strategy. The traditional ETL (extract, transform, load) platforms run jobs overnight for the next day’s reports, which misses the opportunity to act on information as it happens.

A data integration strategy that includes data replication can improve access to data by making it available almost immediately. Advanced data replication features can include transforming data changes as they pass through and publishing to big data pipelines for analysis. Many businesses will replicate to a data warehouse or a reporting database to offload contention on their operational databases.
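As a small illustration of transforming changes in flight and publishing them, consider the sketch below. The reporting target and pipeline are just in-memory placeholders, and the transform is an invented example, but the shape is the same: each change is reshaped as it passes through and lands in the reporting store and the analytics stream almost immediately.

```python
# Sketch: replicate each change to a reporting target, transforming it
# in flight and publishing it to an analytics pipeline as it happens.
analytics_pipeline = []   # stands in for a big data / streaming pipeline
reporting_db = {}         # stands in for a reporting database

def transform(change):
    # Example in-flight transform: normalize store codes to upper case.
    change = dict(change)
    change["store"] = change["store"].upper()
    return change

def replicate(change):
    change = transform(change)
    reporting_db[(change["store"], change["sku"])] = change["qty"]
    analytics_pipeline.append(change)   # available almost immediately

replicate({"store": "columbus", "sku": "sku-1", "qty": 3})
print(reporting_db[("COLUMBUS", "sku-1")])  # 3
```

Because reports read from the reporting target rather than the operational database, the contention problem described above goes away at the same time.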

In one case, a discount retail customer gave leadership a mobile phone app showing chain-wide sales for each hour as they happened, letting them gauge the success of promotions and make adjustments quickly.

Data Tragedies Can Happen to Anyone

Data tragedies happen all the time, but a good data integration strategy can avoid these risks. Deploying data replication with the right features can have many business benefits. It can reduce the network operational cost of many endpoints. It can eliminate downtime of data migrations and reduce the risk to business. It can improve access to critical operational data for making timely decisions through reporting and business intelligence.

At JumpMind, we help companies build data integration solutions that deliver timely and meaningful information. We know the requirements and features needed to avoid a data tragedy. We use our software products to provide seamless, stable data integration for projects with short timelines and tight budgets. Bring us your data tragedy and we’ll help you with the right solution.

More on Data Tragedies: Join a live webinar on common data tragedies and how to avoid them, where our experts explain common mistakes with data integration, their impact to business, and strategies to prevent them.

Author: Eric Long

Eric is a software developer and technology enthusiast with a background in developing custom applications for Information Technology. As a long-time Linux user, he strongly believes in open source. He focuses on product solutions and spreading the word for JumpMind.