Streaming ETL is the movement and processing of data from source systems to target systems in real time, as changes occur. ETL is an acronym for the three phases commonly used during data movement:

  • Extract - Raw changes are collected from disparate data sources and placed in a pipeline for processing.
  • Transform - Business rules are applied to the data, such as transposing values, aggregating data, and cleaning up the format.
  • Load - Data is stored in a data warehouse or non-relational database, or published to a streaming platform.
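As a rough illustration, the three phases can be sketched as a chain of Python generators that handle each change as soon as it arrives. This is a minimal sketch, not how any particular product works: the sample change events, the field names, and the in-memory dictionary standing in for the target system are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator

# Hypothetical change events. In a real pipeline these would arrive
# continuously from change data capture on the source systems.
SAMPLE_CHANGES = [
    {"store": " north ", "amount": "19.99"},
    {"store": "South", "amount": "5.00"},
    {"store": "NORTH", "amount": "0.01"},
]


def extract(source: Iterable[dict]) -> Iterator[dict]:
    """Extract: pass along each raw change as soon as the source produces it."""
    for change in source:
        yield change


def transform(changes: Iterable[dict]) -> Iterator[dict]:
    """Transform: clean up the format and convert types, one change at a time."""
    for change in changes:
        yield {
            "store": change["store"].strip().lower(),
            "amount_cents": round(float(change["amount"]) * 100),
        }


def load(rows: Iterable[dict], target: Dict[str, int]) -> None:
    """Load: aggregate into the target (a dict standing in for a warehouse table)."""
    for row in rows:
        target[row["store"]] += row["amount_cents"]


totals: Dict[str, int] = defaultdict(int)
load(transform(extract(SAMPLE_CHANGES)), totals)
print(dict(totals))  # {'north': 2000, 'south': 500}
```

Because each stage is a generator, no stage waits for the full data set. A change flows through extract, transform, and load before the next one is read.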

With streaming ETL, small groups of changes are processed as they occur. The advantages are that data becomes available immediately, processing needs fewer resources, and overall system uptime improves. Instead of a single large extraction, data arrives in a continuous stream. These small bursts of data are quick to process throughout the day and do not slow down interactive users of the source systems. If an error occurs, IT staff can handle it immediately during normal working hours, which gives them a larger window of time for support. The disadvantage of streaming ETL is increased complexity, which a dependable data integration platform can help overcome.

What is Traditional ETL?

Traditional ETL uses batch processing to collect a large amount of data in a single scheduled run. It is common to completely refresh all data with a "kill and fill" operation, which deletes the existing target data and reloads it in full. The jobs are usually scheduled to run overnight, when systems are less busy and the heavy processing will not slow down interactive users. The advantage of traditional ETL jobs is that batch processing is well understood and easy to implement. The disadvantages are delayed access to data and the need to support errors in the middle of the night.
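For comparison, a "kill and fill" refresh might look like the following minimal sketch, which uses Python's built-in sqlite3 module. The `sale` table, its columns, and the in-memory databases are assumptions made for the example. A real job would run against production systems on a schedule.

```python
import sqlite3


def kill_and_fill(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Replace every row in the target table with a fresh copy from the source."""
    rows = source.execute("SELECT id, amount FROM sale").fetchall()
    with target:  # commit the delete and inserts together, or roll both back on error
        target.execute("DELETE FROM sale")  # kill
        target.executemany("INSERT INTO sale (id, amount) VALUES (?, ?)", rows)  # fill
    return len(rows)


# Hypothetical source and target databases, both in memory for the example.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO sale VALUES (?, ?)", [(1, 19.99), (2, 5.0)])
source.commit()
target.execute("INSERT INTO sale VALUES (99, 1.0)")  # stale row from an earlier run
target.commit()

copied = kill_and_fill(source, target)
remaining = target.execute("SELECT id FROM sale ORDER BY id").fetchall()
print(copied, remaining)  # 2 [(1,), (2,)]
```

The whole table is read and rewritten on every run, even if only one row changed. That is why this style of job is usually pushed to off-hours.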

When to use Streaming ETL

Streaming ETL should be used where it has the biggest impact on the business, such as the following areas:

  • Core to Business - Data that is critical to business operations, such as sales data for retailers, operations data for manufacturers, or financial data for banking
  • Customer Service - Access to customer information to address any service inquiries
  • Ecommerce - A website with order information, payment authorization, and shipping status
  • Internet of Things - Automation in response to thousands of data points from sensors and control systems

Streaming ETL is growing in popularity because it enables new kinds of services and solves business cases that serve customers better. As new solutions are built and old ones are updated, IT departments are relying on streaming ETL to process data and events as they occur. The advantage is service offerings that outperform competitors and provide real-time insights into how customers are behaving.

Make Streaming ETL Easy

Remove the obstacles to real-time data pipelines by leveraging a streaming ETL platform like SymmetricDS. Easily integrate data using continuous change data capture, so you can focus on your application and analytics. SymmetricDS is a cross-platform solution that can sync any database to any database, including non-relational and streaming platforms like Kafka, Elasticsearch, and Snowflake. It's easy to set up replication and scale to thousands of databases using a powerful web console to design the integrations.

 

 

Author: Eric Long

Eric is a software developer and technology enthusiast with a background in developing custom applications for Information Technology. As a long-time Linux user, he strongly believes in open source. He focuses on product solutions and spreading the word for JumpMind.