When comparing database replication tools, some solutions use physical replication, while others use logical replication. Let’s examine the differences between these two approaches to understand their strengths and limitations.
Physical replication uses the disk files of the database and essentially copies them byte-for-byte. A change made on the source database is streamed to the target database, making the same change to the corresponding disk block. Generally, this operation is synchronous, meaning that the source database waits until the target database is updated before committing. This is the best form of replication for disaster recovery since zero data loss is guaranteed. The other big advantage of physical replication is its low overhead. The database already has to write into a write-ahead log, and physical replication is simply streaming that to the target database.
However, physical replication has a number of disadvantages:
- Must be a homogeneous environment, using the same database platform and version.
- Must be located nearby to avoid performance problems with waiting on the 2-phase commit.
- Every part of the database is replicated, without any exclusions.
- Higher network usage, since all database operations (including index writes) are replicated.
- Target database must be read-only.
Logical replication uses the database’s write-ahead log, triggers, or other change tracking methods to capture logical changes made to rows in tables. Generally, this operation is asynchronous, meaning that the source database commits its data and replication takes place afterward. This is the best form of replication for cross-platform data integration, data migrations, and data aggregation.
Logical replication can work around the limitations of physical replication:
- Works in a heterogeneous environment across different database platforms and versions.
- Works across wide-area networks, low-bandwidth networks, and tolerates outages.
- Filter tables, columns, and rows to replicate, and route them to different databases.
- Lower network usage, since only filtered changes are replicated.
- Target database can be writable with bi-directional replication.
Logical replication is very flexible and supports many use cases. It can consolidate data from disparate data sources into a data warehouse or data mart. Or, it can distribute data across many remote locations. Working across wide-area networks, it can synchronize data across geographical locations for high availability. Using conflict detection and resolution, it can support active/active replication and keep data integrity even when changes occur for the same data on multiple databases. As a cross-platform solution, it can replicate to non-relational targets, like publishing changes to messaging platforms and cloud-based data stores.
SymmetricDS is an example of a logical replication tool with flexible configuration and cross-platform support for more than 40 database platforms. It can distribute, filter, and transform data in near real-time across wide-area networks and withstand periods of outage. Optimized for performance and scalability, it can scale out to thousands of databases and replicate across multiple tiers. With compatibility for more than 40 databases, it can sync from any database to any database, including relational, data warehouses, NoSQL, streaming platforms, mobile databases, and cloud databases.
Physical and logical replication both have their place in an enterprise data integration strategy. Physical replication is best suited to disaster recovery and cases where low overhead is paramount. Logical replication is best suited to cross-platform solutions with data integration and bi-directional replication. Taking advantage of both solutions will help a business secure their data and leverage it to make smarter decisions.