Stream data with SymmetricDS to Databricks

Overview

Organizations today face the challenge of consolidating data from on-premise and cloud-based systems into a unified analytics platform. SymmetricDS professional edition now supports native integration with Databricks, enabling you to replicate structured data from any of the more than 30 heterogeneous database platforms SymmetricDS supports directly into your data lakehouse. Whether you’re working with Oracle, SQL Server, PostgreSQL, MySQL, or legacy databases, SymmetricDS provides real-time data synchronization that keeps your Databricks environment up to date with the latest operational data, giving you a unified view across your entire data ecosystem.

Connecting to Databricks

Setting up SymmetricDS to work with Databricks requires a few configuration steps to ensure performance and compatibility with the Databricks JDBC driver.

Configuring Apache Arrow Support

The Databricks JDBC driver leverages Apache Arrow for high-performance data transfer. To enable this functionality, Apache Arrow requires access to JDK internals through reflection. This is a one-time configuration that you’ll need to apply to your SymmetricDS installation. Locate the sym_service.conf file in your SymmetricDS conf directory and add the following line:
`wrapper.java.additional=--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED`

This setting exposes the Java internal APIs that Apache Arrow needs to function properly. Restart SymmetricDS after saving the file so the new JVM option takes effect. For additional details about Apache Arrow’s requirements, refer to the official Apache Arrow documentation at https://arrow.apache.org/docs/java/install.html.
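The option only helps if it reaches the JVM that loads the Databricks driver. The Java class below is a minimal sketch for seeing its effect. Run it once without any options, then again with the same option on the command line, for example `java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED ArrowOpensCheck.java`. It reports on its own JVM, not on a running SymmetricDS service. The class and helper names are hypothetical. The module check uses the standard `java.lang.Module.isOpen` API, which is available in Java 9 and later.

```java
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class ArrowOpensCheck {
    // The JVM option from sym_service.conf (value copied from the line above).
    static final String REQUIRED_OPTION =
        "--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED";

    /** True if java.base opens the java.nio package to the module this class runs in. */
    static boolean nioOpenToThisModule() {
        Module javaBase = ByteBuffer.class.getModule();
        return javaBase.isOpen("java.nio", ArrowOpensCheck.class.getModule());
    }

    /** Literal string match of the option against a list of JVM arguments. */
    static boolean hasRequiredOption(List<String> jvmArgs) {
        return jvmArgs.contains(REQUIRED_OPTION);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // The argument check depends on how the option was spelled.
        // The module check reflects what the JVM actually enforces.
        System.out.println("Option present in JVM arguments: " + hasRequiredOption(jvmArgs));
        System.out.println("java.nio open to this code: " + nioOpenToThisModule());
    }
}
```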

Configuring Your Databricks Target

When setting up your Databricks target endpoint on the SymmetricDS Canvas interface, navigate to the Custom tab where you’ll find a pre-templated Databricks connection string. You can customize this template by updating the Databricks cloud instance, schema, and HTTP path fields individually, or you can overwrite it entirely with your complete JDBC URL.
For authentication, Databricks typically uses personal access tokens (PATs). When configuring your connection with a PAT, enter “token” in the User ID field, then provide your actual token value in the Password field. This gives SymmetricDS secure, token-based access to your Databricks workspace.
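To check a workspace's connection details outside of Canvas, you can exercise the same pieces from plain JDBC: the host, HTTP path, schema, and a PAT passed with the user name `token`. This is a sketch under several assumptions. The URL follows the general `jdbc:databricks://<host>:443/<schema>;transportMode=http;ssl=1;AuthMech=3;httpPath=<path>` shape described in the Databricks JDBC driver documentation, which may differ from the exact template SymmetricDS pre-fills, so compare the two before relying on it. The environment variable names and the class name are hypothetical. The Databricks JDBC driver must also be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class DatabricksConnectSketch {
    /** Builds a Databricks JDBC URL from the same fields the Canvas template exposes (assumed format). */
    static String buildUrl(String host, String httpPath, String schema) {
        return "jdbc:databricks://" + host + ":443/" + schema
            + ";transportMode=http;ssl=1;AuthMech=3;httpPath=" + httpPath;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical environment variable names chosen for this sketch.
        String host = System.getenv().getOrDefault("DATABRICKS_HOST", "<workspace-host>");
        String httpPath = System.getenv().getOrDefault("DATABRICKS_HTTP_PATH", "<http-path>");
        String schema = System.getenv().getOrDefault("DATABRICKS_SCHEMA", "default");
        String token = System.getenv("DATABRICKS_TOKEN");

        String url = buildUrl(host, httpPath, schema);
        System.out.println("JDBC URL: " + url);
        if (token == null) {
            System.out.println("DATABRICKS_TOKEN not set; skipping connection attempt.");
            return;
        }

        Properties props = new Properties();
        props.setProperty("UID", "token"); // the literal string "token" when authenticating with a PAT
        props.setProperty("PWD", token);   // the personal access token itself

        // Needs the Databricks JDBC driver on the classpath (plus the Arrow --add-opens option on recent JDKs).
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("Connected; SELECT 1 returned " + rs.getInt(1));
        }
    }
}
```

The sketch keeps the token out of the URL and passes it as a separate property. This mirrors the Canvas setup, where the token goes in the Password field rather than in the connection string.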

You’ll notice that the Databricks endpoint is configured as a “write-only” capture type. This is by design — SymmetricDS maintains its own metadata and synchronization tables on a separate H2 runtime database rather than creating these tables in your Databricks environment. This architecture keeps your lakehouse clean and optimized for analytics while giving SymmetricDS the operational database it needs for managing the replication process.

Built for Reliability: Handling Intermittent Connectivity

One of SymmetricDS’s most powerful capabilities is its fault-tolerant architecture, which is especially valuable when network connectivity is intermittent, a common challenge for organizations with edge devices, remote offices, or distributed operational systems.

When a connection to Databricks is temporarily unavailable, SymmetricDS doesn’t lose data or fail catastrophically. Instead, it intelligently queues changes at the source database and automatically resumes synchronization once connectivity is restored. This queue-and-forward mechanism ensures that every transaction is captured and eventually delivered to your lakehouse, maintaining complete data integrity even through network disruptions.

This capability sets SymmetricDS apart from many cloud-native integration tools that assume constant connectivity and may require complex retry logic or external orchestration to handle outages gracefully. Whether you’re dealing with unreliable network connections at retail locations, manufacturing plants, or field operations, SymmetricDS provides enterprise-grade reliability that keeps your data flowing to Databricks without manual intervention or data loss.
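The queue-and-forward behavior described above can be illustrated with a small, generic Java sketch. Changes are appended to an ordered queue, and each change is removed only after the target accepts it, so an outage simply leaves work pending until the next attempt. This illustrates the pattern only; it is not SymmetricDS's implementation. SymmetricDS queues captured changes durably at the source database, while this sketch keeps them in memory. The `Change` record and `flush` method are hypothetical names, and the record syntax requires Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

public class StoreAndForwardSketch {
    /** A captured change; a real system would persist these rather than hold them in memory. */
    record Change(long id, String sql) {}

    private final Deque<Change> outgoing = new ArrayDeque<>();

    void capture(Change c) {
        outgoing.addLast(c); // preserve capture order
    }

    /**
     * Delivers queued changes in order. A change leaves the queue only after the
     * target accepts it. On the first failure, stop and keep the rest for the next attempt.
     * Returns the number of changes delivered.
     */
    int flush(Predicate<Change> send) {
        int delivered = 0;
        while (!outgoing.isEmpty()) {
            Change next = outgoing.peekFirst();
            if (!send.test(next)) {
                break; // target unavailable: leave this change and everything after it queued
            }
            outgoing.removeFirst();
            delivered++;
        }
        return delivered;
    }

    int pending() {
        return outgoing.size();
    }

    public static void main(String[] args) {
        StoreAndForwardSketch queue = new StoreAndForwardSketch();
        for (long i = 1; i <= 3; i++) {
            queue.capture(new Change(i, "insert ..."));
        }
        List<Long> received = new ArrayList<>();
        boolean[] online = {false};
        Predicate<Change> target = c -> {
            if (!online[0]) return false; // simulated outage
            received.add(c.id());
            return true;
        };

        System.out.println(queue.flush(target) + " delivered, " + queue.pending() + " pending"); // 0 delivered, 3 pending
        online[0] = true; // connectivity restored
        System.out.println(queue.flush(target) + " delivered, " + queue.pending() + " pending"); // 3 delivered, 0 pending
        System.out.println(received); // [1, 2, 3]
    }
}
```

Stopping at the first failure, rather than skipping the failed change, keeps delivery in capture order. That ordering matters when later changes depend on earlier ones.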

Flexible Data Transformations

SymmetricDS provides powerful transformation capabilities that let you modify, enrich, mask, or filter data during the replication process. You can apply business rules, mask sensitive information for compliance, denormalize data structures for analytics, or filter out unnecessary records, all without building separate ETL pipelines. These transformations happen seamlessly as part of the replication process, reducing latency and infrastructure complexity.
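The Java below is a conceptual sketch of what filter, mask, and enrich rules do to a single row, represented as a map from column name to value. It is not the SymmetricDS transform API, since SymmetricDS defines transforms through its own configuration. The column names `status`, `card_number`, `store_id`, and `region`, and the rules applied to them, are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class RowTransformSketch {
    /** Masks all but the last four characters, e.g. for card or account numbers. */
    static String maskAllButLast4(String value) {
        if (value == null || value.length() <= 4) return value;
        return "*".repeat(value.length() - 4) + value.substring(value.length() - 4);
    }

    /**
     * Hypothetical per-row rules: drop test rows, mask a sensitive column,
     * and add a derived column. An empty result means the row is filtered out.
     */
    static Optional<Map<String, String>> transform(Map<String, String> row) {
        if ("TEST".equals(row.get("status"))) {
            return Optional.empty(); // filter: not needed downstream
        }
        Map<String, String> out = new LinkedHashMap<>(row);
        out.computeIfPresent("card_number", (column, value) -> maskAllButLast4(value)); // mask
        String storeId = row.getOrDefault("store_id", "");
        out.put("region", storeId.startsWith("EU") ? "EU" : "OTHER"); // enrich
        return Optional.of(out);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("order_id", "1001");
        row.put("store_id", "EU-17");
        row.put("card_number", "4111111111111111");
        row.put("status", "PAID");
        System.out.println(transform(row).orElseThrow());
    }
}
```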

Lightweight Deployment Architecture

One of SymmetricDS’s key advantages is its simple, lightweight deployment model. Unlike complex data integration platforms that require extensive infrastructure—separate orchestration layers, message queues, transformation clusters, and monitoring systems—SymmetricDS operates as a self-contained agent that can be deployed directly alongside your databases. This architecture minimizes infrastructure costs, reduces operational complexity, and makes it easy to scale horizontally by simply adding more agents as your data volume grows.

Conclusion: Empowering Data-Driven Organizations

SymmetricDS’s integration with Databricks represents a powerful solution for organizations seeking to modernize their data infrastructure while maintaining reliability, flexibility, and cost efficiency.

The Business Value

The benefits of using SymmetricDS for Databricks integration extend across your entire organization:

Improved Data Accuracy: Databases across your organization maintain consistent, up-to-date data, eliminating discrepancies and enabling confident decision-making.
Unmatched Heterogeneous Database Support: SymmetricDS supports more than 30 database platforms as sources for Databricks. This breadth of platform coverage is a key differentiator, allowing organizations with complex, multi-vendor database environments to consolidate their data integration strategy around a single proven solution for feeding Databricks or other platforms.
Enhanced Data Accessibility: Teams can access critical information from different points in your network without complex data requests or delays.
Increased Operational Efficiency: Eliminate manual data entry, custom scripts, and time-consuming data migrations that drain IT resources.
Scalability and Flexibility: Add or remove databases as your business evolves without disrupting your existing data processes or requiring major reconfiguration.
Business Continuity: Ensure uninterrupted operations even in the event of database failures or system outages through automated failover and recovery.
Cost Savings: Reduce expenses associated with manual data operations, eliminate the need for additional operators dedicated to data movement, and avoid costly proprietary integration tools.

Partner with Us

SymmetricDS is committed to evolving alongside your business needs. If there’s a feature or capability you’d like to see in SymmetricDS, we encourage you to reach out to our sales team. We actively prioritize customer-requested features and work closely with our clients to ensure the platform meets their unique requirements.

Ready to transform how you move data into Databricks? Contact us today to learn more about how SymmetricDS can streamline your data integration and empower your analytics initiatives.