Data Migration Considerations

Data migrations are becoming increasingly common with the variety of new platforms for data storage. The cloud offers a way to offload on-site data management and storage, reducing risk and providing scalability. There are also many new architectures and emerging technologies, such as non-relational databases, that provide better options for data analytics and reporting. As a result, data migrations have emerged in all industries and for companies of all sizes.

However, such data migrations require upfront planning to be successful. This article identifies several considerations to help you plan a successful data migration and determine the proper tools to use in the process.

Ask Yourself:

  1. Will the data migration occur while the source is live and receiving changes? 

If the source system cannot experience any downtime or outages while the data migration is taking place, it is critical to evaluate products that can accommodate that requirement. If the data replication tool cannot capture changes and hold them while the migration is in progress, the data on the target might become corrupted or inconsistent.

  2. If an outage is possible, what is the maximum downtime available?

Some data migrations allow the source system to be shut down for users so that there is a clear path to extract the data, then transfer and load it to the target before letting users back into the source. This is a clean and straightforward approach if downtime is acceptable. However, even when downtime is an option, the duration of the shutdown matters. Proper metrics and/or estimates need to be calculated upfront to make sure the full data load can complete before the outage window expires.
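A back-of-the-envelope check can catch an impossible window before the migration begins. The sketch below uses hypothetical figures; in practice, the throughput number should come from a timed test load against the actual target.

```python
# Rough estimate of whether a full load fits in the outage window.
# All figures are hypothetical; substitute measured throughput from a test run.

def load_fits_window(dataset_gb: float, throughput_gb_per_hour: float,
                     window_hours: float, safety_factor: float = 1.25) -> bool:
    """Return True if the load, padded by a safety factor, fits the window."""
    estimated_hours = (dataset_gb / throughput_gb_per_hour) * safety_factor
    return estimated_hours <= window_hours

# 500 GB at a measured 80 GB/hour against an 8-hour window:
print(load_fits_window(500, 80, 8))  # True (about 7.8 hours with padding)
```

The safety factor accounts for index rebuilds, validation, and other tail-end work that raw throughput numbers tend to miss.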

  3. Are dependencies (foreign keys) taken into consideration?

If the target platform contains foreign key relationships, will the data be loaded from the source in a parent-first order to avoid violations? Oftentimes tables need to be evaluated by their dependencies to determine the order in which they can be loaded. Some data tools will even disable foreign keys at the target to improve performance and re-enable them after the data migration has completed.
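Determining a parent-first load order is a topological sort over the foreign key graph. A minimal sketch with Python's standard-library `graphlib`, using a hypothetical schema:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical schema: each table maps to the set of parent tables
# its foreign keys reference. Parents must be loaded first.
fk_parents = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}

# static_order() yields every table only after all of its parents.
load_order = list(TopologicalSorter(fk_parents).static_order())
print(load_order)
```

If the sorter raises a `CycleError`, the schema contains circular references, which is exactly the case where temporarily disabling foreign keys at the target becomes necessary.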

  4. Is there a bulk loader available to increase performance?

Often the performance of a data migration is critical to its success, so being able to load the data as quickly as possible is a major benefit. Database vendors often provide tools or APIs (such as PostgreSQL's COPY or SQL Server's BULK INSERT) to achieve greater performance. Evaluating tools that support these vendor-provided options, or provide alternatives of their own, can significantly reduce load time. Be careful, though: a bulk approach sometimes comes with additional requirements, such as the target tables needing to be completely empty.
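The core idea behind bulk loading is to send many rows per statement and commit once, rather than paying a round trip and a commit per row. A small illustration with Python's built-in `sqlite3` (the table and data are invented; a real migration would use the target vendor's bulk path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"name-{i}") for i in range(10_000)]

# Batched insert: one prepared statement, many rows, a single commit.
conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 10000
```

Even this generic batching is typically orders of magnitude faster than row-at-a-time inserts; dedicated vendor bulk loaders go further by bypassing parts of the SQL layer entirely.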

  5. Can the data migration be multi-threaded (parallelized)?

Depending on the number of connections available to the source, a multi-threaded approach can often be used to increase the performance of the data migration. This allows multiple tables to be extracted, transferred, and loaded in parallel rather than one after another in series. As with any multi-threaded process, it is key to understand the capabilities of the machines running these processes so they are not overworked to the point that the multi-threading gains are lost.

  6. Do you know the size of your tables and the data within them?

Some database tables contain large amounts of historical data that remain unchanged and could be migrated in phases at a later date, so that the active data can be readily available to users and back online faster. For example, if a table is keyed by year, only the most recent year or two might be migrated initially. The remaining historical data can then be moved over at a later point in smaller, controlled batches. This can reduce the overall time of the data load and let users back into the system sooner if an outage window is required.

Some data tools may also be able to multi-thread the migration and dedicate larger tables to their own threads, allowing the many smaller tables to process in series but much faster overall. Table size in this case might be measured by the number of rows or the number of bytes: a table with fewer rows than another may still contain large objects and occupy far more storage. A full understanding of the largest tables by both rows and storage can inform scaling adjustments that benefit the overall data migration time.
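One way to sketch that scheduling idea is a greedy "largest first" assignment: sort tables by size and always hand the next-largest table to the least-loaded thread. The table names and sizes below are hypothetical:

```python
# Hypothetical table sizes (bytes, rows, or a blend of both — whichever
# best predicts load time for the dataset in question).
table_sizes = {"audit_log": 900, "orders": 400, "order_items": 350,
               "customers": 120, "products": 60, "lookups": 10}

def assign(tables: dict, n_threads: int) -> list:
    """Greedy largest-first assignment of tables to threads."""
    threads = [{"load": 0, "tables": []} for _ in range(n_threads)]
    for name, size in sorted(tables.items(), key=lambda kv: -kv[1]):
        lightest = min(threads, key=lambda t: t["load"])
        lightest["load"] += size
        lightest["tables"].append(name)
    return threads

plan = assign(table_sizes, 3)
for thread in plan:
    print(thread["load"], thread["tables"])
```

With these numbers the huge `audit_log` table gets a thread to itself while the smaller tables share the rest, so no thread finishes long before the others.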

  7. What is the failure and recovery plan?

What is the process if the data migration fails at any point prior to completion? Does the migration need to start over, or can it continue from a resume point once the issue is resolved? The answer might vary depending on whether the source system is live during the load or unavailable to users. A proper mitigation plan should be in place for any unplanned results during the process.
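A resume point usually means some form of checkpointing. A minimal sketch, assuming table-level granularity (real tools often checkpoint per batch or per log position; the file path and table names here are illustrative):

```python
import json
import os
import tempfile

# Illustrative checkpoint file: records each completed table so a
# failed run can resume instead of starting over from scratch.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "migration_checkpoint.json")

def load_checkpoint() -> set:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def mark_done(done: set, table: str) -> None:
    done.add(table)
    with open(CHECKPOINT, "w") as f:
        json.dump(sorted(done), f)

done = load_checkpoint()
for table in ["customers", "orders", "order_items"]:
    if table in done:
        continue  # already migrated in a previous run
    # ... extract / transfer / load the table here ...
    mark_done(done, table)
```

The key property is that the checkpoint is written only after a table's load fully commits, so a crash mid-table causes that table to be redone rather than silently half-loaded.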

  8. Are changes that occurred during the migration automatically replicated afterward to make the sources consistent (live source only)?

During the data migration of a live source system, changes will continue to occur while the migration is taking place. These changes need to be held back from replication until the migration is complete and the data is consistent. If the migration takes several hours, can the change capture process hold these changes for that duration? Will the changes automatically begin replicating once the migration is complete, and how long will they take to process before the data sources are consistent?
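The hold-then-replay pattern can be sketched as an ordered change buffer. This is a simplification of what change-data-capture tools do internally; the change format and function names are invented for illustration:

```python
from collections import deque

# Changes captured during the bulk load are held in arrival order.
change_queue = deque()

def capture(change: dict) -> None:
    """Hold a change while the bulk migration is still running."""
    change_queue.append(change)

def replay(apply) -> int:
    """Apply held changes in strict source order once the load completes."""
    applied = 0
    while change_queue:
        apply(change_queue.popleft())
        applied += 1
    return applied

# Changes arriving mid-migration:
capture({"op": "UPDATE", "table": "orders", "id": 7, "status": "shipped"})
capture({"op": "INSERT", "table": "orders", "id": 8, "status": "new"})

# After the bulk load completes, drain the queue toward the target:
target = []
n = replay(target.append)
print(n)  # 2 held changes applied, in order
```

The catch-up question in the paragraph above is then about queue depth and drain rate: how much memory or disk the buffer needs for an hours-long hold, and how fast `replay` runs relative to the rate of new changes arriving.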

Hopefully these considerations are areas that have already been discussed, or will be, as part of your planning for an upcoming data migration. They may not all apply to every migration, but having the discussion up front, while evaluating any products, will help ensure a swift and successful data migration.