Data migrations are becoming increasingly common with the variety of new platforms for data storage. The cloud provides a way to offload on-site data management and storage, reducing risk and providing scalability. Many new architectures and emerging technologies, such as non-relational databases, also provide better options for data analytics and reporting. As a result, data migrations have emerged across all industries and for companies of all sizes.
However, such migrations require upfront planning to succeed. This article identifies several considerations to help plan a successful data migration and determine the proper tools to use in the process.
If the source system cannot experience any downtime or outages while the data migration is taking place, it is critical to evaluate products that can accommodate that requirement. If the data replication tool is not prepared to accept changes and hold them while the migration is in flight, data on the target might be corrupted.
Some data migrations allow the source system to be shut down so there is a clear path to extract the data, then transfer and load it to the target before users are allowed back into the source. This is a clean and straightforward approach when downtime is acceptable. Even then, the duration of the shutdown matters: proper metrics and/or estimates need to be calculated up front to make sure the full dataset can finish before the outage window expires.
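A back-of-the-envelope estimate of the outage window can be sketched as below. The dataset size, sustained throughput, and safety factor are all hypothetical inputs you would replace with measurements from a trial run against your own systems.

```python
# Rough outage-window estimate. All numbers here are illustrative,
# not measurements from any real system.

def estimate_migration_hours(total_gb: float, throughput_mb_per_s: float,
                             safety_factor: float = 1.5) -> float:
    """Estimate wall-clock hours to move total_gb at a sustained
    throughput, padded by a safety factor for retries and validation."""
    seconds = (total_gb * 1024) / throughput_mb_per_s
    return seconds * safety_factor / 3600

# 500 GB at a sustained 50 MB/s, with 50% headroom:
hours = estimate_migration_hours(500, 50)
```

Comparing this estimate against the approved outage window, before committing to a product or schedule, is the point of the exercise.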
If the target platform contains foreign key relationships, will the data be loaded from the source in a parent-first order to avoid violations? Oftentimes tables need to be evaluated by their dependencies to determine the order in which they can be loaded. Some data tools will even disable foreign keys on the target to improve performance and re-enable them after the data migration has completed.
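The parent-first ordering described above is a topological sort over the foreign-key graph. A minimal sketch using Python's standard-library `graphlib`, with a hypothetical four-table schema:

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the parent tables its
# foreign keys reference, so parents sort ahead of their children.
fk_parents = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}

# static_order() yields each table only after all of its parents.
load_order = list(TopologicalSorter(fk_parents).static_order())
```

In a real migration the `fk_parents` mapping would be built by querying the catalog views of the source or target database rather than hand-written.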
Often the performance of a data migration is critical to its success, so being able to load the data as quickly as possible is a major benefit. Database vendors often provide tools or APIs to achieve greater performance. Evaluating tools that support these vendor-provided options, or provide alternatives of their own, will factor into improving the data load time. Be careful: a bulk approach sometimes comes with additional requirements, such as the target tables having to be completely empty.
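As a generic illustration of batched loading, the sketch below uses SQLite's DB-API `executemany` with a single transaction; vendor-native bulk paths (bulk-copy utilities, streaming load APIs) are usually faster still, and, as noted above, may impose conditions such as an empty target table. Table and column names are hypothetical.

```python
import sqlite3

# Batched load sketch: one transaction, one executemany call,
# instead of 10,000 individual round-trip inserts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"name-{i}") for i in range(10_000)]
with conn:  # commits once for the whole batch
    conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```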
Depending on the number of connections available to the source, a multi-threaded approach can often be used to increase the performance of the data migration. This allows multiple tables to be extracted, transferred, and loaded in parallel rather than one after another in series. As in any multi-threaded process, it is key to understand the capabilities of the machines running these processes so they are not overworked to the point that the multi-threading gains are lost.
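A minimal sketch of per-table parallelism with a bounded worker pool, assuming a hypothetical `migrate_table` step that would do the real extract/transfer/load work; `max_workers` is where the source's connection limit and the host's capacity get enforced:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-table migration step; a real implementation would
# extract, transfer, and load the table's data.
def migrate_table(table: str) -> str:
    return f"{table}: done"

tables = ["customers", "orders", "order_items", "products"]

# Cap workers at what the source connections and host CPU can sustain.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(migrate_table, tables))
```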
Some database tables contain large amounts of historical data that remain unchanged and could be migrated in phases at a later date, so that the active data is readily available to users and back online faster. For example, if a table is keyed by year, perhaps only the past one or two years need to migrate initially. The remaining historical data can then be moved later, in smaller, controlled batches. This can reduce the overall time of the data load and let users back into the system sooner if an outage window is required.
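One way to sketch that phasing is to generate the extraction queries up front: active data in one pass, history in year-sized batches. The table name, year column, and cutoffs below are all hypothetical.

```python
# Phased extraction sketch: active window first, history backfilled
# in small batches afterward. Names and years are illustrative only.
def phase_queries(table: str, year_col: str, cutoff_year: int,
                  oldest_year: int, batch_years: int = 1):
    """Yield (phase, SQL) pairs: active data first, history after."""
    yield ("active",
           f"SELECT * FROM {table} WHERE {year_col} >= {cutoff_year}")
    year = cutoff_year - 1
    while year >= oldest_year:
        lo = max(oldest_year, year - batch_years + 1)
        yield ("history",
               f"SELECT * FROM {table} "
               f"WHERE {year_col} BETWEEN {lo} AND {year}")
        year = lo - 1

phases = list(phase_queries("orders", "order_year", 2023, 2019,
                            batch_years=2))
```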
Some data tools can also multi-thread the migration and dedicate larger tables to their own threads, allowing the many smaller tables to process in series but much faster overall. Table size in this case might be measured by the number of rows or the number of bytes. A table with fewer rows than another may still contain large objects and hold more data than a table with many more rows. A full understanding of the largest tables by both row count and storage can inform scaling adjustments that benefit the overall migration time.
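A simple way to realize that balancing is a largest-first greedy assignment: hand the biggest tables out first, always to the least-loaded thread, so one huge table does not trail behind a queue of small ones. The table names and byte counts below are hypothetical.

```python
import heapq

# Largest-first greedy scheduling sketch. Sizes are hypothetical byte
# counts; as noted, row counts alone can mislead when a table holds
# large objects.
def assign_to_threads(table_sizes, n_threads):
    """Assign tables (largest first) to the least-loaded thread."""
    heap = [(0, i, []) for i in range(n_threads)]  # (load, id, tables)
    heapq.heapify(heap)
    for table, size in sorted(table_sizes.items(),
                              key=lambda kv: kv[1], reverse=True):
        load, i, tables = heapq.heappop(heap)  # least-loaded thread
        tables.append(table)
        heapq.heappush(heap, (load + size, i, tables))
    return {i: tables for _, i, tables in heap}

sizes = {"orders": 900, "order_items": 700,
         "customers": 300, "products": 100}
plan = assign_to_threads(sizes, 2)
```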
What is the process if the data migration fails at any point before completion? Does it need to start over, or can it proceed from a resume point once the issue is resolved? The answer might vary between a data load with an active source system and one where the source is unavailable to users. A proper mitigation plan should be in place for any unplanned results during the process.
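One common shape for a resume point is a small checkpoint recording each completed table, so a restarted run skips finished work. A minimal sketch, with a hypothetical file name and table list; a production tool would also track per-table progress, not just completion:

```python
import json
from pathlib import Path

# Resume-point sketch: persist completed tables so a failed run can
# pick up where it left off instead of starting over.
CHECKPOINT = Path("migration_checkpoint.json")

def load_done() -> set:
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(table: str) -> None:
    done = load_done()
    done.add(table)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def remaining(all_tables: list) -> list:
    done = load_done()
    return [t for t in all_tables if t not in done]

# First run finishes "customers", then fails before the rest.
mark_done("customers")
todo = remaining(["customers", "orders", "order_items"])
CHECKPOINT.unlink()  # clean up the demo checkpoint
```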
During the data migration of a live source system, changes will continue to occur while the migration is taking place. These changes need to be held back from replication until the migration is complete and the data is consistent. If the migration takes several hours, can the change capture process hold these changes for that duration? Will the changes automatically begin replicating once the migration completes, and how long will they take to process before the data sources are consistent?
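Conceptually, the hold-and-drain behavior looks like the sketch below: captured changes queue up in order and are only applied once the bulk load signals completion. This is an illustrative in-memory model; a real change capture tool would persist the buffer durably (to disk or a log), since an in-memory queue is lost on failure.

```python
from collections import deque

# Hold-and-drain sketch for changes captured during the bulk load.
class ChangeBuffer:
    def __init__(self):
        self.pending = deque()
        self.migration_complete = False

    def capture(self, change: dict) -> None:
        self.pending.append(change)  # held, not applied yet

    def drain(self, apply_fn) -> int:
        """Apply held changes in arrival order once the load is done."""
        if not self.migration_complete:
            raise RuntimeError("bulk load still running")
        n = 0
        while self.pending:
            apply_fn(self.pending.popleft())
            n += 1
        return n

buf = ChangeBuffer()
buf.capture({"op": "update", "table": "orders", "id": 1})
buf.capture({"op": "insert", "table": "orders", "id": 2})
buf.migration_complete = True
applied = []
count = buf.drain(applied.append)
```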
Hopefully these considerations are areas that have already been discussed, or will be, as part of planning for an upcoming data migration. They may not all apply to every migration, but having the discussion up front, while evaluating any products, will help ensure a swift and successful data migration.