
Best Practices When Migrating To and From Boomi DataHub (Part Three)

by Boomi
Published Jan 29, 2021

I’m an integration specialist at Solita Oy, an IT service management company, and a Boomi implementation partner. Our team was given the task of migrating master data from one Boomi DataHub cloud to another Boomi DataHub cloud location.

In the first post of this series, I talked about the background of the project and things to consider before migration. In the second post, I covered MDM solution configuration. Now it’s time to share the lessons we learned and the best practices we adopted as a result.

Best Practices Discovered When Getting Data From Boomi DataHub

Getting the data

Getting data out of Boomi DataHub requires the Boomi API. The following operations are especially important during the migration:

  • Query Golden Records
  • Get Golden Record History
  • Get Golden Record’s Metadata
  • Get Batch Update Status

Fetching data from Boomi DataHub through Query Golden Records works like a paged API: each response contains up to 200 records and an offset token, and you repeat the query with the offset token from the previous response to get the next page. When the offset token is empty, you have fetched the last page.
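As an illustration, the paging loop might look like the following minimal Python sketch. It assumes a hypothetical `fetch_page` callable that sends one Query Golden Records request with the given offset token and returns the parsed records plus the offset token from the response. The real request format and parsing are outside this sketch, so check the API documentation; `fake_fetch_page` only stands in for the real request so the loop can be run.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch_page(offset_token) -> (records, next_offset_token).
# It stands in for one Query Golden Records request plus response parsing;
# it is not a Boomi library function.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def fetch_all_golden_records(fetch_page: FetchPage) -> List[dict]:
    """Repeat the query with the previous offset token until the token is empty."""
    records: List[dict] = []
    offset_token: Optional[str] = None  # the first request is sent without a token
    while True:
        page, offset_token = fetch_page(offset_token)
        records.extend(page)
        if not offset_token:  # empty or missing token: this was the last page
            return records


# Stand-in for the real request, serving 450 fake records in pages of up to 200.
def fake_fetch_page(token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    start = int(token) if token else 0
    end = min(start + 200, 450)
    next_token = str(end) if end < 450 else ""
    return [{"id": i} for i in range(start, end)], next_token


print(len(fetch_all_golden_records(fake_fetch_page)))  # 450
```

Because the loop stops only on an empty token, it does not depend on the page size; a short final page is handled the same way as a full one.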

If you need the history of each golden record, you have to query it separately for each record with Get Golden Record History.

Best Practices Discovered on Transforming Data for Migration

Data format

You probably want to transform the data into a format more suitable for insertion and comparison. Our solution was arrays within arrays: an array of golden records, each holding an array of its history, which we used for both comparison and insertion.

For a light comparison, the last history entry can be enough. For a more complete comparison after your migration, you probably want to compare all the history steps and the source linking data between the source and target MDMs. Fetching data will most likely be the biggest time cost in your process, so it is worth building the light comparison so that it doesn’t fetch the history at all: it fetches only the current state of the golden records and compares that.
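A minimal Python sketch of the two comparison levels, assuming each golden record is held as a list of its history steps (oldest first) and each step is a dict of field values. This shape and these names are illustrative assumptions, and comparing source linking data is left out. Because `light_equal` looks only at the last entry, it can be fed the current state alone, without the per-record history queries.

```python
from typing import Dict, List

# Assumed shape ("arrays within arrays"): each golden record is a list of its
# history steps, oldest first, and each step is a dict of field name -> value.
History = List[Dict[str, str]]


def light_equal(source: History, target: History) -> bool:
    """Light comparison: only the latest step, i.e. the current state, is compared."""
    if not source or not target:
        return source == target
    return source[-1] == target[-1]


def full_equal(source: History, target: History) -> bool:
    """Full comparison: every history step must match, in the same order."""
    return source == target


source_record = [{"name": "Acme"}, {"name": "Acme Oy"}]
target_record = [{"name": "Acme Oy"}]  # current state only, no history fetched
print(light_equal(source_record, target_record))  # True
print(full_equal(source_record, target_record))   # False
```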

Clean data is easier to use

Data cleanup should be part of the planning phase of the migration process. Duplicates and data errors complicate the migration process quite a lot. But be prepared to continue the data cleanup during the migration, as you are likely to find errors that were missed during the initial cleanup. A business contact who knows how to handle data quality issues will be a great help during the process.

Best Practices Found When Migrating Data From Boomi DataHub

The process

The process described here was designed to work in two scenarios: first, where the target MDM has the proper configuration in place but contains no golden records yet; and second, where the target MDM has been partially migrated and the process has to complete the migration and synchronize the golden records.

This let us pause the migration whenever we ran into quarantine issues or other errors. We were also able to purge any corrupted golden records and re-migrate only those records. As a result, the approach didn’t require everything to go perfectly on the first try. It also let us compare the data and monitor the synchronization level while testing the new MDM, and change configurations such as source ranking to handle issues caused by configuration changes made over the history of the golden records.

The process that worked best in our case was roughly as follows:

1. Store the timestamp for the start of the process.

2. Fetch all the golden records from the source MDM.

3. Fetch all the golden records from the target MDM.

4. Compare these to each other.

   1. Check the values of the golden records.

   2. Check the update timestamps of the golden records.

5. Records that don’t exist in the target MDM are marked to be migrated.

6. Records that differ in either the update timestamp or the field values are either updated with the latest data or removed from the target MDM and migrated again completely.

7. Gather all the records that need to be migrated.

8. Fetch the record history of these golden records.

9. Arrange your data into batches.

   1. Loop through your sources.

   2. For each record that has not yet been handled in this round, take its first history step that has not yet been sent.

   3. If the contributing source of that history step matches the source you are currently processing, add the step to a batch for that source. A batch can hold at most 200 updates, so when a batch is full, either send it right away or store it until the round is complete, and start a new batch.

   4. Generating the update might require you to compare the current step in the golden record’s history with the previous one, so that the update contains only the fields that have changed.

   5. Remove the golden record from the group of golden records you are currently looping through. This ensures that you send only one step of each record’s history per round.

   6. Switch to the next source.

10. Send your batches and store the batch ID that you receive in the reply.

11. The update process is asynchronous, so you need to check the status of your batches before you can continue. Wait a moment, then start polling the status of these batches with the Get Batch Update Status operation described in the API documentation.

12. If any of your batches complete with errors or any of your entries are quarantined, pause the process and resolve the issues manually before continuing. You can also ignore errors during the process and clean up the records afterwards, but errors might cause duplicates or data corruption that can affect other golden records.

13. Once all the batches in this round have completed, process the next history step for all the golden records.

14. End the process by comparing the data to get a feel for how well the migration went.
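Step 9 is the trickiest part to get right. The following minimal Python sketch shows one way to arrange a single round, assuming each record’s history has already been fetched as a list of `(contributing_source, fields)` pairs. `build_round`, `next_step`, and this data shape are hypothetical names for illustration, not part of any Boomi API, and sending the batches and polling their status (steps 10 to 13) are left out.

```python
from typing import Dict, List, Tuple

MAX_BATCH_SIZE = 200  # DataHub accepts at most 200 updates per batch

# Assumed representation: each history step is (contributing_source, fields).
Step = Tuple[str, Dict[str, str]]
Batch = Tuple[str, List[Dict[str, str]]]  # (source, updates)


def build_round(histories: Dict[str, List[Step]],
                next_step: Dict[str, int],
                sources: List[str]) -> List[Batch]:
    """Arrange one round: at most one history step per golden record, grouped
    into single-source batches of at most MAX_BATCH_SIZE updates.

    next_step[key] is the index of the record's first unsent step; it is
    advanced for every step placed in a batch. `sources` must list every
    contributing source, or steps from unlisted sources are never sent.
    """
    # Records with unsent history steps take part in this round.
    group = [key for key, steps in histories.items() if next_step[key] < len(steps)]
    batches: List[Batch] = []
    for source in sources:
        batch: List[Dict[str, str]] = []
        remaining: List[str] = []
        for key in group:
            step_source, fields = histories[key][next_step[key]]
            if step_source != source:
                remaining.append(key)  # wait for its own source's turn
                continue
            batch.append(fields)  # with ranked fields, send only changed fields here
            next_step[key] += 1
            if len(batch) == MAX_BATCH_SIZE:
                batches.append((source, batch))  # full batch: start a new one
                batch = []
        if batch:
            batches.append((source, batch))
        group = remaining  # handled records leave the group: one step per round
    return batches


histories = {
    "r1": [("CRM", {"name": "A"}), ("ERP", {"name": "A2"})],
    "r2": [("ERP", {"name": "B"})],
}
next_step = {"r1": 0, "r2": 0}
print(build_round(histories, next_step, ["CRM", "ERP"]))
# [('CRM', [{'name': 'A'}]), ('ERP', [{'name': 'B'}])]
print(build_round(histories, next_step, ["CRM", "ERP"]))
# [('ERP', [{'name': 'A2'}])]
```

Each call produces one round. All batches of a round should complete before the next call; otherwise the second step of a record could overtake its first.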

Breadth first

Boomi DataHub accepts batches of up to 200 golden record updates. The batch is the basic building block to use to minimize the number of updates and status queries. All the updates in a batch must have the same contributing source. If you also want to migrate the history of the golden records, you need to take both of these constraints into account.

It is more efficient to migrate the golden records breadth first: send the first history step of every record, make sure those updates are processed, and only then send the second step of each record’s history.

If you migrate one record at a time and replay its history step by step without waiting, you might end up with scrambled histories, because batches might not be processed in the order you send them. On the other hand, if you migrate depth first and wait for each batch to finish, the execution time becomes extremely long.
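To put numbers on the difference, the following Python sketch counts how many batches a breadth-first migration sends and how many times it has to wait for completion, compared with a depth-first migration that sends and awaits one history step at a time. The cost model is a simplifying assumption: it ignores processing time and treats every completion wait as one unit.

```python
import math
from collections import Counter
from typing import List, Tuple


def breadth_first_cost(history_sources: List[List[str]],
                       max_batch: int = 200) -> Tuple[int, int]:
    """Return (rounds, batches) for a breadth-first migration.

    history_sources[i] lists the contributing source of each history step of
    record i, oldest first. In round n every record sends its n-th step, and the
    steps are grouped per source into batches of at most max_batch updates.
    You wait for completion once per round.
    """
    rounds = max((len(h) for h in history_sources), default=0)
    batches = 0
    for n in range(rounds):
        per_source = Counter(h[n] for h in history_sources if len(h) > n)
        batches += sum(math.ceil(count / max_batch) for count in per_source.values())
    return rounds, batches


def depth_first_waits(history_sources: List[List[str]]) -> int:
    """Depth first, one record at a time: one awaited update per history step."""
    return sum(len(h) for h in history_sources)


records = [["CRM", "ERP", "CRM"]] * 1000  # 1,000 records, 3 history steps each
print(breadth_first_cost(records))  # (3, 15)
print(depth_first_waits(records))   # 3000
```

Under this model, 1,000 records with three history steps each need 15 batches and only three completion waits breadth first, one per round, versus 3,000 sequential waits depth first.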

Patience is a virtue

The Boomi DataHub cloud address is shared by multiple customers, and even though the environments are separated, other traffic will affect your migration process. During periods of high traffic, your migration batches might take longer to get through or to be processed, and if there are problems, timeouts might occur.

Design your tooling so that it can adapt to such situations. Timed-out requests need to be retried with adaptive intervals to give the target system time to recover and process the backlog.
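One common way to implement adaptive retry intervals is exponential backoff with jitter, sketched below in Python. This is one technique among several, not necessarily what any particular tool does. `request` is a hypothetical zero-argument function that wraps a single DataHub call (sending a batch, or polling Get Batch Update Status) and raises `TimeoutError` when the call times out; the `sleep` parameter only exists so the function can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(request: Callable[[], T],
                      max_attempts: int = 6,
                      base_delay: float = 2.0,
                      max_delay: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a timed-out request, waiting longer after each failure.

    The delay doubles per attempt (exponential backoff), is capped at max_delay,
    and gets random jitter of up to 50% so parallel workers do not retry in
    lockstep. After the last attempt the TimeoutError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller pause the migration
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

For example, polling could be wrapped as `call_with_backoff(lambda: get_batch_status(batch_id))`, where `get_batch_status` is your own hypothetical wrapper around Get Batch Update Status. Re-raising after the last attempt lets the calling process pause the migration instead of silently skipping a batch. A timed-out send may still have reached DataHub, so before resending a batch it can be worth checking whether it was already accepted.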

Understand the quirks

You will be able to send records to Boomi DataHub faster than DataHub can process them. The migration places far more traffic on your MDM than normal operation does.

When you send multiple batches one after another, DataHub might not process them in the order you sent them! Take this into account when you plan the migration; otherwise, you might end up with records whose history is scrambled.

If you are sending end-dated golden records, the chance of duplicates under your matching rules increases. Even if you have ensured that your current data has no duplicates, the end-dated records might contain duplicates on the matching fields, and you will end up with quarantined entities during your migration.
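Before migrating, you can check for this with a small script that groups all records, end-dated ones included, by their matching-field values. The Python sketch below assumes each record is a dict with an `id` key; the `email` matching field and the simple normalization are hypothetical and should be replaced with whatever your matching rules actually use.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def find_matching_duplicates(records: List[Dict[str, str]],
                             matching_fields: Sequence[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Return groups of record IDs that share the same matching-field values.

    `records` should include end-dated golden records as well as active ones.
    Values are compared after strip() and lower(), a rough stand-in for your
    real matching rules; adjust the normalization to mirror them.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for record in records:
        key = tuple(str(record.get(f, "")).strip().lower() for f in matching_fields)
        if any(key):  # skip records with no matching values at all
            groups[key].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": "1", "email": "a@example.com"},                          # active
    {"id": "2", "email": "A@example.com ", "ended": "2019-05-01"},  # end-dated
    {"id": "3", "email": "b@example.com"},
]
print(find_matching_duplicates(records, ["email"]))
# {('a@example.com',): ['1', '2']}
```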

The creation of history steps in the batch

When you query the history of a golden record, you receive the actual value of each field of the golden record after each update. This means you can’t tell what update the source actually sent to DataHub in the past. Were the fields that didn’t change between history steps included in the update or not? If an update included all of the record’s fields but some values were the same as in the previous history step, it would look exactly the same in the Get Golden Record History response as an update that omitted the unchanged fields.

Normally this is not a problem: you could just send each step as is, including the fields the source certainly didn’t change. The problem arises if you have the ranked fields feature turned on.

If you send all the fields without checking whether the source actually contributed them, a future update from a lower-ranked source might no longer be able to change the golden record, because an earlier update from a higher-ranked source has claimed all the fields. Even if a field’s value didn’t change, DataHub remembers which sources have contributed to which fields and prevents lower-ranked sources from updating a field that a higher-ranked source has contributed to.

Unfortunately, we could not find any data in the API that tells which source contributed which fields.

With ranked fields, you cannot just send everything and hope for the best. You need to compare each step in the golden record’s history with the previous one and pick up only the fields that have changed. You of course also need to include any matching fields to ensure that your update finds the correct golden record in the target MDM.
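The comparison can be sketched in Python as follows, assuming each history step is a dict of field values and that you know which fields your matching rules use (`email` in the example is hypothetical). The sketch sends the first step in full, on the assumption that it is the update that created the golden record. Fields that are cleared between steps are not handled here, since how to express a cleared value in an update depends on your model and the API.

```python
from typing import Dict, Optional, Sequence


def changed_fields_update(previous: Optional[Dict[str, str]],
                          current: Dict[str, str],
                          matching_fields: Sequence[str]) -> Dict[str, str]:
    """Build an update with only the fields that changed since the previous
    history step, plus the matching fields so the update finds the right
    golden record. For the first step (previous is None), send every field.
    Fields present in `previous` but missing from `current` are ignored.
    """
    if previous is None:
        return dict(current)
    update = {name: value for name, value in current.items()
              if previous.get(name) != value}
    for name in matching_fields:
        if name in current:
            update[name] = current[name]  # always included for matching
    return update


prev = {"email": "a@example.com", "name": "Acme", "city": "Helsinki"}
cur = {"email": "a@example.com", "name": "Acme Oy", "city": "Helsinki"}
print(changed_fields_update(prev, cur, ["email"]))
# {'name': 'Acme Oy', 'email': 'a@example.com'}
```

Leaving `city` out of the update means the source in this step does not claim that field, so a lower-ranked source can still update it later.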

Have a period of overlap between the old and new master data hub

A period during which you send records to both the old and the new master data hub gives you an excellent opportunity to identify errors in both your migration process and your integration processes. If all the incoming messages are duplicated to both hubs, the data should stay in sync between the old and new hubs.

Because the migration process is prone to unforeseen errors, this overlap adds a lot of reliability to your data and helps you catch errors that would be very hard to identify otherwise.

When you are happy with the way the new master data hub accepts data, you can hand over the responsibility for sending data updates to surrounding systems to the new hub. Before you do this, empty your outbound queues; otherwise, the moment you connect the outbound systems through your integration processes, the first run will send all your data to all the systems as an initial load. If unnecessary updates might cause issues in your systems, it is better to clean out these queues.

The wrong way to do this is to simply empty the outbound source queues in the Boomi DataHub management view, especially for bi-directional sources. This would create a situation where further updates to golden records would not generate updates toward such sources, because DataHub wants to establish a connection to a bi-directional system before it sends it further updates. To establish the connection, DataHub expects the sent data to be acknowledged and an update to be sent back with a source linking ID.

The correct way is to build a dummy integration that reads all the outbound updates for the sources and responds to the hub with the correct source IDs to establish the connection.

Have an easy way to compare the data between source and target MDM

At numerous points during the migration and the overlap period, you will want to make sure your data is in sync between the MDMs. Design your tooling so that you can easily launch such comparisons and get a comprehensive report on the differences. This will be your best friend during the process.
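A comparison report does not have to be elaborate to be useful. The minimal Python sketch below compares the current golden records of the two hubs and reports records missing from the target, extra records in the target, and the fields that differ. It assumes both sides are keyed by a stable matching value, since Golden Record IDs differ between hubs, and it does not compare history or source links.

```python
from typing import Dict, List


def compare_hubs(source: Dict[str, Dict[str, str]],
                 target: Dict[str, Dict[str, str]]) -> Dict[str, object]:
    """Report differences between the current golden records of two MDMs.

    Both inputs map a record key (an assumed stable matching value) to the
    record's current field values.
    """
    missing = sorted(source.keys() - target.keys())
    extra = sorted(target.keys() - source.keys())
    differing: Dict[str, List[str]] = {}
    for key in sorted(source.keys() & target.keys()):
        fields = source[key].keys() | target[key].keys()
        diff = sorted(f for f in fields if source[key].get(f) != target[key].get(f))
        if diff:
            differing[key] = diff
    in_sync = len(source) - len(missing) - len(differing)
    return {"missing_in_target": missing, "extra_in_target": extra,
            "differing_fields": differing, "in_sync": in_sync}


src = {"a": {"name": "A"}, "b": {"name": "B"}, "c": {"name": "C"}}
tgt = {"a": {"name": "A"}, "b": {"name": "B2"}, "d": {"name": "D"}}
print(compare_hubs(src, tgt))
```

Running the same report before the migration, after each round, and during the overlap period makes it easy to see whether the hubs are converging or drifting apart.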

Source link IDs

Check what kind of values you have used as your source link IDs. In some cases, you might have used, for example, a golden record’s Golden Record ID as a source entity ID. This can happen with integration processes that query Boomi DataHub to ensure data integrity, update golden records, and need an ID to work with.

When you migrate a record, you cannot keep its Golden Record ID, because the ID is generated when the golden record is created. Keep this in mind and modify any such source entity links to use the new Golden Record ID of the migrated golden record in the new hub. This means you might have to query the record in the middle of the migration process to find its new Golden Record ID.
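Once you have collected the new Golden Record IDs, rewriting the affected links is a simple mapping. The Python sketch below is illustrative only: the link structure, the `SYNC` source name, and `sources_using_golden_id` (the sources you found to use Golden Record IDs as entity IDs) are hypothetical, and actually updating the links in the hub is left to your integration processes.

```python
from typing import Dict, List, Set


def remap_source_links(links: List[Dict[str, str]],
                       old_to_new_id: Dict[str, str],
                       sources_using_golden_id: Set[str]) -> List[Dict[str, str]]:
    """Point source links that used old Golden Record IDs at the new IDs.

    Each link is assumed to be a dict with "source" and "entity_id" keys.
    Only sources in sources_using_golden_id are rewritten; a link whose record
    has not been migrated yet keeps its old ID so it can be handled later.
    """
    remapped = []
    for link in links:
        entity_id = link["entity_id"]
        if link["source"] in sources_using_golden_id:
            entity_id = old_to_new_id.get(entity_id, entity_id)
        remapped.append({**link, "entity_id": entity_id})
    return remapped


links = [{"source": "SYNC", "entity_id": "old-1"}, {"source": "CRM", "entity_id": "C-100"}]
print(remap_source_links(links, {"old-1": "new-9"}, {"SYNC"}))
# [{'source': 'SYNC', 'entity_id': 'new-9'}, {'source': 'CRM', 'entity_id': 'C-100'}]
```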

Summary

Migrating an MDM is a complicated project in itself and requires time and concentration. Make sure you are well prepared, have chosen your migration tool wisely, and have configured the MDM and the integration processes. I hope these insights and tips help you with your own migration project.

Boomi DataHub is a cloud-native master data management (MDM) solution that sits at the center of the various data silos within your business, including your existing MDM solution, to provide you with an easy-to-implement, scalable, flexible, and secure master data management hub as a service. For more information, go here or contact a Boomi expert.

Read the previous posts in this series here:

     Part One: Things to Consider Before Migration

     Part Two: Boomi DataHub Configuration
