Boomi Data Hub Migration Best Practices (Part Three)

I’m an integration specialist at Solita Oy, an IT service management company, and a Boomi implementation partner. Our team was given the task of migrating master data from one Boomi Data Hub cloud to another Boomi Data Hub cloud location.

In the first blog of this series, I talked about the background of the project, and things to consider before migration. In the second blog, I covered MDM solution configuration. Now, it’s time to share the lessons we learned and best practices we implemented based on these discoveries.

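Key Learnings on Fetching Data from Boomi Data Hub

How to fetch the data

To fetch data from Boomi Data Hub, you use the Boomi APIs. In a migration project, the following operations are the most important:

Query Golden Records
Get Golden Record History
Get Golden Record Metadata
Get Batch Update Status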

Fetching data from Boomi Data Hub through Query Golden Records works like a paging API, where you receive batches of 200 records and you have to repeat the query with an offset token provided in the previous result. When the offset token is empty, you have fetched the last batch.
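The paging behavior described above can be sketched as a small generator. This is a minimal sketch, not the actual client code: `fetch_page` is a hypothetical callable that you would implement on top of the Query Golden Records call, and its return shape (a list of records plus the next offset token) is an assumption made for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: takes the offset token from the previous page
# (None for the first page) and returns (records, next_offset_token).
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iter_golden_records(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every golden record by following offset tokens until none is returned."""
    offset_token: Optional[str] = None
    while True:
        records, offset_token = fetch_page(offset_token)
        yield from records
        if not offset_token:  # empty or missing token: this was the last batch
            break
```

Keeping the HTTP details inside `fetch_page` means the same loop can be exercised against a fake in tests and against the real hub in production.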

If you need the history of each golden record, you will need to query that individually for each record.

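Key Learnings on Transforming Data for Data Hub Migration

Shaping the data format

During migration, it is important to convert the data into a format that is suitable for both loading and comparison. In our project, we used a nested structure, arrays within arrays, so that we could handle multiple golden records together with their histories.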
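As an illustration of such a nested structure, here is a minimal sketch in Python. The class names, the field names, the `key` attribute, and the source names `crm` and `erp` are all hypothetical and are not taken from our project or from the Boomi API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HistoryStep:
    source: str             # contributing source of this step (hypothetical names)
    updated: str            # update timestamp of this step
    fields: Dict[str, Any]  # full field values after this step


@dataclass
class GoldenRecord:
    key: str                # hypothetical stable key that exists in both hubs
    fields: Dict[str, Any]  # current field values
    history: List[HistoryStep] = field(default_factory=list)  # oldest step first


# An outer list of records, each holding an inner list of history steps.
records: List[GoldenRecord] = [
    GoldenRecord(
        key="customer-001",
        fields={"name": "Acme Oy", "city": "Helsinki"},
        history=[
            HistoryStep("crm", "2021-03-01T10:00:00Z", {"name": "Acme", "city": "Helsinki"}),
            HistoryStep("erp", "2022-06-15T08:30:00Z", {"name": "Acme Oy", "city": "Helsinki"}),
        ],
    ),
]
```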

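For a simple comparison, it is enough to look only at the latest history step. For a more accurate comparison, you need to compare every history step and every source link between the source and target MDMs after the migration.
Fetching the data is usually the most time-consuming part of the migration. For that reason, it pays to design a lightweight comparison that skips the history and compares only the current state of the golden records.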

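Start cleaning your data at the planning stage

Data cleansing should start in the planning phase of the migration project. Inconsistencies such as duplicates and typing errors make the migration considerably harder.

Initial cleanup is often not enough, however, and new problems may surface while the migration is under way. Plan on improving data quality continuously throughout the migration period.

The migration also goes more smoothly when business users who can make decisions about data quality, and act on them, are involved in the project.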

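Key Learnings on Migrating Data from Boomi Data Hub

The migration process

The process described here is designed to work in two situations:
- The target MDM has been configured but does not yet contain any golden records.
- The target MDM has been partially migrated, and you need to complete the migration or synchronize it.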

This design let us pause the migration whenever we ran into quarantine issues or other errors. We could also purge any corrupted golden records and re-migrate only those records. As a result, not everything had to go perfectly on the first try. We could compare the data and monitor synchronization levels while testing the new MDM, and we could change configurations such as source ranking to handle issues caused by configuration changes that had occurred during the history of the golden records.

The process we found the best for our case was roughly like this:

1. Store the timestamp for the start of the process.

2. Fetch all the golden records from the source MDM.

3. Fetch all the golden records from the target MDM.

4. Compare these to each other.

   1. Check the values of the golden records.

   2. Check the update timestamps of golden records.

5. Records that don’t exist in the target MDM are marked to be migrated.

6. Records that have differences either in the update timestamp or field values are to be either updated with the latest data or removed from target MDM and then migrated completely.

7. Gather all the records that need to be migrated.

8. Fetch the record history of these golden records.

9. Arrange your data into batches.

   1. Loop through your sources.

   2. Take the first history step of each record that has not yet been sent forward in this round.

   3. If the contributing source of the history step matches the source you are currently processing, add the step to a batch for that source. Remember that a batch holds at most 200 updates, so either send each batch as soon as it is full or store it until the round completes, and then start a new batch.

   4. Generating the batch might require you to compare the current step in the golden record’s history to the previous one, so that the update contains only the fields that have changed.

   5. Remove the golden record from the group of golden records that you are currently looping through. This ensures that you send only a single step of each record’s history at a time.

   6. Switch to the next source.

10. Send your batches and store the batch ID that you receive in the reply.

11. The update process is asynchronous, and as such you will need to query for the status of records before you can continue. Wait a moment and start querying the status of these batches with the Get Batch Update Status described in the API documentation.

12. If any of the statuses of your batches end up completed with errors or your entries got quarantined, pause the process and handle any issues manually before continuing the process. You can also ignore errors during the process and clean up the records afterwards, but errors might cause duplicates or data corruption which might affect other golden records.

13. Once all the batches have completed for this round, process the next step for all the golden records.

14. End the process with comparison of data to get a feel for how well the migration process went.
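Steps 4 to 7 of the process above can be sketched as follows. This is a simplified illustration under stated assumptions: both hubs are assumed to expose a stable key that identifies the same record on each side (golden record IDs differ between hubs, so they cannot serve as that key), and the `updated` and `fields` entries are hypothetical names for the update timestamp and field values.

```python
from typing import Any, Dict, List


def plan_migration(source: Dict[str, dict], target: Dict[str, dict]) -> Dict[str, List[str]]:
    """Classify source records by comparing them with the target hub.

    Both arguments map a shared record key to
    {"updated": <timestamp>, "fields": {<field name>: <value>}}.
    """
    plan: Dict[str, List[str]] = {"migrate": [], "resync": [], "in_sync": []}
    for key, src in source.items():
        tgt: Any = target.get(key)
        if tgt is None:
            plan["migrate"].append(key)   # step 5: not in the target yet
        elif src["fields"] != tgt["fields"] or src["updated"] != tgt["updated"]:
            plan["resync"].append(key)    # step 6: update, or purge and re-migrate
        else:
            plan["in_sync"].append(key)
    return plan
```

The records listed under `migrate` and `resync` are the ones whose history you would fetch in step 8, which keeps the expensive history queries limited to records that actually need work.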

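Migrate all records one history step at a time

Boomi Data Hub accepts up to 200 golden record updates in a single batch. Treat this as the basic unit of work for keeping the number of updates and queries down. Note that all updates in a batch must come from the same contributing source.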

It is more efficient to migrate the golden records by sending the first history step of every record and making sure those updates have been processed before sending the second step of the records’ histories.

If you migrate one record at a time and replay its history one step at a time, you might end up with scrambled histories, because batches might not be processed in the order you send them. On the other hand, if you wait for each batch to finish and migrate depth first, you will end up with an extremely long execution time.
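A breadth-first round can be sketched like this. It is a minimal sketch, assuming each record’s history is a list of steps ordered oldest first and that each step carries hypothetical `source` and `fields` entries. The function only groups and splits the updates; sending them is left to your own client.

```python
from collections import defaultdict
from typing import Dict, List

BATCH_LIMIT = 200  # maximum number of updates per batch, as stated above


def build_round(histories: Dict[str, List[dict]], step_index: int) -> Dict[str, List[List[dict]]]:
    """Collect the step at `step_index` of every record, grouped by contributing
    source, and split each source's updates into batches of at most BATCH_LIMIT."""
    by_source: Dict[str, List[dict]] = defaultdict(list)
    for record_key, steps in histories.items():
        if step_index < len(steps):  # records with a shorter history are already done
            step = steps[step_index]
            by_source[step["source"]].append({"key": record_key, "fields": step["fields"]})
    return {
        source: [updates[i:i + BATCH_LIMIT] for i in range(0, len(updates), BATCH_LIMIT)]
        for source, updates in by_source.items()
    }
```

Calling `build_round` with step index 0, then 1, and so on until it returns an empty dictionary gives one round per history step, with each batch containing updates from a single source.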

Plan for plenty of slack in the migration

The Boomi Data Hub address is shared by multiple customers, and even though the environments are separated, the other traffic will affect your migration process. During periods of high traffic, your migration batches might take longer to get through or to be processed, and timeouts might occur.

Build your tools so that they can adapt to such situations. Timeouts need to be retried with adaptive intervals to give the target system time to adjust and process.
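One common way to implement adaptive retry intervals is exponential backoff with jitter. The sketch below is an illustration rather than what we ran: `send` is a hypothetical zero-argument callable that wraps one request, and it is assumed to raise Python’s built-in `TimeoutError` on a timeout. Your HTTP client may raise a different exception type, so adjust the `except` clause accordingly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on TimeoutError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return send()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so that parallel workers do not all retry at the same moment.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the waits can be replaced in tests. In normal use you would leave it at its default.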

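Understand the platform’s characteristics before you migrate

You may be able to send records to Boomi Data Hub faster than the hub can receive and process them. A migration generates far more traffic to the MDM than normal operation does, so it is important to plan with this in mind.

In addition, when you send several batches in quick succession, there is no guarantee that they are processed in the order you sent them. If you migrate without accounting for this, the history of a golden record may not be replayed in the correct order.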
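One way to guard against out-of-order processing is to wait until every batch of a round has finished before sending the next round. The sketch below assumes a hypothetical `get_status` callable built on the Get Batch Update Status call. The status strings used here are placeholders, so check the API documentation for the actual values.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_batches(
    batch_ids: Iterable[str],
    get_status: Callable[[str], str],
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until no batch is pending any more and return the final statuses."""
    pending = set(batch_ids)
    final: Dict[str, str] = {}
    while pending:
        for batch_id in sorted(pending):
            status = get_status(batch_id)
            if status != "PENDING":  # placeholder for "still being processed"
                final[batch_id] = status
        pending.difference_update(final)
        if pending:
            sleep(poll_interval)  # wait a moment before asking again
    return final
```

After the call returns, any batch whose status is not a clean completion is a candidate for pausing the process and handling the issue manually before the next round is sent.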

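Finally, migrating end-dated golden records (historical records that were valid only for a specific period) raises the risk of duplicate matches under your match rules. Even if you have verified that the current data contains no duplicates, historical records may share the same values in the match fields, which can produce quarantined entities during the migration.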

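Generating history updates within batches

When you fetch the history of a golden record, you get the values of all fields as they were after each update. You do not learn which fields the source system actually sent to Data Hub in that update. For a field that does not change between two consecutive history steps, you cannot tell whether it was included in the update or simply left out.

In most cases this is not a big problem, because you can send all the fields as they were at each history step. If you have ranked fields enabled (a feature that sets a priority per field for each contributing source and so controls which source’s value wins), you need to be careful.

If you send all the fields without checking what changed, a higher-ranked source may be treated as having updated every field. Lower-ranked sources may then be unable to change those fields later. Even when a field’s value has not changed, Data Hub records which source contributed to which field and restricts updates from lower-ranked sources.

The API does not expose which source updated which field. In an environment that uses ranked fields, simply replaying the full history data is therefore not suitable.

In practice, you need to compare each history step with the previous one and build the update from only the fields that actually changed. Always include the fields used for matching in the update, so that the target MDM can identify the correct golden record.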
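The comparison between consecutive history steps can be sketched as a small function. It is a minimal sketch that treats each step as a flat dictionary of field values, which is an assumption. Nested or repeating fields would need a deeper comparison, and fields that disappear between steps are not handled here.

```python
from typing import Any, Dict, Iterable, Optional

_MISSING = object()  # sentinel: distinguishes "field absent" from "field is None"


def build_update(
    previous: Optional[Dict[str, Any]],
    current: Dict[str, Any],
    match_fields: Iterable[str],
) -> Dict[str, Any]:
    """Return only the fields that changed since `previous`, plus the match fields."""
    previous = previous or {}
    update = {
        name: value
        for name, value in current.items()
        if previous.get(name, _MISSING) != value
    }
    # Match fields are always included so the target hub can find the right record.
    for name in match_fields:
        if name in current:
            update[name] = current[name]
    return update
```

For the first history step of a record there is no previous step, so passing `None` returns every field, which corresponds to the initial creation of the golden record.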

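Run the old and new environments in parallel for a while

Setting aside a period during which records are sent to both the old and the new master data hub makes it easier to spot problems early, both in the migration process and in the integrations. If you duplicate every incoming message to both environments, it is easier to keep the data in the two hubs synchronized.

Unexpected errors can occur during a migration, so a parallel-run period is an effective way to increase confidence in the data. It also helps you find problems that would be hard to detect without running in parallel.

Once you have confirmed that the new master data hub receives data and processes updates reliably, you can hand over the job of sending updates to the surrounding systems to the new environment. Before you do, it is important to empty the outbound queues (the holding areas for updates waiting to be sent to external systems). Otherwise, the first run after the integrations connect to the external systems may send all the data at once as an initial load.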

The wrong way to do this is to simply empty the outbound source queues in the Boomi Data Hub management view, especially for bi-directional sources. Doing so means that further updates to golden records no longer generate updates toward those sources, because Data Hub wants to establish a connection to a bi-directional system before it sends further updates to it. To establish that connection, Data Hub expects the sent data to be acknowledged and an update to be sent back with a source linking ID.

The correct way to do this is to have a dummy integration that reads all the outbound updates for the sources and responds to the hub with the correct source IDs to verify the connection.

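Make it easy to compare data between the source and target MDMs

A migration project has many steps, and for a period you also need to run the source and target MDMs in parallel while checking that their data stays synchronized.

It is therefore important to have a mechanism in place, ahead of time, that makes it easy to compare the data between the two systems.

Specifically, it helps to design your tooling so that you can run a comparison whenever you need one and get a report that covers every mismatch and difference in the data. This kind of comparison capability plays a very important role in understanding the progress and quality of the migration.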
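A report of this kind can be sketched as follows. This is a minimal sketch under the assumption that the current field values of both hubs have already been fetched into dictionaries indexed by a key that is shared between the hubs. The report layout is purely illustrative.

```python
from typing import Any, Dict


def diff_report(source: Dict[str, Dict[str, Any]], target: Dict[str, Dict[str, Any]]) -> dict:
    """Compare current field values per record and list every difference.

    Both arguments map a shared record key to that record's field values.
    """
    report: dict = {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "missing_in_source": sorted(target.keys() - source.keys()),
        "field_mismatches": {},
    }
    for key in sorted(source.keys() & target.keys()):
        src, tgt = source[key], target[key]
        diffs = {
            name: (src.get(name), tgt.get(name))  # (source value, target value)
            for name in sorted(src.keys() | tgt.keys())
            if src.get(name) != tgt.get(name)
        }
        if diffs:
            report["field_mismatches"][key] = diffs
    return report
```

Running this on demand during the parallel-run period gives a quick view of how far the two hubs have drifted apart and in which fields.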

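Be careful with source link IDs

Before you migrate, check what values you are using as source link IDs. For example, you may have cases where the golden record ID is used as the source entity ID. This can happen in integrations that fetch data from Boomi Data Hub to maintain data consistency and then update golden records, using the golden record ID as the identifier.

Golden record IDs are generated automatically when a record is created, however, so the same IDs cannot be carried over in a migration. After the migration, you need to update the source entity links based on the golden record IDs generated in the new hub.
To do this, you may need to query the affected records partway through the migration process to obtain their new golden record IDs. Designing for this up front helps you maintain data consistency after the migration.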
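The remapping can be sketched as a lookup table from old to new golden record IDs. This sketch assumes that each record carries some stable key, for example a match field value, that exists in both hubs and lets you pair the old record with the new one. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_id_map(
    old_ids_by_key: Dict[str, str],
    new_ids_by_key: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Pair old and new golden record IDs through a key shared by both hubs.

    Returns the old-ID to new-ID mapping and the keys that have no
    counterpart in the new hub yet.
    """
    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for key, old_id in old_ids_by_key.items():
        new_id = new_ids_by_key.get(key)
        if new_id is None:
            unmatched.append(key)  # not migrated yet, or quarantined
        else:
            mapping[old_id] = new_id
    return mapping, unmatched
```

The resulting mapping can then be used to rewrite source entity IDs wherever the old golden record ID was used as the link, while the unmatched keys point to records that still need attention.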

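Summary

An MDM migration is a complex project in its own right and demands plenty of time and focused attention. It is important to prepare thoroughly in advance, choose suitable migration tooling, and get the MDM and integration configurations in order.

I hope the learnings and practical tips shared here help you plan and carry out your own migration project.

Boomi Data Hub is a cloud-native master data management solution that sits at the center of the various data silos in an enterprise (data that is divided by department or system and is hard to share). It connects your entire data landscape, including existing MDM environments, and provides a master data management hub as a service that is easy to implement, scalable, flexible, and secure.

For more details, see the related resources or contact your Boomi representative.

Read the other articles in this series: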

Part One: Things to Consider Before Migration

Part Two: Boomi Data Hub Configuration