Configure "last one in wins" consolidation strategy in MDM
Use case
I have three publishers:
- PubA
- PubB
- PubC
If the record comes from PubA or PubB, the field from either PubA or PubB should win first. Otherwise the value from PubC should win. If there are records from both PubA and PubB, then the more recent record should win.
I have the following master records that consolidate to one golden record:
Day 1
ID | Field1 | Field2 | |
---|---|---|---|
Master records | PubA.123 | Evans | 1 pm |
PubC.345 | Evans | 2pm | |
Golden records | abc123 | Evans | 1 pm |
Day 2
ID | Field1 | Field2 | |
---|---|---|---|
Master records | PubA.123 | Evans | 1 pm |
PubB.234 | Evans | 1:00 p.m. | |
PubC.345 | Evans | 2pm | |
Golden records | abc123 | Evans | 1:00 p.m. |
Problem
The system field UpdateDate
doesn't always work as a tie breaker for the consolidation strategy. Why?
When a new record is submitted or there is a change to the match conditions, the master record might show a new UpdateDate because this record was "touched" from a system/technical perspective in calculating new match groups. However, this new UpdateDate
could be considered nonsensical from a business user's perspective because there was no material business change to the record. For example, we could possibly get an update to the UpdateDate
for all the master records in the group (they're showing Updated On to Sep 15, 2020) even if 4 out of 5 of the master records did not have any updates to the business values because 1 of the 5 master records was updated or considered to be part of another match group.
To resolve this problem, we need greater control of the UpdateDate field that we rely on for survivorship logic. To achieve this, the solution below explains how you can create a new field that is updated whenever the record is pushed for processing in xDM.
Solution
- Add a new field like
SourceUpdateDate
. Write the consolidation strategy that ranks publishers and uses the SourceUpdateDate as a tie breaker. Add a new SemQL Enricher like
RecordSourceUpdateDate
Enrich into the
SourceUpdateDate
field the current timestamp. In the SemQL editor, useCURRENT_TIMESTAMP()
- Use the
Largest Value
for your field-level consolidation strategy forSourceUpdateDate
- Use the following
Custom Ranking
consolidation rule for your business fields where you care about 'last one in'. Check Skip Nulls. You don't want to take PubA's value for that field if it is null.
case /* consolidation rule ranking based on publishers */ when PublisherID = 'A' or PublisherID = 'B' then 10 when PublisherID = 'C' then 20 else 30 /* good practice to define an else clause */ end ASC, SourceUpdateDate DESC
This case statement includes the else clause which will take care of publishers that are not A, B, or C. That should answer your question about Pub D.
Important Note
If you simply have Pub A, Pub B and Pub C, and you want to want to make sure the "last one in" wins for each publisher and you are OK with using a Preferred Publisher
strategy, then you can just add the expression SourceUpdateDate desc
in the Additional Order By
field as a tiebreaker instead of using this Custom Ranking
SemQL consolidation strategy. It still requires you to go through step 1-4 but you don't need to configure a Custom Ranking
consolidation rule. CAVEAT: This strategy only works if the SourceUpdateDate
field is populated for ALL records. If you have any nulls in SourceUpdateDate
, Oracle by default ranks Nulls first when ordering by descending. So using SourceUpdateDate desc
in the Additional Order By
field will not work. Instead, use a coalesce
with SourceUpdateDate
and CreationDate
: coalesce( SourceUpdateDate, CreationDate ) desc