Use case
I have three publishers:
- PubA
- PubB
- PubC
If the record comes from PubA or PubB, the field from either PubA or PubB should win first. Otherwise the value from PubC should win. If there are records from both PubA and PubB, then the more recent record should win.
I have the following master records that consolidate to one golden record:
Day 1
ID | Field1 | Field2 | |
---|---|---|---|
Master records | PubA.123 | Evans | 1 pm |
PubC.345 | Evans | 2pm | |
Golden records | abc123 | Evans | 1 pm |
Day 2
ID | Field1 | Field2 | |
---|---|---|---|
Master records | PubA.123 | Evans | 1 pm |
PubB.234 | Evans | 1:00 p.m. | |
PubC.345 | Evans | 2pm | |
Golden records | abc123 | Evans | 1:00 p.m. |
Problem
The system field UpdateDate
doesn't work in this consolidation strategy. When a new record is submitted, the UpdateDate can be nonsensical because the last record in has an UpdateDate
of NULL and the older record has an UpdateDate. For example:
In this case, the record created in September should win. But the UpdateDate
attribute is NULL
so basing the consolidation strategy on UpdateDate
will be bad.
Solution
- Add a new field like
SourceUpdateDate
. Write the consolidation strategy that ranks publishers and uses the SourceUpdateDate as a tie breaker. Add a new SemQL Enricher like
RecordSourceUpdateDate
Enrich into the
SourceUpdateDate
field the current timestamp. In the SemQL editor, useCURRENT_TIMESTAMP()
- Use the
Largest Value
for your field-level consolidation strategy forSourceUpdateDate
- Use the following
Custom Ranking
consolidation rule for your business fields where you care about 'last one in'. Check Skip Nulls. You don't want to take PubA's value for that field if it is null.
case /* consolidation rule ranking based on publishers */ when PublisherID = 'A' or PublisherID = 'B' then 10 when PublisherID = 'C' then 20 else 30 /* good practice to define an else clause */ end ASC, SourceUpdateDate DESC
This case statement includes the else clause which will take care of publishers that are not A, B, or C. That should answer your question about Pub D.
Important Note
If you simply have Pub A, Pub B and Pub C, and you want to want to make sure the "last one in" wins for each publisher and you are OK with using a Preferred Publisher
strategy, then you can just add the expression SourceUpdateDate desc
in the Additional Order By
field as a tiebreaker instead of using this Custom Ranking
SemQL consolidation strategy. It still requires you to go through step 1-4 but you don't need to configure a Custom Ranking
consolidation rule. CAVEAT: This strategy only works if the SourceUpdateDate
field is populated for ALL records. If you have any nulls in SourceUpdateDate
, Oracle by default ranks Nulls first when ordering by descending. So using SourceUpdateDate desc
in the Additional Order By
field will not work. Instead, use a coalesce
with SourceUpdateDate
and CreationDate
: coalesce( SourceUpdateDate, CreationDate ) desc