Semarchy xDM semql reference guide

This article helps you find some useful content using the knowledge base full search. It is a copy of the official's guide.

Please go to the official Semarchy xDM documentation for more self-investigation on your problem.

Welcome to Semarchy xDM.
This guide contains information about the SemQL language which is used across Semarchy xDM.

PREFACE

Audience

This document is intended for MDM Developers who want to use SemQL and Semarchy xDM for their Enterprise Master Data Management Initiatives. It is also intended for Data Stewards who want to fully use SemQL to manage master data in a hub.

If you want to learn about MDM or discover Semarchy xDM, you can watch our tutorials.

The Semarchy xDM Documentation Library, including the development, administration and installation guides is available online.

Document Conventions

This document uses the following formatting conventions:

Convention Meaning

boldface

Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept.

italic

Italic type indicates special emphasis or placeholder variable that you need to provide.

monospace

Monospace type indicates code example, text or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.

Obtaining Help

There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Overview

Using this guide, you will learn the SemQL language and its usage in Semarchy xDM.

INTRODUCTION TO SEMQL

SemQL is a language to express declarative rules in Semarchy xDM. It is used for example to define:

Enrichers, Matchers, Validations and Survivorship Rules involved in the Certification Process.
Composite and transformed attributes in form and collections appearing in MDM Application.
Filters when browsing data in the MDM Hub.

It has the following main characteristics:

The syntax is close to the SQL language and most SemQL functions map to Database functions.
SemQL is converted on the fly and executed by the hub database.
SemQL uses Qualified Attribute Names instead of columns names. The code remains implementation-independent.

The following sections describe the main characteristics of the SemQL language.

SQL-Like Clauses

The SemQL Language allows users to define SQL-Like clauses. Depending on the context, these clauses may be one of the following:

Condition: A clause that returns a boolean result from the evaluation of expressions using operators. A condition can be used for example for filtering or validating data records (if the clause is false, then the record is filtered out or considered invalid).
Expression: A clause that returns a value. In the context of a SemQL Enricher for example, an expression transforms, standardizes and enriches source attributes.
Order By Clause: An expression used to sort records. In consolidation rules, such a clause is used to manage the consolidation conflicts. For example, consider a consolidation made by Most Frequent Value. When multiple values occur with equal frequency, then the SemQL in the Ranking Expression determines which value is used.

Functions differ from comparison operators as they return a non-boolean value. They cannot be used as is in conditions unless used with a comparison operator. For example, TO_CHAR(IsValidCustomer) is a valid expression, but not a valid condition. TO_CHAR(IsValidCustomer)='1' is a valid condition.

SemQL is used to create expressions, conditions and order by clauses. SELECT, UPDATE or INSERT queries are not supported, as well as joins, sub-queries, aggregates, in-line views and set operators.

Users proficient with SQL should not be mistaken by the appearances. Even if SemQL looks like SQL, the SemQL expressions are entirely parsed and rewritten by the Semarchy xDM platform into SQL before their execution by the hub database. A simple SemQL expression may result into a complex SQL statement. As a consequence, it is not recommended to try injecting SQL statements within SemQL expressions.

Qualified Attribute Names

SemQL clauses manipulate attributes and variables defined in the Semarchy xDM model. Attributes are accessed through an unambiguous Qualified Attribute Name. The Qualified Attribute Name is the path to an attribute from the entity being processed.

Built-in and Customized Functions

In expressions, conditions and order by clauses, it is possible to use the built-in SemQL functions. Most of these functions are functions built in the hub’s database and processed by it. Other functions (for example, matching functions) are specific to Semarchy xDM.

You can also use in SemQL customized functions implemented in the database. You must declare these functions in the model to have them appear in the list of functions. See the Declaring Database Functions and Procedures section in the Semarchy xDM Developer’s Guide for more information about declaring customized functions.

Functions that are not declared can still be used in SemQL, but will not be recognized by the SemQL parser and will cause validation warnings.

The SemQL Editor lists all built-in SemQL functions, plus the customized functions that are declared in the model, with their syntax.

SEMQL SYNTAX

Language Elements

The SemQL syntax supports the equivalent of SQL Expressions, Conditions or Order By Clause, which are a combination of one or more Values, Operators, and Functions.

Values, Operators and Functions

Values, operators and functions are the tokens in the SemQL language.

Values are simple expressions. They may be literals, attributes or variables.
Operators modify or compare expressions. SemQL support most SQL operators, including arithmetic and character operators (+,-,*,/, ||), comparison operators (=, !=, >, >=, <, ⇐, IN, BETWEEN, LIKE, REGEXP_LIKE, IS NULL) and logical operators (AND, OR, NOT).
Functions & Expression Constructs combine other tokens to create new expressions. They include most functions available in the hub’s database, plus the functions implemented by the user.

Operators and Functions are not case-sensitive. Values are case-sensitive.
For example:

StartYear BETWEEN 2012 and 2014 is equivalent to StartYear Between 2012 AND 2014
UPPER( CustomerName ) is equivalent to Upper( CustomerName )
FirstName LIKE 'Unknown%' is NOT equivalent to FirstName LIKE 'UNKNOWN%'

Expressions, Conditions, Order By Clause

Expressions, Conditions and Order By Clauses are the phrases supported by the SemQL Language.

Expressions combine values, operators and functions to return a non-boolean value.
Conditions combine values, operators and functions to return a boolean value (true or false).
Order By Clauses are expressions used to sort data by ascending or descending values of the expression.
The ASC or DESC post-fix define the sort order. Default sort order is ascending.
Order by clauses in the consolidation rules' Ranking Expression also support the NULLS FIRST or NULLS LAST clause to specifies whether NULL values should be ordered before or after non-NULL values. By default, the order by clause uses NULLS LAST if the sort is ASC and NULLS FIRST if the sort is DESC.

Examples of Expressions

FirstName is an attribute.
'USA' is a string literal.
Initcap(FirstName) is a function using an attribute.
Initcap(FirstName) || ' ' || Initcap(LastName) is a combination of operators, functions, attributes and literals.

Example of Conditions

1=1 compares two literals.
Country='USA' compares an attribute and a literal.
Upper(Country) in ('USA', 'US', 'CANADA') uses a function.

Example of Order By Clauses

Country sorts by the Country attribute (ascending by default)
Country DESC sorts by the Country attribute (descending). Nulls values are sorted (default behavior for DESC) before all non-null values.
Country DESC NULLS LAST sorts by the Country attribute (descending). Nulls values are sorted after all non-null values.
CASE PublisherID WHEN 'MKT' THEN 1 WHEN 'CRM' THEN 2 ELSE 99 END ASC
sorts records where PublisherID equals MKT, then CRM, then the rest.

The following sections detail the elements of a SemQL clause.

Comments

Comments in SemQL are surrounded by /* and */.

An example of code with comments is provided below.

ANY Contacts HAVE (1=1) /* customer with contacts */
AND
NOT ANY Contacts HAVE ( IsInfluencer = 1 ) /* and no contact is an influencer */

Values

The values in a SemQL expression may be literals, attributes or model variables:

literals are constant values. Numeric are provided as is, other literals must be surrounded by single quotes '.

Examples:

'John'
-42
'1994-11-07'

Attributes refer to attributes of the entities in the model.
Model Variables contain values that are used to customize the user experience or parameterize an integration job. Variable values are local to the user session or executed job. They are set either via a job parameter (for jobs), or retrieved from remote servers (declared as Variable Value Providers) when the user opens his session. Variables can be used in SemQL filters and expressions created at design and run-time.
Search Parameters store the values entered in search form and submitted to the search condition attached to the search form. Search parameters are available only in their own search form’s condition.

For more information about Attributes and Model Variables, refer to the Attributes and Variables/Search Parameters section.

Operators

Operators are used to:

Combine expressions to create new expressions (Arithmetic or Character Operators).
Evaluate expressions to return a boolean value. Such operators are used to create conditions.

This section details the operators supported in SemQL.

Arithmetic Operators

Operator Description

+

Addition

-

Subtraction

*

Multiplication

/

Division

Character Operators

The || (double pipe) is used for string concatenation.

Comparison Operators

Operator Description

=

Equality

!=, <>

Inequality

>, >=

Greater than, greater than or equal

<, <=

Smaller than, smaller than or equal

IN (value_1, …, value_n)

Compares a value with each value in the list, returns true if one value matches.

BETWEEN value_1 and value_2

Greater than or equal to value_1 and less than or equal to value_2

LIKE pattern

TRUE if value matches the pattern. Within the pattern, the character % matches any string of zero or more characters except null. The character _ matches any single character

REGEXP_LIKE(string, pattern, parameter)

returns true if the string matches the regular expression pattern.
The match parameter may contain one of more of the following options:

i: case-insensitive match
c case sensitive match
n allows the period (.) to match the `newline character' instead of `any character'
m treats the source string as a multiple lines input.

REGEXP_LIKE has a boolean result and is considered a condition and not a function.

IS [NOT] NULL

Tests for nulls

any child_entity_role have ( <condition_on_child_entity> )

Condition that returns true if any of child records - in a one to many relationship - meet the given condition. For more information, see Using Related Entities’ Attributes.

all <child_entity_role> have ( <condition_on_child_entity> )

Condition that returns true if all of child records - in a one to many relationship - meet the given condition. For more information, see Using Related Entities’ Attributes.

Logical Operators

Operator Description

AND

Return true if both conditions are true.

OR

Return true if one condition of the other is true.

NOT

Returns true if the following condition is false.

Functions & Expression Constructs

SemQL support functions and expression constructs, that is elements that return a value.

Functions differ from comparison operators as they return a non-boolean value. They cannot be used as is in conditions unless used with a comparison operator. For example, TO_CHAR(IsValidCustomer) is a valid expression, but not a valid condition. TO_CHAR(IsValidCustomer)='1' is a valid condition.

Built-in Functions

The functions available in Semarchy xDM include functions in the following categories:

Strings
Comparison
Conversion
Date & Time
Matching
Miscellaneous
Null Management
Numeric

Useful & Noteworthy Functions

The following list contains noteworthy functions and expressions:

TO_CHAR, TO_DATE, TO_NUMBER functions to perform conversion across data types.
TRIM, LTRIM, RTRIM, PAD LPAD, RPAD to trip or pad with blanks.
SUBSTR to retrieve a part of a string.
REPLACE, REGEXP_REPLACE to replace part of a strings.
INSTR to find the location of a substring in a string.
NULLIF, COALESCE and NVL to handle null values.
GREATEST and LEAST to return the greatest and least of a list of expressions.
SYSDATE to retrieve the system date.

The complete set of built-in functions with their description is available in Appendix A

Functions for Matching

Certain functions are key in a fuzzy matching process.

Functions for normalizing of transforming values to reduce the noise during fuzzy matching:

UPPER, LOWER and INITCAP absorb the case-sensitivity differences in strings.
SOUNDEX returns phonetic representations of strings, absorbing typos.
SEM_NORMALIZE returns a string with non-ASCII characters transformed to ASCII-equivalent or a blank.

Functions that implement fuzzy matching capabilities:

SEM_EDIT_DISTANCE and SEM_EDIT_DISTANCE_SIMILARITY respectively returns the distance and percentage of similarity between two strings according to the Levenshtein distance algorithm.
SEM_JARO_WINKLER and SEM_JARO_WINKLER_SIMILARITY respectively return the distance and percentage of similarity between two strings according to the Jaro-Winkler distance algorithm.
SEM_SEM_NGRAMS_SIMILARITY returns the percentage of similarity of two strings according to the Dice’s coefficient similarity measure applied to the n-grams of the strings.

The *_SIMILARITY functions return a value between 0 (no match) and 100 (perfect match). If one or both strings are null, the returned value is 0.

Other Constructs

CASE Expression

The CASE expression selects a result from one or more alternatives, and returns this result.

This syntax returns the first result for which the expression matches the selector. If none match, it returns the default result.

CASE selector
    WHEN expression_1 THEN result_1
    ...
    WHEN expression_n THEN result_n
    [ELSE default_result]
END

This syntax returns the first result for which the condition is true. If none is true, it returns the default result.

CASE
    WHEN condition_1 THEN result_1
    ...
    WHEN condition_n THEN result_n
    [ELSE default_result]
END

The following example from an Enricher transforms the CustomerName attribute according to the Publisher of the record.

CASE PublisherID
    WHEN 'CRM' THEN Upper(CustomerName)
    WHEN 'MKT' THEN Upper(Replace(CustomerName, '-', ' '))
    ELSE CustomerName
END

The same example with the second syntax:

CASE
    WHEN PublisherID='CRM' THEN Upper(CustomerName)
    WHEN PublisherID='MKT' THEN Upper(Replace(CustomerName, '-', ' '))
    ELSE CustomerName
END

Table Functions

SemQL supports searching for an expression’s value in the values returned by a table function, using the following syntax:

expression IN table_function(parameter_1, parameter_2 ...)

For example the following condition uses a table function named SEARCH_FOR_IDS that returns a list of IDs from a customer name, and checks whether the customer ID is in that list.

CUSTOMER_ID in SEARCH_FOR_IDS(CUSTOMER_NAME)

Customized Functions

SemQL allows you to access database functions implemented in the database instance hosting the hub.

Call these functions as regular functions by prefixing them with their schema and (optionnally) their package name: <schema>.<package>.<function>.

For example, to call a CUSTFUNC() function, stored in a CUST001 package, in a COMMON_TOOLS schema, the syntax is:

    COMMON_TOOLS.CUST001.CUSTFUNC(CustomerName)

The database user of the schema hosting the hub must have sufficient privileges to execute the customized functions.

Database functions process data with the database engine. For certain processing involving for example algorithms, libraries or services not easily implemented with the database capabilities, it is preferable to opt for the plugin option. See the Semarchy xDM Plug-in Development Guide for more information.

ATTRIBUTES AND VARIABLES/SEARCH PARAMETERS

Variables

Variables store values that are integration job-specific or user session-specific. Variable are either Built-in Platform Variables, or Model Variables.

Built-in Platform Variables

Built-in Platform Variables are built-in the platform. They contain information about the load or batch being processed or about the user connected or performing the operations. They are listed below:

Variable Name Definition

V_USERNAME

For the certification process, name of the user who has submitted the integration job. For a user data session, the name of the connected user.

V_USER_ROLES

Comma-separated list of roles of the connected user. Note that this variable only returns user roles that have corresponding roles declared in Semarchy xDM (in the Administration view, under the Roles node.). For the Tomcat application server specifically, all roles are returned, including those with no corresponding role declared.

V_LOADID

ID of the external load that has submitted the integration job. This variable is not available in expressions used outside the certification process, such as enrichers or validations triggered in steppers.

V_BATCHID

ID of the batch running the integration job. This variable is not available in expressions used outside the certification process, such as enrichers or validations triggered in steppers.

V_USER_ADDRESS

User Address, configured by the user in his profile.

V_USER_CITY

User City, configured by the user in his profile.

V_USER_COUNTRY

User Country, configured by the user in his profile.

V_USER_DECIMAL_SEPARATOR

User Decimal Separator, configured by the user in his profile.

V_USER_DEPARTEMENT

User Departement, configured by the user in his profile.

V_USER_EMAIL

User Email, configured by the user in his profile.

V_USER_FIRSTNAME

User First Name, configured by the user in his profile.

V_USER_JOB_TITLE

User Job Title, configured by the user in his profile.

V_USER_LANGUAGE

User Language Tag, configured by the user in his profile.

V_USER_LASTNAME

User Last Name, configured by the user in his profile.

V_USER_POSTAL_CODE

User Postal Code, configured by the user in his profile.

V_USER_PRIMARY_PHONE

User Primary Phone, configured by the user in his profile.

V_USER_SECONDARY_PHONE

User Secondary Phone, configured by the user in his profile.

V_USER_THOUSAND_SEPARATOR

User Thousand Separator, configured by the user in his profile.

V_USER_TIMEZONE

User Timezone, configured by the user in his profile.

Model Variables

Model Variables can be used:

In user sessions: They are set when the user accesses an application, using a variable value provider. In this context, variables are used to parameterize the user experience (for example, in filters restricting the user privileges).
In integration jobs: In an integration job, a variable value is usually set using a job parameter. If no job parameter is set, the value is set using the variable value provider. In this context, variables are used to parameterize the job execution (for example, in an enricher’s filter expression to prevent the enricher from processing any record depending on the value).

Model variables set from a Variable Value Providers are refreshed when the user accesses Semarchy xDM. In the context of a job, the variables are refreshed as if the user starting the job had accessed Semarchy xDM.

Using the built-in platform variables V_USERNAME, it is possible to query (via a Variable Value Provider definition) the corporate LDAP directory and retrieve the email of the connected user, and then store this value in a model variable called CORPORATE_USER_EMAIL.

For more information about Model Variables, see the Model Variables section in the Logical Modeling chapter of the Semarchy xDM Developer’s Guide. For more information about Variable Value Providers, see the Configuring Variable Value Providers in the Semarchy xDM Administration Guide.

Using Variables

Variables are used with the following syntax: :<variable_name>, for example: :CORPORATE_USER_EMAIL.

Search Parameters

Search Parameters store the values entered into a search form and submitted to the search condition attached to that search form. They are available only for their own search form’s condition.

For more information about Search Parameters, see the Creating Search Forms section in the Semarchy xDM Developer’s Guide.

Using Search Parameters

Parameters are used using their defined Binding, using the following syntax: :<binding_name>, for example: :SEARCHED_NAME.

When editing the SemQL condition of a search form, the available search parameters are listed in the Variablessection of the Expression Editor.

Named Queries Parameters

Query Parameters store the values passed to a request made to a named query. It is used via its binding name, similarly to a Search parameter.

For more information about Query Parameters, see the Creating Named Queries section in the Semarchy xDM Developer’s Guide.

Attribute Qualified Names

An Attribute Qualified Name is the path to an attribute from the current entity being processed. This path not only allows accessing the attributes of the entity. It also allows access to:

The attributes of the entities related to the current entity (parent and child entities).
The Lineage. For example, to access the attributes of the golden record a master record relates to.

In this section, Name always refers to the (internal) Name of an attribute, and not to the Label. The label may be translated in various languages, but the name is invariant.

Attribute names are case sensitive. For example customerName and CustomERName do not represent the same attribute.

Using Current Entity’s Attributes

A given SemQL clause is expressed for a given entity. For example, an enricher, a validation or a filter is performed on a pre-defined entity. Such a clause always has access to the attributes of this entity. These attributes may be simple or complex attributes.

Simple Attributes

Simple attributes can be accessed using their Name.

FirstName returns the value of the FirstName simple attribute of the current entity (Employee).

Complex Attributes

Display Name

Complex attributes can be accessed using their Attribute Name. This returns the value of the complex attribute in the format of the corresponding complex type’s Display Name.

The SimpleAddressType complex type is defined with a display type that shows the Address, City and Country definition attributes separated by a space. This type is used for the InputAddress attribute of the Customer entity. The InputAddress qualified name therefore returns a string containing <Address> <City> <Country> value for each Customer.

Definition Attribute

It is also possible to retrieve the value of each definition attribute of a complex type by prefixing this definition attribute name by the name of the complex attribute.

The SimpleAddressType complex type includes the Country definition attribute. This type is used for the InputAddress attribute of the Custome entity. The InputAddress.Country qualified name therefore returns the Country stored in the InputAddresscomplex type value for each Customer.

ID Attributes

SemQL exposes several ID attributes to identify a record at certain phases in the certification process.

For example:

A customer golden record is identified by the value in the CustomerID attribute.
Each master record consolidated into this golden is identified by the PublisherID, SourceID pair.

The table below lists the attributes representing a record ID and the value they take depending on the data access view. In this table, <IDAttribute> refers to the name of the ID attribute of the entity.

ID Attribute Label Description

Gold_<IDAttribute>
Example: Gold_CustomerID

Golden ID

The ID of the golden record or ID of the golden record related to the current record if any. This attribute is not available for basic entities.

For fuzzy matched entities, this attribute is completed by the following attributes that track changes due to duplicates management operations:

OldGold_<IDAttribute>: The ID of the previous golden record to which a master record was attached to.
ConfirmedGold_<IDAttribute>: The ID of the confirmed golden record to which a master record is attached to.
OriginalGold_<IDAttribute>: In a duplicate manager operation, the ID of the original golden record to which a master record was attached to.
OriginalConfirmedGold_<IDAttribute>: In a duplicate manager operation, the ID of the original confirmed golden record to which a master record was attached to.

SourceID

Record Source ID

The ID of the source record. This attribute is available only for views exposing master or source records, for fuzzy matched entities.

PublisherID

Record Publisher

The Code of the publisher of the source record. This attribute is available only for views exposing master or source records for fuzzy or ID matched matched.

<IDAttribute>
Example: CustomerID

Record ID

The ID of the current record. Its value depends on the view:

For views exposing golden records and for basic entities (GD, GDWE, GE, GH, GH4B, GI, GX, UG) it exposes the same value as Gold_<IDAttribute>.
For views exposing, in matched entities, master or source records (AE, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SE), it exposes a string containing PublisherID.SourceID.

Foreign Attributes

SemQL exposes multiple attributes representing a related (parent) record ID at certain phases of the certification process.

The table below lists the various attributes representing a related record ID the value they take depending on the data access view. In this table, <ForeignAttribute> refers to the name of the foreign attribute in the reference.

ID Attribute Label Description

SourceID_<ForeignAttribute>

Referenced Record Source ID

The ID of the referenced record. This attribute is available when referring to fuzzy matched entities.

PublisherID_<ForeignAttribute>

Referenced Record Publisher

The code of the publisher of referenced record. This attribute is available when referring to fuzzy matched entities.

FID_<ForeignAttribute>
Example: FID_CustomerID

Referenced Record ID

The ID of the referenced record. Its value depends on the referenced record type: When referencing a golden record or a basic entity record it exposes the ID of the referenced golden record.

In forms, an attribute called FDN_<ForeignAttribute> returns a pointer to the referenced record. Use this attribute preferably to display the reference in user-friendly format.

See also Using Related Entities’ Attributes for more information about using the attribute of the related entities.

Built-in Attributes

Built-in attributes are provided by Semarchy xDM to support the certification process. They appear in addition to the attributes designed in the data model.

Data Access Views

Data is accessed or manipulated using SemQL via a data access view.

A data access view represents the data at a particular phase of its lifecycle in the MDM hub. For example, Golden Data, Master Data, Source Error, etc. Views are available depending on the type of the entity. For example, the Master Data view does not exist for a basic entity.

Each view is identified by an alias (GD for Golden Data, SE for source error, etc). This alias is used in the REST API to query this view, and as a prefix for the physical table containing this view’s data, if such a table exists.

The complete list of data access views with their description is available in Appendix B

Available attributes depend on the data access view running the SemQL expression and on the type of entity (basic, ID of fuzzy matched).

For example, a clause that involves the Source Data view (for example, an enricher on source data) will support built-in attributes such as the SourceID (ID of the source record) or the PublisherID (Code of the application that published the source record). On Golden Records - which are not directly related to one source - these built-in attributes no longer make sense.

Built-In Attributes List

Built-in attributes are available depending on the type of entity (basic, ID or fuzzy matched), and view being used.

The complete list of built-in attributes with their description and views is available in Appendix B

The SemQL Editor automatically lists the attributes available depending on the situation. Use this editor to make sure to use only the correct attributes.

Using Related Entities’ Attributes

Related entities may be either parent entities (for a given relation, the current entity has zero or one parent), or child entities (for a given relation, the current entity has zero or more children).

Parent Entities

It is possible to access attributes of a parent entity by prefixing this attribute by the Role of this parent in the relation.

The current Customer entity references the Employee entity in a relation. In this relation, the Role of this second entity is AccountManager. The AccountManager.FirstName attribute refers to the FirstName of the Employee that is the parent - in this relation, the AccountManager - of the current Customer.

Referring to parent entities can follow a chain of relations.

AccountManager.CostCenter.CostCenterName follows two relations to return the CostCenterName of the CostCenter to which the AcountManager of the current Customer reports to.

Child Entities

Accessing child entities is possible in conditions only using the SemQL any and all syntax.

Any and All Syntax

The any syntax is a condition that returns true if any of the child records meet the given condition.

any <child_entity_role> have ( <condition_on_child_entity> )

or

any <child_entity_role> has ( <condition_on_child_entity> )

To filter Customers having at least one Contact named John

any Contacts have ( FirstName = 'John' )

The all syntax is a condition that returns true if all the child records meet the given condition.

all <child_entity_role> have ( <condition_on_child_entity> )

or

all <child_entity_role> has ( <condition_on_child_entity> )

To filter Customers having all their Contacts with the IsInfluencer flag set to '0' :

all Contacts have ( isInfluencer = '0' )

Cascading References

It is possible to cascade through several relations’ roles.

To filter Employees managing Customers having one contact with the IsInfluencer flag set to 1:

any Customers.Contacts have ( IsInfluencer = '1' )

ParentRecord

In the Any and All syntax, it is possible to access the direct parent’s record from the condition on the child entity, through the ParentRecord reserved keyword or through the parent’s role name in the relation.

The following condition returns all the customers having two contacts with a different ContactID but the same FirstName.

 any Contacts have (
 	any ParentRecord.Contacts have (
 		ParentRecord.ContactID != ContactID and ParentRecord.FirstName = FirstName
 	)
)

Using the Lineage

You can navigate records lineage using SemQL. Using this navigation, you can for example access the master and the golden records consolidated from a source record, or you can access all the attached master records from a golden record.

To understand the lineage structures, refer to the Data Certification section in the Semarchy xDM Integration Guide.

Lineage Parent Records

You can access the attributes of a parent record related to your current record in the lineage.

This navigation is possible using a pseudo-role name representing the parent relation in the lineage, such as GoldenRecord (of a master record for example), MasterRecord or SourceRecord.

To filter out the master records that are singletons, you can access the golden record using the GoldenRecord pseudo-role and then use its number of masters MastersCount:

GoldenRecord.MastersCount > 1

To get the CustomerName consolidated in the golden record that results from a given source record:

MasterRecord.GoldenRecord.CustomerName

The complete list of built-in lineage navigation is available in Appendix B

The SemQL Editor automatically lists the lineage navigation available depending on the situation.

Lineage Child Records

You can access the attributes of child records related to your current record in the lineage.

This navigation is possible using a pseudo-role name representing the child records in the lineage relation, such as MasterRecords(the master records attached to a golden record) or SourceRecords (the source records attached to a master record).

You can use the lineage child records similarly to the Child Entities records, using the SemQL any and all syntax.

To filter golden records that have duplicates only from the CRM publisher:

MastersCount > 1
and
all MasterRecords have ( PublisherID = 'CRM' )

To filter all master records created from source records older than 5 days.

all SourceRecords have (CreationDate < SYSDATE() - 5)

The complete list of built-in lineage navigation is available in Appendix B

The SemQL Editor automatically lists the lineage navigation available depending on the situation.

Special Cases

Attributes of Duplicates

Certain SemQL expressions manipulate two similar records simultaneously:

The SemQL condition that defines the match rule in a Matcher.
The SemQL condition used to filter duplicates.

For these expressions, the two similar records are identified by the RECORD1 and RECORD2 pseudo-record prefixes.

The following condition returns the duplicates with the same InputAddress.Address (complex type) but a different CustomerName

Record1.CustomerName <> Record2.CustomerName
and Record1.InputAddress.Address = Record2.InputAddress.Address

Attributes in Reference Pickers Filters

In Steppers, SemQL conditions can filter the records selectable with reference pickers. These conditions usually manipulate two different records.

In this context, the records are identified using pseudo-record prefixes:

Record represents the record being edited. It is the referencing record.
Referenced represents the selectable referenced record (the one filtered in the reference picker).

The following condition is set on a reference picker that enables selecting an Account Manager (Employee) from a Customerentity managed in a stepper. It reduces the selectable employee records to only those in the same country as the Customer being edited.

Record.Country =  Referenced.Country

The following condition is set on a reference picker to select the Manager (an instance of Employee) of a given Employee. It filters the selectable managers (Referenced) so that they are in the same CostCenter as the the current manager (Record.Manager).

Record.Manager.FID_CostCenter = Referenced.FID_CostCenter

See the Reference Selection section in the Steppers chapter in the Semarchy xDM Developer’s Guide for more information.

USING SEMQL IN SEMARCHY XDM

This section describe the various uses of SemQL in Semarchy xDM, as well as the attributes available for these uses.

Using SemQL at Design-Time

At design-time, SemQL is used:

In the entities, to generate automatically the Source ID of an entity when creating or importing new records.
In the various rules defining the Certification Process, that is the enrichers, validations, matchers and survivorship rules.
In the Applications to define the filters applied to the lists in the business views and the form/collections attributes.
In properties that support SemQL in addition to literal values.
In the conditions used for securing data Row-Based Filtering.

SemQL for ID Generation

The Source ID of an entity can be generated using a SemQL expression. This expression is executed and the ID generated when a record form is saved for the first time in a stepper or when a record is imported from a file containing no IDs.

This expression can use:

Attributes from the entity.
Attributes of the referenced (parent) entities.
SemQL functions and operators. You can for example use the SEQ_NEXTVAL function to retrieve the next increment of a named sequence.

Examples of SemQL for ID Generation:

To create an ID using a literal concatenated with a sequence increment, for a customer entity:
'CUST_' || SEQ_NEXTVAL('CUST_SEQ')
To create an ID using values from the record itself, for a customer entity:
Upper(LastName) || '_' || CustomerNumber
To create an ID concatenating the IDs of two related records. Typically, for an entity that relates products and distribution markets:
DistributedProduct.ProductID || '_' || DistributionMarket.MarketID

An ID generated with a SemQL expression is immutable. This ID will not change after the initial record creation even if the value of the attributes used in the expression change. For example, if you generate the ID using the customer name and edit a saved customer, modifying its name will not alter the ID.

SemQL in the Certification Process

The certification process takes source records pushed to the MDM Hub by identified Publishers and creates enriched, validated, matched and consolidated golden records. SemQL is involved in the various phases of this process.

Certification Process Clauses

The following table describes the expressions, condition and order by clauses used in the certification process.

Process Step

Clause Type

Description

SemQL Enricher Filter

Condition

This condition filters the source records of the entity that must go through this enricher.

SemQL Enricher Expressions

Expression

The result of the expressions load the enriched entity’s attributes.

Plug-in Enricher Filter

Condition

This condition filters the source records of the entity that must go through this enricher.

Plug-in Enricher Inputs

Expression

Each expression result is pushed to a plug-in input. The outputs of the plug-in load the entity’s attributes with enriched values.

SemQL Validation Condition

Condition

This condition defines which source or golden records pass the validation.

Plug-in Validation Input

Expression

Each expression result is pushed to a plug-in input. The plug-in outputs a boolean defining the records that pass the validation.

Matcher Binning Expression

Expression

Master records having the same value for all binning expressions are in the same bin, and will be matched using the matching condition.

Matcher Matching Condition

Condition

Master records in the same bin for which the matching condition returns true are considered as matches.

Consolidation Rule Ranking Expression

Order By

For Custom Ranking strategy, the consolidation uses values from the first of the records ordered by this expression. For other strategies, if the consolidation strategy returns two records with the same rank, the consolidation sorts them by this expression and uses the value of the first one.

Available Attributes by Clause

In the certification process:

Attributes from the entity being processed are always available.
Built-in attributes availability depends on the type of entity and on the clause being created.
Attributes from related entities (parent or child entities as well as Lineage parent and child records) availablity also depends on the type of the entity and on the clause being created.

Use The SemQL Editor for the list of attributes available in each situation.

Certification Process SemQL Example

Enricher Expressions

Examples of enricher conditions:

FirstName: InitCap(FirstName)
Name: InitCap(FirstName) || ' ' || Upper(LastName)
City: Replace(Upper(InputAddress.City),'CEDEX','')
EmployeeCostCenterName : CostCenter.CostCenterName: Current Employee entity references the CostCenter entity and this expression returns an employee’s cost center name

Validation Conditions

Checking the Customer’s InputAddress complex attribute validity:

InputAddress.Address is not null and
( InputAddress.PostalCode is not null or InputAddress.City is not null)

In this example, the IS NOT NULL, AND and OR SemQL operators are used to build the condition.

It is possible to use child records in validations ( See Child Entities for more information).
For example, a Customer is valid only if all its attached Contacts have an email:

ALL Contacts  HAVE (EmailAddress is not null)

When running such validation pre-consolidation, only the contacts within the same load are considered. Post consolidation, contacts from the load as well as those already in the hub are taken into account.

Matcher

Binning Expression to group customers by their Country/PostalCode:
```
InputAddress.Country || InputAddress.PostalCode
```

Matching Condition: Matching two customer records by name, address and city name similarity:

SEM_EDIT_DISTANCE_SIMILARITY( Record1.CustomerName, Record2.CustomerName ) > 65
and SEM_EDIT_DISTANCE_SIMILARITY( Record1.InputAddress.Address, Record2.InputAddress.Address ) > 65
and SEM_EDIT_DISTANCE_SIMILARITY( Record1.InputAddress.City, Record2.InputAddress.City ) > 65

In this second example, SEM_EDIT_DISTANCE_SIMILARITY is a SemQL function. Record1 and Record2 are predefined names for qualifying the two records to match.

Matching on child entities is possible by configuring the rule to use Child Records.
For example, to match two Customers when they have a Contact with the same FirstName and LastName and a CEO title, and the Customer names are 50% similar, configure the match rule with the Child Records set to Contact and use the following clause :

    Record1.FirstName = Record2.FirstName
AND Record1.LastName  = Record2.LastName
AND Record1.JobTitle  = 'CEO' and Record2.JobTitle = 'CEO'
AND SEM_EDIT_DISTANCE_SIMILARITY (Record1.Customer.Name, Record2.Customer.Name) > 50

This rule matches Customers but its base entity is Contact. So the Record1.Customer.Name expression returns the name of the customer the contact we match is attached to.

SemQL in Privilege Grants

Entity privileges support Row-Level Filtering, to apply privileges only to a subset of the records of the entity. The subsets are defined using SemQL filters.
Each entity privilege grant may be associated with a filtering Condition.
This condition has access to all attributes of the entity, its related entities and built-in attributes ClassName, CreationDate, UpdateDate, Creator and Updator.

To grant a given privilege to a user whose email address (stored in a CORPORATE_USER_EMAIL variable) appears in the current record in the EmailAddress attribute, the filter is EmailAddress = :CORPORATE_USER_EMAIL.

SemQL in Applications

Business View Transition Path

Business entities in business views are related using transition paths. A transition path is an expression pointing to records in a directly or indirectly related entity.

For example, to display all the siblings of an employee in a manager/report hierarchy, the following transition path is used:Manager.Reports.

Business View Filters

Each business entity of a business view supports a Filter condition that filters the records displayed in this business entity.
This condition has access to all attributes of the entity, its related entities and built-in attributes ClassName, CreationDate, UpdateDate, Creator and Updator.

To filter at the root of the hierarchy of cost centers only those with no parent, the following filter is applied:FID_ParentCostCenter is null. FID_ParentCostCenter is the attribute representing the relation between a cost center and its parent cost center.

Form Fields/Table Columns/Properties

Form fields and table columns, as well as application properties support SemQL Expressions. These expressions allow building composite attributes for display purposes and customizing the appearance of the application.

When a form is used for data authoring, form fields using SemQL expressions appear as read-only.

This expression has access to all attributes of the entity, its related entities and built-in attributes.

If you use in a form field a built-in attribute specific to a certain view (for example, the PublisherID, which exists on source and master data but not on golden data), this form field is automatically hidden when the form is used with a view that does not support the built-in attribute.

To display a custom attribute called ContactsStatus on a Customer form which displays whether this customer has contacts and whether these contacts have the IsInfluencer flag set to '1':

CASE
 WHEN ANY Contacts HAVE (IsInfluencer = 1 ) THEN 'One or More Influencers'
 WHEN NOT ANY Contacts HAVE (1=1) THEN 'No Contacts'
 ELSE 'No Influencer Contact'
END

Predefined Sort

In Business Views, you can define an Expression that is used to sort the records under the node in the business view.

Reference Picker Filters

In Business Views, SemQL conditions can filter the records selectable with reference pickers. See Reference Pickers for more information.

Search Forms

In Search Forms, you define a SemQL condition to filter records based on the search parameters values entered in the form.

This condition has access to all attributes of the searched entity, its related entities and built-in attributes. Note that certain built-in attributes should be used with caution as some do not apply to all the views the search form may apply to.

Using SemQL at Run-Time

Filters

All date view in an application support filters in the form of Conditions.
This condition has access to all attributes of the entity, its related entities and built-il attributes that depend on the view that is accessed.
For example, on a Golden Record: BatchID, ClassName, CreationDate, UpdateDate, Creator and Updator are available.

Duplicates Filters

When filtering duplicates, for example to filter records to checkout in a duplicate management activity, it is possible to use Record1and Record2 as in a Matcher, except that the comparison takes place on the pairs of records that have matched.

To show matching records, but with the CustomerName fields less similar by less than 95%:

SEM_EDIT_DISTANCE_SIMILARITY(Record1.CustomerName, Record2.CustomerName) < 95

THE SEMQL EDITOR

The SemQL editor can be called from the workbench when a SemQL expression, condition or clause needs to be built.

The SemQL Editor

This editor is organized as follows:

Attributes available for the expression appear in left panel. Double-click an attribute to add it to the expression.
Functions declared in SemQL appear in the left bottom panel, grouped in function groups. Double-click a function to add it to the expression.
Variables available for the expression appear in the bottom center panel.
Messages appear in the right bottom panel, showing parsing errors and warnings.
Description for the selected function or attribute appear at the bottom of the editor.
The Toolbar allows to indent the code or hide/display the various panels of the editor and to undo/redo code edits.

APPENDIX A: SEMQL FUNCTIONS LIST

The following tables lists the built-in functions available in SemQL.

Functions for Oracle

The following functions are available when using Oracle.

Function Description

ABS

Returns the absolute value of number.

ACOS

Returns the arc cosine of number. The argument number must be in the range of -1 to 1, and the function returns a value in the range of 0 to pi, expressed in radians.

ADD_MONTHS

Returns the date "date" plus "integer" months. The date argument can be a datetime value or any value that can be implicitly converted to DATE. The integer argument can be an integer or any value that can be implicitly converted to an integer.

ASCII

Returns the decimal representation in the database character set of the first character of string.

ASCIISTR

Takes as its argument a string, or an expression that resolves to a string, in any character set and returns an ASCII version of the string in the database character set. Non-ASCII characters are converted to the form xxxx, where xxxx represents a UTF-16 code unit.

ASIN

Returns the arc sine of number. The argument number must be in the range of -1 to 1, and the function returns a value in the range of -pi/2 to pi/2, expressed in radians.

ATAN

Returns the arc tangent of number. The argument number can be in an unbounded range and the function returns a value in the range of -pi/2 to pi/2, expressed in radians.

ATAN2

Returns the arc tangent of number1 and number2. The argument number1 can be in an unbounded range and the function returns a value in the range of -pi to pi, depending on the signs of number1 and number2, expressed in radians. ATAN2(n1,n2) is the same as ATAN2(n1/n2).

BIN_TO_NUM

Converts a bit vector to its equivalent number. Each argument to this function represents a bit in the bit vector. This function takes as arguments any numeric data type, or any nonnumeric data type that can be implicitly converted to a number. Each expr must evaluate to 0 or 1.

BITAND

Computes an AND operation on the bits of expr1 and expr2, both of which must resolve to nonnegative integers, and returns an integer.

CEIL

Returns the smallest integer greater than or equal to number.

CHR

Returns the character having the binary equivalent to number as a string value in the database character set.

COALESCE

Returns the first non-null expr in the expression list. At least one expr must not be the literal NULL. If all occurrences of expr evaluate to null, then the function returns null.

COMPOSE

Takes as its argument a string, or an expression that resolves to a string, in any data type, and returns a Unicode string in its fully normalized form in the same character set as the input.

CONCAT

Returns string1 concatenated with string2. This function is equivalent to the concatenation operator.

CONVERT

Converts a character string from one character set to another.

COS

Returns the cosine of number (an angle expressed in radians).

COSH

Returns the hyperbolic cosine of number.

CURRENT_DATE

Returns the current date in the session time zone, in a value in the Gregorian calendar.

CURRENT_TIMESTAMP

Returns the current date and time in the session time zone. If you omit precision, then the default is 6.

DBTIMEZONE

Returns the value of the database time zone. The return type is a time zone offset (a character type in the format '[+|-]TZH:TZM') or a time zone region name.

DECODE

Compares expr to each search value one by one. If expr is equal to a search, then it returns the corresponding result. If no match is found, then it returns default. If default is omitted, then Oracle returns null.

DECOMPOSE

Takes as its argument a string in any data type and returns a Unicode string after decomposition in the same character set as the input. For example, an o-umlaut code point will be returned as the "o" code point followed by an umlaut code point.

EXP

Returns e raised to the number-th power, where e = 2.71828183 … The function returns a value of the same type as the argument.

EXTRACT_DAY

Extracts and returns the day from expr. expr must be a valid ANSI date.

EXTRACT_HOUR

Extracts and returns the hour from expr. expr must be a valid ANSI datetime.

EXTRACT_MINUTE

Extracts and returns the minute from expr. expr must be a valid ANSI datetime.

EXTRACT_MONTH

Extracts and returns the month from expr. expr must be a valid ANSI date.

EXTRACT_SECOND

Extracts and returns the second from expr. expr must be a valid ANSI datetime.

EXTRACT_TIMEZONE_ABBR

Extracts and returns the abbreviation of the timezone from expr. expr must be a valid ANSI datetime including a timezone.

EXTRACT_TIMEZONE_HOUR

Extracts and returns the hour of the timezone from expr. expr must be a valid ANSI datetime including a timezone.

EXTRACT_TIMEZONE_MINUTE

Extracts and returns the minute of the timezone from expr. expr must be a valid ANSI datetime including a timezone.

EXTRACT_TIMEZONE_REGION

Extracts and returns the region of the timezone from expr. expr must be a valid ANSI datetime including a timezone.

EXTRACT_YEAR

Extracts and returns the year from expr. expr must be a valid ANSI date.

FLOOR

Returns largest integer equal to or less than number.

FROM_TZ

Converts a timestamp value and a time zone to a TIMESTAMP WITH TIME ZONE value. Time_zone_value is a character string in the format 'TZH:TZM' or a character expression that returns a string in TZR with optional TZD format.

GREATEST

Returns the greatest of the list of one or more expressions.

HEXTORAW

Converts string containing hexadecimal digits to a raw value.

INITCAP

Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by white space or characters that are not alphanumeric.

INSTR

Searches string for substring using the input character set. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for.

INSTR2

Searches string for substring using UC2 code points. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for.

INSTR4

Searches string for substring using UC4 code points. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for.

INSTRB

Searches string for substring using bytes instead of character. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for.

INSTRC

Searches string for substring using Unicode complete characters. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for.

LAST_DAY

Returns the date of the last day of the month that contains date.

LEAST

Returns the least of the list of expressions.

LENGTH

Returns the length of string. Length is calculated using characters as defined by the input character set.

LENGTH2

Returns the length of string. Length is calculated using characters as defined by the UC2 code point.

LENGTH4

Returns the length of string. Length is calculated using characters as defined by the UC4 code point.

LENGTHB

Returns the length of string. Length is calculated using bytes instead of characters.

LENGTHC

Returns the length of string. Length is calculated using Unicode complete characters.

LN

Returns the natural logarithm of number, where number is greater than 0.

LOCALTIMESTAMP

Returns the current date and time in the session time zone.

LOG

Returns the logarithm, base number2, of number1. The base number1 can be any positive value other than 0 or 1 and number2 can be any positive value.

LOWER

Returns char, with all letters lowercase.

LPAD

Returns expr1, left-padded to length number characters with the sequence of characters in expr2. If you do not specify expr2, then the default is a single blank.

LTRIM

Removes from the left end of string all of the characters contained in set_of_chars. If you do not specify set_of_chars, it defaults to a single blank.

MOD

Returns the remainder of number2 divided by number1. Returns number2 if number1 is 0.

MONTHS_BETWEEN

Returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative.

NANVL

If number1 is not a number (NaN) then NANVL returns number2. Otherwise, it returns number1.

NCHR

Returns the character having the binary equivalent to number as a string value in the national character set.

NEW_TIME

Returns the date and time (given in timezone 1) converted in time zone timezone2.

NEXT_DAY

Returns the date of the first weekday named by day_name that is later than the date date.

NLSSORT

Returns a collation key for string, that is a string of bytes used to sort strings. lsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. This function manages language-specific sorting.

NLS_INITCAP

Returns string with the first letter of each word in uppercase and all other letters in lowercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. This function manages language-specific characters case changes.

NLS_LOWER

Returns string with all letters in lowercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence.

NLS_UPPER

Returns string with all letters in uppercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence.

NULLIF

Compares expr1 and expr2. If they are equal, then the function returns null. If they are not equal, then the function returns expr1. You cannot specify the literal NULL for expr1. The NULLIF function is logically equivalent to the following CASE expression: CASE WHEN expr1 = expr 2 THEN NULL ELSE expr1 END

NUMTODSINTERVAL

Converts number to an INTERVAL DAY TO SECOND literal. The value for interval_unit specifies the unit of number and must resolve to one of the following string values: 'DAY', 'HOUR', 'MINUTE', 'SECOND'.

NUMTOYMINTERVAL

Converts number to an INTERVAL YEAR TO MONTH literal. The value for interval_unit specifies the unit of number and must resolve to one of the following string values: 'YEAR', 'MONTH'.

NVL

If expr1 is null, then NVL returns expr2. If expr1 is not null, then NVL returns expr1.

NVL2

If expr1 is not null, then NVL2 returns expr2. If expr1 is null, then NVL2 returns expr3.

ORA_HASH

Computes a hash value for a given expression. The expr argument determines the data for which you want to compute a hash value. The optional max_bucket argument determines the maximum bucket value returned by the hash function. You can specify any value between 0 and 4294967295. The default is 4294967295. The optional seed_value argument produces many different results for the same set of data.

POWER

Returns number2 raised to the number1 power. The base number2 and the exponent number1 can be any numbers, but if number2 is negative, then number1 must be an integer.

RAWTOHEX

Converts raw to a character value containing its hexadecimal equivalent.

REGEXP_COUNT

Extends the functionality of the REGEXP_INSTR function by returning the number of times a pattern occurs in a source string, starting at position. match_param contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match, n - Allow '.' to match even the newline character, m - treat input as multiple lines, x - ignore whitespaces.

REGEXP_INSTR

Extends the functionality of the INSTR function by letting you search a string for a regular expression pattern. It returns an integer indicating the beginning or ending position of the matched substring, depending on the value of the return_option argument. If no match is found, the function returns 0.

REGEXP_REPLACE

Extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string.

REGEXP_SUBSTR

Extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself. This function is useful if you need the contents of a match string but not its position in the source string.

REMAINDER

Returns the remainder of number2 divided by number1.

REPLACE

Returns string with every occurrence of search_string replaced with replacement_string. If replacement_string is omitted or null, then all occurrences of search_string are removed. If search_string is null, then string is returned.

ROUND

Returns date_or_number rounded to the unit specified by the format model fmt_or_integer.

RPAD

Returns expr1, right-padded to length number characters with expr2, replicated as many times as necessary. If you do not specify expr2, then it defaults to a single blank.

RTRIM

Removes from the right end of string all of the characters that appear in set_of_chars. If you do not specify set_of_chars, then it defaults to a single blank.

SEM_BOOLEAN_TO_CHAR

Converts a boolean to its standard string representation.

SEM_CONCAT

Returns a string concatenating the input values with the separator, optionally removing null values

SEM_EDIT_DISTANCE

Calculates the distance between two strings, that is the number of insertions, deletions or substitutions required to transform string1 into string2. If one or both of the strings are null the distance will be the largest integer value (2147483647). Note that this function measures the distance in number of bytes and not in characters. As a consequence, strings stored using variable-width characters sets (UTF-8, for example) can cause counter-intuitive results. It is recommended to convert these strings to a fixed-width character set (AL16UTF16, for example) prior to passing them to this function.

SEM_EDIT_DISTANCE_SIMILARITY

Calculates the distance between string1 into string2 (as described in the SEM_EDIT_DISTANCE function), and returns the Normalized value of the Edit Distance between two Strings. The value is between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0.

SEM_INSTR

Finds the location of a substring within a specified string.

SEM_JARO_WINKLER

Calculates the measure of agreement between two strings using Jaro-Winkler method. The value is between 0 (no match) and 1 (perfect match). If one or both strings are null the result will be 0.

SEM_JARO_WINKLER_SIMILARITY

Calculates the measure of agreement between two strings using Jaro-Winkler method, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0.

SEM_NGRAMS_SIMILARITY

Calculates the measure of agreement between two strings using the Dice’s coefficient similarity measure applied to the n-grams of the strings, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. The ngrams_length parameter defines the length of the n-grams (2 by default).

SEM_NORMALIZE

Returns a string with Latin (supplement, Extended-A and Extended-B) characters converted into their ASCII equivalents, and other (space and non-alphanumeric) characters eliminated.

SEM_NUMBER_TO_CHAR

Converts a number to its standard string representation.

SEM_SUBSTRING

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set.

SEM_TIMESTAMP_TO_CHAR

Converts a timestamp to its standard string representation.

SEM_TO_CHAR

Converts a string or a number to its standard string representation.

SEM_UUID_TO_CHAR

Converts a UUID to its standard string representation.

SEQ_NEXTVAL

Get the next value of a sequence. Note that this function is not supported in Enrichers.

SIGN

Returns the sign of number. The sign is: -1 if n<0, 0 if n=0, 1 if n>0.

SIN

Returns the sine of number (an angle expressed in radians).

SINH

Returns the hyperbolic sine of number.

SOUNDEX

Returns a character string containing the phonetic representation of string. This function lets you compare words that are spelled differently, but sound alike in English.

SQRT

Returns the square root of number.

STANDARD_HASH

Computes a hash value for a given expression and returns it in a RAW value. The optional method lets you choose the hash algorithm (defaults to SHA1) in the following list: SHA1, SHA256, SHA384, SHA512 and MD5. This function requires Oracle version 12c or above.

SUBSTR

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set.

SUBSTR2

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using UCS2 code points.

SUBSTR4

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using UCS4 code points.

SUBSTRB

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using bytes instead of characters.

SUBSTRC

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using Unicode complete characters.

SYSDATE

Returns the current date and time set for the operating system on which the database resides.

SYSTIMESTAMP

Returns the system date, including fractional seconds and time zone, of the system on which the database resides.

SYS_CONTEXT

Returns the value of parameter associated with the context namespace.

SYS_EXTRACT_UTC

Extracts the UTC (Coordinated Universal Time—formerly Greenwich Mean Time) from a datetime value with time zone offset or time zone region name.

SYS_GUID

Generates and returns a globally unique identifier (RAW value) made up of 16 bytes.

TAN

Returns the tangent of number (an angle expressed in radians).

TANH

Returns the hyperbolic tangent of number.

TO_BINARY_DOUBLE

Returns a double-precision floating-point number.

TO_BINARY_FLOAT

Returns a single-precision floating-point number.

TO_CHAR

Converts expr to its string representation optionally using fmt and nlsparam for the conversion.

TO_CLOB

Converts expr to a CLOB (large string)

TO_DATE

Converts string to a date value. The fmt is a datetime model format specifying the format of string. If you omit fmt, then string must be in the default date format. If fmt is J, for Julian, then string must be an integer.

TO_DSINTERVAL

Converts a character string to an INTERVAL DAY TO SECOND value.

TO_MULTI_BYTE

Returns string with all of its single-byte characters converted to their corresponding multibyte characters.

TO_NUMBER

Converts expr to a number value using the optional format model fmt and nlsparam.

TO_SINGLE_BYTE

Returns string with all of its multibyte characters converted to their corresponding single-byte characters.

TO_TIMESTAMP

Converts string a timestamp value. The optional fmt specifies the format of string.

TO_TIMESTAMP_TZ

Converts string to a TIMESTAMP WITH TIME ZONE value. The optional fmt specifies the format of string.

TO_YMINTERVAL

Converts string to an INTERVAL YEAR TO MONTH type.

TRANSLATE

Returns expr with all occurrences of each character in from_string replaced by its corresponding character in to_string. Characters in expr that are not in from_string are not replaced. The argument from_string can contain more characters than to_string. In this case, the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in string, then they are removed from the return value.

TRIM

Removes from the left and right ends of string all of the characters contained in set_of_chars. If you do not specify set_of_chars, it defaults to a single blank.

TRUNC

When expr is a date, returns expr with the time portion of the day truncated to the unit specified by the format model fmt_or_number. If you omit fmt_or_number, then date is truncated to the nearest day. When expr is a number, returns expr truncated to fmt_or_number decimal places. If fmt_or_number is omitted, then expr is truncated to 0 places.

UNISTR

Takes as its argument a text literal or an expression that resolves to character data and returns it in the national character set. The national character set of the database can be either AL16UTF16 or UTF8. UNISTR provides support for Unicode string literals by letting you specify the Unicode encoding value of characters in the string.

UPPER

Returns string with all letters uppercase.

WIDTH_BUCKET

Lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. For a given expression, the function returns the bucket number into which the value of this expression would fall after being evaluated. Expr must evaluate to a numeric or datetime. Min_value and max_value are expressions that resolve to the end points of the acceptable range for expr. Both of these expressions must also evaluate to numeric or datetime values, and neither can evaluate to null.

Functions for PostgreSQL

The following functions are available when using PostgreSQL.

Function Description

ABS

Returns the absolute value

ACOS

Returns the inverse cosine

AGE

Subtracts arguments, producing a symbolic result that uses years and months, rather than just days

ARRAY_AGG

Concatenates input values, including nulls, into an array. If the inputs are arrays, concatenates them into an array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or NULL)

ASCII

ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.

ASIN

Returns the inverse sine

ATAN

Returns the inverse tangent

ATAN2

Returns the inverse tangent of number_1/number_2

AVG

Returns the average (arithmetic mean) of all input values

BIT_AND

Returns the bitwise AND of all non-null input values, or null if none

BIT_LENGTH

Returns the number of bits in string

BIT_OR

Returns the bitwise OR of all non-null input values, or null if none

BOOL_AND

Returns true if all input values are true, otherwise false

BOOL_OR

Returns true if at least one input value is true, otherwise false

BTRIM

Removes the longest string consisting only of characters in characters (a space by default) from the start and end of string

CBRT

Returns the cube root

CEIL

Returns the nearest integer greater than or equal to argument

CEILING

Returns the nearest integer greater than or equal to argument (same as ceil)

CHAR_LENGTH

Returns the number of characters in string

CHR

Returns the character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes.

CLOCK_TIMESTAMP

Returns the current date and time (changes during statement execution)

COALESCE

The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null.

CONCAT

Concatenates the text representations of all the arguments. NULL arguments are ignored.

CONCAT_WS

Concatenates all but the first argument with separators. The first argument is used as the separator string. NULL arguments are ignored.

CONVERT

Converts string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding.

CONVERT_FROM

Converts string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding.

CONVERT_TO

Converts string to dest_encoding.

COS

Returns the cosine

COT

Returns the cotangent

COUNT

Returns the number of input rows for which the value of expression is not null

CURRENT_DATE

Returns the current date

CURRENT_TIME

Returns the current time of day

CURRENT_TIMESTAMP

Returns the current date and time (at the beginning of current the transaction)

CURRVAL

Return value most recently obtained with nextval for specified sequence

DATERANGE

Creates a range of dates. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'.

DATE_PART

Get subfield from a timestamp or an interval. The part to extract is defined by the text.

DATE_TRUNC

Truncate timestamp or interval to the precision specified in the text.

DECODE

Decodes binary data from textual representation in string. Options for format are same as in encode.

DEGREES

Converts radians to degrees

DIV

Returns the integer quotient of number_1/number_2

ENCODE

Encodes binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.

EVERY

Equivalent to bool_and

EXP

Returns the exponential

FLOOR

Returns the nearest integer less than or equal to argument

FORMAT

Formats the arguments according to format_string. This function is similar to the C function sprintf.

GET_BIT

Extract bit from string

GET_BYTE

Extract byte from string

GREATEST

The GREATEST function selects the largest value from a list of any number of expressions.

INITCAP

Converts the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

INT4RANGE

Creates a range of integers. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'.

INT8RANGE

Creates a range of bigints. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'.

ISEMPTY

Returns a boolean indicating whether the range empty.

ISFINITE

Tests for finite date, interval or timestamp (not +/-infinity)

JSONB_AGG

Aggregates values as a JSON array

JSONB_OBJECT_AGG

Aggregates name/value pairs as a JSON object

JSON_AGG

Aggregates values as a JSON array

JSON_OBJECT_AGG

Aggregates name/value pairs as a JSON object

JUSTIFY_DAYS

Adjust interval so 30-day time periods are represented as months

JUSTIFY_HOURS

Adjust interval so 24-hour time periods are represented as days

JUSTIFY_INTERVAL

Adjust interval using justify_days and justify_hours, with additional sign adjustments

LASTVAL

Return value most recently obtained with nextval for any sequence

LEAST

The LEAST function selects the smallest value from a list of any number of expressions.

LEFT

Returns the first n characters in the string. When n is negative, return all but last n characters.

LENGTH

Returns the number of characters in string with an optional given encoding. The string must be valid in this encoding.

LN

Returns the natural logarithm.

LOCALTIME

Returns the current time of day

LOCALTIMESTAMP

Return the current date and time (at start of current transaction)

LOG

Returns the logarithm in base 10.

LOWER

Convert string to lower case

LOWER_INC

Returns a boolean indicating whether the lower bound of the range is inclusive.

LOWER_INF

Returns a boolean indicating whether the lower bound of the range is infinite.

LPAD

Fills up the string to length length by prepending the characters fill_text (a space by default). If the string is already longer than length then it is truncated (on the right).

LTRIM

Removes the longest string containing only characters from characters (a space by default) from the start of string

MAKE_DATE

Create date from integer year, month and day fields

MAKE_INTERVAL

Create interval from years, months, weeks, days, hours, minutes and seconds fields. If a field is left empty, it defaults to zero.

MAKE_TIME

Create time from hour, minute and seconds fields

MAKE_TIMESTAMP

Create timestamp from year, month, day, hour, minute and seconds fields

MAKE_TIMESTAMPTZ

Create timestamp with time zone from year, month, day, hour, minute and seconds fields; if timezone is not specified, the current time zone is used

MAX

Returns the maximum value of expression across all input values

MD5

Calculates the MD5 hash of string, returning the result in hexadecimal

MIN

Returns the minimum value of expression across all input values

MOD

Returns the remainder of number_1/number_2

NEXTVAL

Advance sequence and return new value

NOW

Returns the current date and time (start of current transaction)

NULLIF

The NULLIF function returns a null value if value1 equals value2; otherwise it returns value1.

NUMRANGE

Creates a range of numerics. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'.

NUM_NONNULLS

Returns the number of non-null values

NUM_NULLS

Returns the number of null values

OCTET_LENGTH

Returns the number of bytes in string

OVERLAY

Overlays string with overlay_string, starting at start_position and for length characters.

PI

Returns the PI constant.

POWER

Returns number_1 raised to the power of number_2

QUOTE_IDENT

Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled.

QUOTE_LITERAL

Returns the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable.

QUOTE_NULLABLE

Returns the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled.

RADIANS

Converts degrees to radians

RANDOM

Returns a random value in the [0, 1] range.

RANGE_LOWER

Returns the lower bound of range

RANGE_MERGE

Returns the smallest range which includes both of the given ranges

RANGE_UPPER

Returns the upper bound of range

REGEXP_MATCH

Returns the captured substring(s) resulting from the first match of a POSIX regular expression to the string.

REGEXP_REPLACE

Replaces the substring(s) matching a POSIX regular expression.

REGEXP_SPLIT_TO_ARRAY

Splits the string using a POSIX regular expression as the delimiter.

REPEAT

Repeats string the specified number of times

REPLACE

Replaces all occurrences in string of substring from with substring to

REVERSE

Returns the reversed string.

RIGHT

Returns last n characters in the string. When n is negative, return all but first |n| characters.

ROUND

Rounds the number to the nearest integer or to int decimal places

RPAD

Fills up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated.

RTRIM

Removes the longest string containing only characters from characters (a space by default) from the end of string

SCALE

Returns the scale of the argument (the number of decimal digits in the fractional part)

SEM_BOOLEAN_TO_CHAR

Converts a boolean to its standard string representation.

SEM_CAST_BOOLEAN

Explicitly cast its argument as a boolean. This can help the database or JDBC driver to infer the type of a bound parameter.

SEM_CAST_NUMERIC

Explicitly cast its argument as a numeric value (decimal or integer). This can help the database or JDBC driver to infer the type of a bound parameter.

SEM_CAST_STRING

Explicitly cast its argument as a string (varchar). This can help the database or JDBC driver to infer the type of a bound parameter.

SEM_CAST_TIMESTAMP

Explicitly cast its argument as a timestamp. This can help the database or JDBC driver to infer the type of a bound parameter.

SEM_CAST_UUID

Explicitly cast its argument as a UUID. This can help the database or JDBC driver to infer the type of a bound parameter.

SEM_CONCAT

Returns a string concatenating the input values with the separator, optionally removing null values

SEM_EDIT_DISTANCE

Calculates the number of insertions, deletions or substitutions required to transform string1 into string2. If one or both of the strings are null the distance will be the largest integer value (2147483647).

SEM_EDIT_DISTANCE_SIMILARITY

Calculates the number of insertions, deletions or substitutions required to transform string1 into string2, and returns the Normalized value of the Edit Distance between two Strings. The value is between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0.

SEM_INSTR

Finds the location of a substring within a specified string.

SEM_JARO_WINKLER

Calculates the measure of agreement between two strings using Jaro-Winkler method. The value is between 0 (no match) and 1 (perfect match). If one or both strings are null the result will be 0.

SEM_JARO_WINKLER_SIMILARITY

Calculates the measure of agreement between two strings using Jaro-Winkler method, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0.

SEM_NGRAMS_SIMILARITY

Calculates the measure of agreement between two strings using the Dice’s coefficient similarity measure applied to the n-grams of the strings, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. The ngrams_length parameter defines the length of the n-grams (2 by default).

SEM_NORMALIZE

Returns a string with Latin (supplement, Extended-A and Extended-B) characters converted into their ASCII equivalents, and other (space and non-alphanumeric) characters eliminated.

SEM_NUMBER_TO_CHAR

Converts a number to its standard string representation.

SEM_SUBSTRING

Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set.

SEM_TIMESTAMP_TO_CHAR

Converts a timestamp to its standard string representation.

SEM_TO_CHAR

Converts a string or a number to its standard string representation.

SEM_UUID_TO_CHAR

Converts a UUID to its standard string representation.

SEQ_NEXTVAL

Get the next value of a sequence. Note that this function is not supported in Enrichers.

SETSEED

Sets the seed for subsequent random() calls (value between -1.0 and 1.0, inclusive)

SETVAL

Set sequence’s current value

SET_BIT

Set bit in string

SET_BYTE

Set byte in string

SIGN

Returns the sign of the argument (-1, 0, +1)

SIN

Returns the sine.

SPLIT_PART

Splits string on delimiter and return the element at position (counting from one)

SQRT

Returns the square root

STATEMENT_TIMESTAMP

Returns the current date and time (start of current statement)

STRING_AGG

Return input values concatenated into a string, separated by delimiter

STRPOS

Returns the location of specified substring in string

SUBSTR

Extracts a substring from string starting at from position and for count characters.

SUBSTRING_REGEX_PATTERN

Extracts from string a substring matching the pattern (a POSIX regular expression).

SUBSTRING_SQL_PATTERN

Extracts from string a substring matching an SQL regular expression.

SUM

Returns the sum of expression across all input values

TAN

Returns the tangent.

TIMEOFDAY

Returns the current date and time (like clock_timestamp, but as a text string)

TO_ASCII

Converts string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings)

TO_CHAR

Convert the value (time stamp, interval, integer, real/double, numeric, string) to string according to the format.

TO_DATE

Converts the string to date

TO_HEX

Converts number to its equivalent hexadecimal representation

TO_NUMBER

Converts the string to numeric

TO_TIMESTAMP

Converts the string to time stamp

TRANSACTION_TIMESTAMP

Returns the current date and time (start of current transaction)

TRANSLATE

Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed.

TRUNC

Truncates the number toward zero or to int decimal places

TSRANGE

Creates a range of timestamps without time zone. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'.

TSTZRANGE

Creates a range of timestamps with time zone. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'.

UPPER

Converts string to upper case

UPPER_INC

Returns a boolean indicating whether the upper bound of the range is inclusive.

UPPER_INF

Returns a boolean indicating whether the upper bound of the range is infinite.

WIDTH_BUCKET

Returns the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range.

WIDTH_BUCKET_THRESHOLDS

Returns the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained.

XMLAGG

Returns the concatenation of XML values

APPENDIX B: DATA ACCESS VIEWS, BUILT-IN ATTRIBUTES AND LINEAGE NAVIGATION

The following tables lists the built-in views, attributes and lineage navigation transitions available in SemQL.

Data Access Views

The following table lists the available data access views. Each view is identified by an alias (for example SD for the Source Data) that corresponds to the physical table storing this data if such a table exists.

Alias

Name

Description

AE

Source Authoring Errors

Errors detected on source authoring data.

DU

Duplicates

Duplicates detected by the matching process.

GD

Golden Data

Golden consolidated and certified records.

GDWE

Golden Data with Errors

Golden data with errors.

GE

Post-Consolidation Errors

Errors detected after master data consolidation (post-consolidation).

GH

Golden History

Golden records history.

GH4B

Golden Data As of Batch

Golden data history as of batch or golden data if not historized.

GI

Golden Integration

Latest consolidated (integrated) golden records before post-consolidation validation.

GX

Deleted Golden

Deleted golden records logs (and data for soft delete).

MD

Master Data

Enriched, validated and cleansed master records.

MH

Master History

Master records history.

MH4B

Master Data As of Batch

Master data history as of batch or master data if not historized.

MI

Master Integration

Latest integrated master records.

MX

Deleted Master

Deleted master records logs (and data for soft delete).

SA

Source Authoring

Source data authored by users.

SA4L

Source Authoring for Load

Source data authored by users for a given load.

SA4LK

Source Authoring Lookup

References lookup for source authoring data.

SAWE

Source Authoring with Errors

Source data authored by users with errors.

SD

Source Data

Source data successively loaded by the publishers.

SD4L

Source Data for Load

Source data loaded by the publishers for a given load.

SD4LK

Source Data Lookup

References lookup for source data loaded by the publishers for a given load.

SDWE

Source Data with Errors

Source data loaded by the publishers with errors.

SE

Source Errors

Errors detected on source data loaded by the publishers (pre-consolidation).

UG

Duplicates Management Consolidated Data

Data consolidated from user decisions taken in a duplicates manager.

UM

Duplicates Management User Decisions

Master records changed by a user using duplicates management.

Built-in Attributes

The following table lists the attributes, with the views into which they are available.

Name Label Views Description

AuthoringType

Authoring Type

SA, SA4L

For matched entities, type of authoring operation:

OVERRIDE for override
DATA_ENTRY for data creation

BatchDate

Batch Date

SA, SAWE, SD, SDWE

Date of execution of the batch.

BatchID

Batch ID

AE, GD, GDWE, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SAWE, SD, SD4L, SDWE, SE, UM

ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected.

BatchSubmitter

Batch Submitter

SA, SAWE, SD, SDWE

User who submitted this batch.

ClassName

Class Name

AE, GD, GDWE, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM

Unique name of class / entity to which this record belongs

ConfidenceScore

Confidence Score

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM

Confidence Score of the golden record. It is the average of the match scores in the match group.

ConfirmationStatus

Confirmation Status

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM

Confirmation status for duplicate management:

Confirmed (CONFIRMED): Indicates that a master is confirmed in a golden or that a golden has all its masters confirmed.
Not-confirmed (NOT_CONFIRMED): Indicates that a master is not confirmed or that a golden is entirely made of unconfirmed masters.
Historically confirmed (WAS_CONFIRMED, for master records): If a master was confirmed into to a golden but this golden was fused into another golden.
Partially confirmed (PARTIALLY_CONFIRMED for golden records only): Indicates that a golden has part of his masters confirmed.

ConstraintName

Constraint Name

AE, GE, SE

For error records, name of the constraint causing this error.

ConstraintType

Constraint Type

AE, GE, SE

For error records, type of the constraint causing this error.

CreationDate

Created On

GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM

Creation date of a record

Creator

Created By

GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM

User who created the record

DeleteAuthor

Deleted By

GX, MX, SA, SA4L, SA4LK, SD, SD4L, SD4LK

User for deleted the record

DeleteDate

Deleted On

GX, MX, SA, SA4L, SA4LK, SD, SD4L, SD4LK

Deletion date of a record

DeleteOperation

Delete Operation

GX, MX, SA, SA4L, SA4LK, SD, SD4L, SD4LK

Delete operation ID.

DeletePath

Delete Path

GX, MX, SA, SA4L, SA4LK, SD, SD4L, SD4LK

Cascade path through which the record was reached during a delete. Null for record directly selected for deletion.

DeleteType

Delete Type

GX, MX

Delete Type (SOFT_DELETE or HARD_DELETE for golden and master deletion). The type of delete can be LEGLESS_DELETE for golden records deleted when they loose all their master records.

DupsCheckoutCause

Checkout Cause

UM

Cause that made the record part of the duplicate management transaction. Possible causes are:

User (USER): the record was checked out by the user.
Same Golden (SAME_GOLDEN): the record is part of the same matching group than another master that was checked out.
Same Suggestion (SAME_SUGG): the record is part of the same suggestion than another master that was checked out.
Same Exclusion Group (SAME_XGRP): the record is part of the same exclusion group than another master that was checked out.
Other (OTHER): Another reason.

DupsOperationID

Dups Operation ID

UM

Identifier for a duplicate management operation

ErrorStatus

Error Status

GD, GDWE, GH4B, GI, SA, SAWE, SD, SDWE

Error Status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are:

VALID if the record has no error
ERROR if the record has errors
RECYCLED if the record was recycled and considered valid
OBSOLETE_ERROR if the record had errors but a newer version of the record fixes them.
a <NULL> value also indicates a record with no error.

ExclusionGroupID

Exclusion Group

MD, MH, MH4B, MI, MX, UM

Exclusion Group ID. An exclusion group represents a group of records for which a user has taken split decisions.

FromBatchID

From Batch ID

GH, GH4B, MH, MH4B

Batch at which the history record was created.

GoldenType

Golden Type

GD, GDWE, GH4B

For fuzzy matching and ID matching entities, indicates whether the golden record was created and authored only in the MDM (DE_BASED) or was consolidated from publishers'' data and possibly overriden (MASTER_BASED)

HasOverride

Has Override

GD, GH4B

For fuzzy matching and ID matching entities, this flag (0 or 1) indicates whether the golden record has override values.

HasSuggestedMerge

Has Sugg. Merge

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM

Flag (0 or 1) indicating that match and merge suggestions are available for this record.

IsConfirmed

Is Confirmed

GD, GDWE, GH4B, GI

Flag (0 or 1) indicated whether this golden record has been confirmed (Fuzzy Matched entities only).

LoadID

Load ID

AE, SA, SA4L, SAWE, SD, SD4L, SDWE, SE, UM

Load Identifier used as the unique transaction ID for external application pushing data to the platform

MastersCount

Masters Count

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM

Number of master records contributing to the golden record.

MatchGroupID

Match Group ID

MD, MH, MH4B, MI, MX

ID of the match group for the master record. This column is set when matching takes place.

OldMatchGroupID

Old Match Group ID

MI

Previous identifier of the match group for the master record.

OriginalBatchID

Original Batch ID

SA, SA4L, SAWE, SD, SD4L, SDWE, UM

Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager.

OriginalConfidenceScore

Original Confidence Score

UM

Confidence Score of the original golden in a duplicate management operation.

OriginalConfirmationStatus

Original Confirmation Status

UM

Original Confirmation Status in a duplicate management operation.

OriginalExclusionGroupID

Original Exclusion Group

UM

Original Exclusion Group in a duplicate management operation.

OriginalMastersCount

Original Masters Count

UM

Number of master records in the original golden in a duplicate management operation.

PublisherID

Publisher ID

AE, GD, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM

For matching entities, code of the publisher that published the record.

SourceID

Source ID

AE, GD, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM

ID of the source record in the source publisher system (Fuzzy Matched entities only).

SuggestedMergeConfidenceScore

Sugg. Merge Confidence Score

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM

Confidence Score for the suggested match group.

SuggestedMergeID

Sugg. Merge ID

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM

ID of the merge suggested by the automated matching.

SuggestedMergeMastersCount

Sugg. Merge Masters Count

GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM

Number of master records in the suggested merge.

ToBatchID

To Batch ID

GH, GH4B, MH, MH4B

Batch at which history record stopped being current or null if the records is still current.

UpdateDate

Updated On

GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM

Update date of a record

Updator

Updated By

GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM

User who updated the record

ViewType

View Type

AE, GD, GDWE, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM

Returns the current view type for the record. e.g. GD, MD…

The following attributes are deprecated.

Name Label Views Description

BranchID

Branch ID

AE, GD, GDWE, GE, GI, MD, MH4B, MI, SD, SD4L, SD4LK, SDWE, SE

Branch ID to which this record belongs. This attribute is deprecated and always returns 0.

FromEdition

From Edition

GD, GDWE, MD, MH4B

Data Edition ID in which this record was created or last updated. This attribute is deprecated and always returns 0.

ToEdition

To Edition

GD, GDWE, MD, MH4B

Data Edition ID into which this record was deleted or closed. This attribute is deprecated and always returns null.

Built-in Lineage Navigation

Parent Navigation

The following table lists the built-in lineage navigation to a parent record.

Pseudo-Role Navigation Description Navigation Path

CurrentGoldenRecord

Current Golden Record

Current golden record a golden history record is attached to.

Basic entity:

GH to GD

Fuzzy matched entity:

GH to GD

ID matched entity:

GH to GD

CurrentMasterRecord

Current Master Record

Current master record a master history record is attached to.

Fuzzy matched entity:

MH to MD

ID matched entity:

MH to MD

GoldenRecord

Golden Record

Golden record into which the master, source or golden integration record consolidates.

Basic entity:

SA4L to GD
SA to GD

Fuzzy matched entity:

GI to GD
MD to GD
MH4B to GH4B
SA4L to GD
SA to GD

ID matched entity:

GI to GD
MD to GD
MH4B to GH4B
SA4L to GD
SA to GD
SD4L to GD
SD to GD

MasterRecord

Master Record

Master record corresponding to a source record.

Fuzzy matched entity:

MI to MD
SD4LK to MD
SD4L to MD
SD to MD

ID matched entity:

MI to MD
SD4L to MD
SD to MD

RecordWithError

Record with Error

Authoring record with error attached to a source authoring error.

Basic entity:

AE to SAWE

Fuzzy matched entity:

AE to SAWE

ID matched entity:

AE to SAWE

SourceAuthoringRecord

Source Authoring Record

Authoring record attached to a source authoring error.

Basic entity:

AE to SA

Fuzzy matched entity:

AE to SA

ID matched entity:

AE to SA

SourceRecord

Source Record

Source Record attached to the current error record.

Fuzzy matched entity:

SE to SD

ID matched entity:

SE to SD

Child Navigation

The following table lists the built-in lineage navigation to child records.

Pseudo-Role Navigation Description Navigation Path

Errors

Errors

Errors detected for a given source or golden record.

Basic entity:

SAWE to AE
SA to AE

Fuzzy matched entity:

GDWE to GE
GD to GE
SAWE to AE
SA to AE
SDWE to SE
SD to SE

ID matched entity:

GDWE to GE
GD to GE
SAWE to AE
SA to AE
SDWE to SE
SD to SE

GoldenHistoryRecords

Golden History Records

Basic entity:

GD to GH

Fuzzy matched entity:

GD to GH

ID matched entity:

GD to GH

IntegrationMasterRecords

Integration Master Records

Master integration record attached to the golden integration record. Available in post-consolidation enrichers and validations.

Fuzzy matched entity:

GI to MI

ID matched entity:

GI to MI

MasterHistoryRecords

Master History Records

Fuzzy matched entity:

MD to MH

ID matched entity:

MD to MH

MasterRecords

Master Records

Master Records consolidated in the golden record.

Fuzzy matched entity:

GD to MD
GH4B to MH4B

ID matched entity:

GD to MD
GH4B to MH4B

SourceAuthoringRecords

Source Authoring Records

Source data authored by users attached to a golden record.

Basic entity:

GD to SA

Fuzzy matched entity:

GD to SA

ID matched entity:

GD to SA

SourceRecords

Source Records

Source Records attached to the current golden record (for ID matching) or master record.

Basic entity:

GD to SA

Fuzzy matched entity:

MD to SD

ID matched entity:

GD to SD
MD to SD

Version 4.4 Rev 1
Last updated 2018-06-14 11:52:52 UTC