Semarchy xDM plug-in reference guide

This article makes the official guide searchable through the knowledge base full-text search. It is a copy of the official guide.

For further investigation of your problem, refer to the official Semarchy xDM documentation.

Welcome to Semarchy xDM.
This guide provides reference information about the plug-ins delivered with the Semarchy xDM Platform.

PREFACE

Audience

This document is intended for integration architects and developers setting up an MDM hub as part of their enterprise integration architecture.


If you want to learn about MDM or discover Semarchy xDM, you can watch our tutorials.

The Semarchy xDM Documentation Library, including the development, administration, and installation guides, is available online.

Document Conventions

This document uses the following formatting conventions:

Convention | Meaning
boldface | Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept.
italic | Italic type indicates special emphasis or a placeholder variable that you need to provide.
monospace | Monospace type indicates code examples, text, or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.

Obtaining Help

There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find an error or have a suggestion for improvement, please email support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.


Overview

This guide provides reference information about the plug-ins delivered with the Semarchy xDM Platform.
Using this guide, you will learn how to use these plug-ins in your MDM projects.

INTRODUCTION

What is Semarchy xDM?

Semarchy xDM is designed to support any kind of Enterprise Master Data Management initiative. It provides extreme flexibility for defining and implementing master data models and releasing them to production. The platform can be used as the target deployment point for all master data of your enterprise, or in conjunction with existing data hubs to contribute to data transparency and quality with federated governance processes. Its powerful and intuitive environment covers all use cases for setting up a successful master data governance strategy.

Semarchy xDM is based on a coherent set of features for all Master Data Management projects.

Semarchy xDM Plug-ins Architecture

Semarchy xDM implements plug-ins that use external services or information systems to contribute to the master data processing and enrichment.

Plug-ins are used in Semarchy xDM in:

  • Enrichers: By adding new enrichers, you can perform record-level enrichment to update, augment or standardize existing attribute values, or create content in new attributes. For example, you can connect to an external web service to retrieve stock ticker symbols from company names.

  • Validations: By adding new validations, you can perform record-level checks, that is, check the values of attributes in a record against complex rules. For example, you can connect to an external provider to check whether a billing or shipping address is valid.

INFO: Using plug-ins is explained in the Semarchy xDM Developer’s Guide, in the Certification Process Design chapter. Installing plug-ins in your Semarchy xDM instance is explained in the Semarchy xDM Administration Guide, in the Managing the Platform chapter.


The plug-ins are designed using the Open Plug-In Architecture. Plug-in design is covered in the Semarchy xDM Plug-in Development Guide.

TEXT NORMALIZATION AND TRANSLITERATION

This plug-in applies normalization, transliteration and phonetic transformations to text strings.

Semarchy Text Enricher

Plug-in ID

Semarchy Text Enricher - com.semarchy.engine.plugins.convergence.text

Description

This enricher applies normalization, transliteration and phonetic transformations to text strings. It takes an Input Text and applies an Input Filter to this text, for example to remove all characters but letters. Then it applies a series of transformations defined in the Transformation parameter and returns a Transformed Text.


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description
Input Filter | No | String | Filter applied to the input text before the transformation. Valid values are NONE, which applies no filter; LETTERS, which removes all non-letter characters from the input string; and STANDARD, which tokenizes the input text into words.
Transformation | Yes | String | A pipe-separated sequence of transformation definitions. Transformations include:
    • NORMALIZE
    • TRANSLITERATE [<Id>]
    • PHONETIC <Type> [<MaxCodeLength>]
    • BEIDERMORSE [Split] [RuleType] [MaxPhonemes] [NameType]
    • DOUBLEMETAPHONE [<max_code_length>] [split]
    See the Transformations section for a detailed description of each transformation.
Synonyms Separator | No | String | Separator used between the synonyms returned by the enricher. The default value is a pipe (|).

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter Name | Mandatory | Type | Description
Input Text | Yes | String | Text to transform.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter Name | Type | Description
Transformed Text | String | Filtered and transformed text.
Secondary Transformed Text | String | Secondary transformed text. This output may contain the secondary result of a BEIDERMORSE or DOUBLEMETAPHONE transformation. See Other Transformations for more information.

Input Filters

The following input filters are supported by the enricher:

  • NONE: no filter is applied to the input text.

  • LETTERS: removes all non-letter characters from the input string.

  • STANDARD: breaks the input text into words according to the rules of the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
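As a rough illustration of what each filter keeps, the Python sketch below approximates the three filters. It is not the enricher's code: the function name apply_input_filter is hypothetical, LETTERS is modeled with str.isalpha, and STANDARD is approximated with a simple word-character regular expression instead of the full UAX #29 rules (for instance, UAX #29 keeps O'Neil as one word, which this regex does not). Rejoining the STANDARD tokens with single spaces is also an assumption.

```python
import re


def apply_input_filter(text: str, input_filter: str = "NONE") -> str:
    """Approximate the NONE, LETTERS, and STANDARD input filters (sketch only)."""
    if input_filter == "NONE":
        return text  # no filtering at all
    if input_filter == "LETTERS":
        # Keep only characters that Unicode classifies as letters; digits,
        # spaces, and punctuation are all dropped.
        return "".join(ch for ch in text if ch.isalpha())
    if input_filter == "STANDARD":
        # Crude stand-in for UAX #29 word segmentation: keep runs of word
        # characters and rejoin them with single spaces (an assumption).
        return " ".join(re.findall(r"\w+", text))
    raise ValueError(f"Unknown input filter: {input_filter}")


print(apply_input_filter("O'Neil & Sons, 1998", "LETTERS"))   # ONeilSons
print(apply_input_filter("O'Neil & Sons, 1998", "STANDARD"))  # O Neil Sons 1998
```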

Transformations

The following transformation definitions are supported by the enricher:

  • Normalization

    • NORMALIZE: applies Normalization.

  • Phonetic Transformation

    • PHONETIC [SOUNDEX | REFINEDSOUNDEX | METAPHONE [<max_code_length>] | DOUBLEMETAPHONE [<max_code_length>] | CAVERPHONE | CAVERPHONE1 | NYSIIS | MRA | COLOGNE | BEIDERMORSE ]: applies Phonetic Transformations.

  • Other Transformations

    • BEIDERMORSE [Split] [RuleType] [MaxPhonemes] [NameType]

    • DOUBLEMETAPHONE [<max_code_length>] [split]

  • Transliteration

    • TRANSLITERATE [<ID>]: applies a Transliteration transformation to the string. The transliteration is identified by an ID. If no ID is provided, the Any-Latin transliteration is used.

Transformations can be chained: successive transformations are separated by a pipe (|) sign.
Examples of transformations:

  • Normalize and apply Phonetic Soundex: NORMALIZE | PHONETIC SOUNDEX

  • Normalize and then transliterate to Latin script: NORMALIZE | TRANSLITERATE Any-Latin

  • Normalize, transliterate to Latin script and then apply Metaphone with a maximum resulting length of 5 characters: NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5

  • Perform an approximate BEIDERMORSE transformation for family names, on generic name types: BEIDERMORSE APPROX 10 FALSE GENERIC
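Because steps are separated by pipes and arguments by spaces, a Transformation value can be read as an ordered list of (keyword, arguments) pairs applied from left to right. The Python sketch below shows that reading. The function name parse_transformations is hypothetical, and the code does not reproduce the enricher's own parser or check which keywords and argument orders it accepts.

```python
from typing import List, Tuple


def parse_transformations(spec: str) -> List[Tuple[str, List[str]]]:
    """Split a Transformation value into ordered (keyword, arguments) steps."""
    steps = []
    for raw_step in spec.split("|"):  # the pipe separates successive steps
        tokens = raw_step.split()     # whitespace separates keyword and arguments
        if not tokens:
            continue  # tolerate empty steps, such as a trailing pipe
        steps.append((tokens[0].upper(), tokens[1:]))
    return steps


for keyword, args in parse_transformations(
        "NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5"):
    print(keyword, args)
# NORMALIZE []
# TRANSLITERATE ['Any-Latin']
# PHONETIC ['METAPHONE', '5']
```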

Normalization

The NORMALIZE transformation normalizes the string by applying a series of transformations that map similar characters to a common target, so that certain distinctions between similar characters are ignored. These include accent removal and case folding, among others.

Examples of normalization:

Original Text | Normalized Text | Comments
‒ – — ― | - - - - | Four different dashes converted to hyphens
AbSoLuteLy TRUE | absolutely true | Case folding
… | ... | The ellipsis character is converted to three periods
½ Tsp | 1/2 tsp | Symbol folding
Æsop | aesop |
Äsop | asop |
Dürst | durst |
Encyclopædia | encyclopaedia |
œuvre | oeuvre |
poſt | post |
résumé français | resume francais | Accent removal and case folding
Straße | strasse |
٣ is a magic number | 3 is a magic number | Native digit folding

The complete list of transformations is given below:

  • Accent removal

  • Canonical duplicates folding

  • Case folding

  • Dashes folding

  • Diacritic removal (including stroke, hook, descender)

  • Greek letterforms folding

  • Han Radical folding

  • Hebrew Alternates folding

  • Jamo folding

  • Letterforms folding

  • Math symbol folding

  • Multigraph Expansions: All

  • Native digit folding

  • Overline folding

  • Positional forms folding

  • Small forms folding

  • Space folding

  • Spacing Accents folding

  • Subscript folding

  • Suzhou Numeral folding

  • Symbol folding

  • Underline folding

  • Vertical forms folding

  • Width folding

For more information about these transformations, see Unicode Technical Report #30 (UTR #30), Character Foldings.
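Several of these foldings can be approximated with the Python standard library, which helps when checking what a normalized value will roughly look like. The sketch below is a partial approximation, not the enricher's implementation: it covers compatibility decomposition, accent removal, case folding, and native digit folding only. Multigraph expansion (Æ to ae, œ to oe) and dash folding are not covered, and ½ decomposes to 1⁄2 with a fraction slash rather than 1/2, so some rows of the examples table are not reproduced. The function name approximate_normalize is hypothetical.

```python
import unicodedata


def approximate_normalize(text: str) -> str:
    """Partial, standard-library approximation of NORMALIZE (not full UTR #30)."""
    # 1. Compatibility decomposition: splits accented letters into a base letter
    #    plus combining marks, and expands characters such as '…' into '...'.
    decomposed = unicodedata.normalize("NFKD", text)
    # 2. Accent removal: drop the combining marks produced by the decomposition.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Case folding: a stronger form of lower-casing ('ß' becomes 'ss').
    folded = without_marks.casefold()
    # 4. Native digit folding: map any Unicode decimal digit to its ASCII digit.
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.decimal(ch, None) is not None else ch
        for ch in folded
    )


for sample in ["résumé français", "Straße", "poſt", "٣ is a magic number"]:
    print(approximate_normalize(sample))
# resume francais
# strasse
# post
# 3 is a magic number
```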

Phonetic Transformations

A phonetic transformation converts the string into a code that represents its pronunciation. The default phonetic transformation is PHONETIC METAPHONE.

Phonetic transformations include:

  • PHONETIC SOUNDEX and PHONETIC REFINEDSOUNDEX: phonetic algorithms for indexing names by sound, as pronounced in English. The goal is to encode homophones to the same representation so that they can be matched despite minor differences in spelling. More details about Soundex.

  • PHONETIC METAPHONE and PHONETIC DOUBLEMETAPHONE: algorithms for indexing words by their English pronunciation. They are suitable for most English words, not just names. Double Metaphone can return both a primary and a secondary code for an input string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. These algorithms support a Max Code Length parameter, which defines the maximum length of the encoded result and defaults to 4. More details about Metaphone.

  • PHONETIC CAVERPHONE and PHONETIC CAVERPHONE1: algorithm for data matching in electoral rolls, optimized for accents present in parts of New Zealand. More details about Caverphone and Caverphone 1.

  • PHONETIC NYSIIS: New York State Identification and Intelligence System (NYSIIS) algorithm, which maps similar phonemes to the same letter. The result is a string that the reader can pronounce without decoding. More details about NYSIIS.

  • PHONETIC MRA: Match Rating Approach, developed by Western Airlines. This algorithm combines an encoding with a range comparison technique. More details about MRA.

  • PHONETIC COLOGNE: phonetic algorithm optimized for the German language. See Kölner Phonetik.

  • PHONETIC BEIDERMORSE: Beider-Morse phonetic algorithm, which provides greater accuracy when matching Slavic and Yiddish surnames that have similar pronunciations but different spellings. It returns a list of tokens, separated by the string specified in the Synonyms Separator parameter: first the transformed input text, then the transformed synonyms of the input text. More details about Beider-Morse.
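To make the idea of encoding homophones to the same representation concrete, here is a short Python implementation of the classic American Soundex rules. It is a teaching sketch, not the plug-in's code: it ignores non-ASCII letters, the names soundex and _SOUNDEX_CODES are hypothetical, and the enricher's own Soundex implementation may handle edge cases differently.

```python
# Letter-to-digit classes of American Soundex. Vowels (A, E, I, O, U, Y) are
# absent from this table; H and W are handled separately in soundex().
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Classic American Soundex code (first letter + 3 digits) of an ASCII name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated code counts again
    return (code + "000")[:4]  # pad with zeros, then truncate to 4 characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Robert and Rupert share the code R163, which is what allows them to match despite the spelling difference.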

Other Transformations

These other transformations return a list of tokens, which can be split between the Transformed Text and Secondary Transformed Text outputs.

These transformations should preferably be used at the end of the transformation sequence, because their secondary transformed text is not processed by subsequent transformations in the sequence.

Other transformations include:

  • BEIDERMORSE [<split>] [<rule_type>] [<max_phonemes>] [<name_type>]: the Beider-Morse transformation returns a list of tokens: first the transformed input text, then the transformed synonyms of the input text. It supports the following parameters:

    • split: if set to true, all synonyms after the first one are concatenated in the Secondary Transformed Text output. If set to false (default value), all synonyms are appended to the first token in the Transformed Text output.

    • rule_type: EXACT for an exact or APPROX for an approximate phonetic transformation.

    • max_phonemes: maximum number of synonyms returned. Default is 20.

    • name_type: GENERIC (default value). Use ASHKENAZI or SEPHARDIC if you specifically want phonetic encodings optimized for Ashkenazi or Sephardic Jewish family names.

  • DOUBLEMETAPHONE [<max_code_length>] [<split>]: encodes the input string with the Double Metaphone algorithm and returns a primary code and a secondary code. If split is set to true, the secondary code is pushed to the Secondary Transformed Text output. Otherwise, it is concatenated to the primary code in the Transformed Text output.
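How the split flag routes tokens between the two outputs can be summarized in a small Python function. This is an illustrative model only: the function name split_outputs is hypothetical, joining tokens with the Synonyms Separator (a pipe by default) is an assumption about how the concatenation is done, and the tokens in the usage lines are arbitrary placeholder codes.

```python
from typing import List, Optional, Tuple


def split_outputs(tokens: List[str], split: bool,
                  separator: str = "|") -> Tuple[str, Optional[str]]:
    """Return (Transformed Text, Secondary Transformed Text) for a token list."""
    if not tokens:
        return "", None
    if split:
        # First token alone in the primary output; the rest go to the secondary one.
        secondary = separator.join(tokens[1:]) or None
        return tokens[0], secondary
    # No split: every token is concatenated into the primary output.
    return separator.join(tokens), None


print(split_outputs(["SM0", "XMT"], split=True))   # ('SM0', 'XMT')
print(split_outputs(["SM0", "XMT"], split=False))  # ('SM0|XMT', None)
```

With split set to false, a later step in the sequence sees every token; with split set to true, only the first token stays in the main output.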

Transliteration

The TRANSLITERATE transformation converts text from one script to another, for example from Traditional to Simplified Chinese, from Japanese Hiragana to Katakana, or from Cyrillic to Latin script.
Each source/target transliteration is identified by an ID. The supported transliteration IDs are provided in the list below. If no ID is provided, the Any-Latin transliteration is used.

Each ID represents a transliteration from one script or language to another, for example Katakana-Latin or Latin-Thai. The special tag Any stands for any script or language. For example, Any-Latin converts any input script to Latin script.
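The IDs follow a Source-Target pattern, optionally followed by a /Variant suffix such as /BGN, /Names, or /UNGEGN. The Python sketch below only splits an ID into those parts and applies the documented Any-Latin default. It performs no transliteration, and the names TransliteratorId and parse_transliterator_id are hypothetical.

```python
from typing import NamedTuple, Optional


class TransliteratorId(NamedTuple):
    source: str
    target: str
    variant: Optional[str]


def parse_transliterator_id(transliterator_id: Optional[str]) -> TransliteratorId:
    """Split an ID of the form Source-Target[/Variant]; default to Any-Latin."""
    if not transliterator_id:
        transliterator_id = "Any-Latin"  # documented default when no ID is given
    body, _, variant = transliterator_id.partition("/")
    source, sep, target = body.partition("-")
    if not sep:
        raise ValueError(f"Expected Source-Target, got {transliterator_id!r}")
    return TransliteratorId(source, target, variant or None)


print(parse_transliterator_id("Han-Latin/Names"))
# TransliteratorId(source='Han', target='Latin', variant='Names')
```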

Accents-Any

Any-Name

Devanagari-Bengali

Han-Latin

Latin-Greek

Pinyin-NumericPinyin

Amharic-Latin/BGN

Any-NFC

Devanagari-Gujarati

Han-Latin/Names

Latin-Greek/UNGEGN

pl_FONIPA-ja

Any-Accents

Any-NFD

Devanagari-Gurmukhi

Hangul-Latin

Latin-Gujarati

pl-ja

Any-am

Any-NFKC

Devanagari-Kannada

Hans-Hant

Latin-Gurmukhi

pl-pl_FONIPA

Any-Arabic

Any-NFKD

Devanagari-Latin

Hant-Hans

Latin-Han

Publishing-Any

Any-Armenian

Any-Null

Devanagari-Malayalam

Hebrew-Latin

Latin-Hangul

ro_FONIPA-ja

Any-Bengali

Any-Oriya

Devanagari-Oriya

Hebrew-Latin/BGN

Latin-Hebrew

ro-ja

Any-Bopomofo

Any-pl_FONIPA

Devanagari-Tamil

Hex-Any

Latin-Hiragana

ro-ro_FONIPA

Any-CaseFold

Any-Publishing

Devanagari-Telugu

Hex-Any/C

Latin-Jamo

ru-ja

Any-cs_FONIPA

Any-Remove

Digit-Tone

Hex-Any/Java

Latin-Kannada

ru-zh

Any-Cyrillic

Any-ro_FONIPA

es_419-ja

Hex-Any/Perl

Latin-Katakana

Russian-Latin/BGN

Any-Devanagari

Any-ru

es_419-zh

Hex-Any/Unicode

Latin-Malayalam

Serbian-Latin/BGN

Any-es_419_FONIPA

Any-sk_FONIPA

es_FONIPA-am

Hex-Any/XML

Latin-NumericPinyin

Simplified-Traditional

Any-es_FONIPA

Any-Syriac

es_FONIPA-es_419_FONIPA

Hex-Any/XML10

Latin-Oriya

sk_FONIPA-ja

Any-FCC

Any-Tamil

es_FONIPA-ja

Hiragana-Katakana

Latin-Syriac

sk-ja

Any-FCD

Any-Telugu

es_FONIPA-zh

Hiragana-Latin

Latin-Tamil

sk-sk_FONIPA

Any-Georgian

Any-Thaana

es-am

IPA-XSampa

Latin-Telugu

Syriac-Latin

Any-Greek

Any-Thai

es-es_FONIPA

it-am

Latin-Thaana

Tamil-Bengali

Any-Greek/UNGEGN

Any-Title

es-ja

it-ja

Latin-Thai

Tamil-Devanagari

Any-Gujarati

Any-Upper

es-zh

ja_Latn-ko

Macedonian-Latin/BGN

Tamil-Gujarati

Any-Gurmukhi

Any-zh

Fullwidth-Halfwidth

ja_Latn-ru

Malayalam-Bengali

Tamil-Gurmukhi

Any-Han

Arabic-Latin

Georgian-Latin

Jamo-Latin

Malayalam-Devanagari

Tamil-Kannada

Any-Hangul

Arabic-Latin/BGN

Georgian-Latin/BGN

JapaneseKana-Latin/BGN

Malayalam-Gujarati

Tamil-Latin

Any-Hans

Armenian-Latin

Greek-Latin

Kannada-Bengali

Malayalam-Gurmukhi

Tamil-Malayalam

Any-Hant

Armenian-Latin/BGN

Greek-Latin/BGN

Kannada-Devanagari

Malayalam-Kannada

Tamil-Oriya

Any-Hebrew

ASCII-Latin

Greek-Latin/UNGEGN

Kannada-Gujarati

Malayalam-Latin

Tamil-Telugu

Any-Hex

Azerbaijani-Latin/BGN

Gujarati-Bengali

Kannada-Gurmukhi

Malayalam-Oriya

Telugu-Bengali

Any-Hex/C

Belarusian-Latin/BGN

Gujarati-Devanagari

Kannada-Latin

Malayalam-Tamil

Telugu-Devanagari

Any-Hex/Java

Bengali-Devanagari

Gujarati-Gurmukhi

Kannada-Malayalam

Malayalam-Telugu

Telugu-Gujarati

Any-Hex/Perl

Bengali-Gujarati

Gujarati-Kannada

Kannada-Oriya

Maldivian-Latin/BGN

Telugu-Gurmukhi

Any-Hex/Plain

Bengali-Gurmukhi

Gujarati-Latin

Kannada-Tamil

Mongolian-Latin/BGN

Telugu-Kannada

Any-Hex/Unicode

Bengali-Kannada

Gujarati-Malayalam

Kannada-Telugu

Name-Any

Telugu-Latin

Any-Hex/XML

Bengali-Latin

Gujarati-Oriya

Katakana-Hiragana

NumericPinyin-Latin

Telugu-Malayalam

Any-Hex/XML10

Bengali-Malayalam

Gujarati-Tamil

Katakana-Latin

NumericPinyin-Pinyin

Telugu-Oriya

Any-Hiragana

Bengali-Oriya

Gujarati-Telugu

Kazakh-Latin/BGN

Oriya-Bengali

Telugu-Tamil

Any-ja

Bengali-Tamil

Gurmukhi-Bengali

Kirghiz-Latin/BGN

Oriya-Devanagari

Thaana-Latin

Any-Kannada

Bengali-Telugu

Gurmukhi-Devanagari

Korean-Latin/BGN

Oriya-Gujarati

Thai-Latin

Any-Katakana

Bopomofo-Latin

Gurmukhi-Gujarati

Latin-Arabic

Oriya-Gurmukhi

Tone-Digit

Any-ko

Bulgarian-Latin/BGN

Gurmukhi-Kannada

Latin-Armenian

Oriya-Kannada

Traditional-Simplified

Any-Latin (default)

cs_FONIPA-ja

Gurmukhi-Latin

Latin-ASCII

Oriya-Latin

Turkmen-Latin/BGN

Any-Latin/BGN

cs_FONIPA-ko

Gurmukhi-Malayalam

Latin-Bengali

Oriya-Malayalam

Ukrainian-Latin/BGN

Any-Latin/Names

cs-cs_FONIPA

Gurmukhi-Oriya

Latin-Bopomofo

Oriya-Tamil

Uzbek-Latin/BGN

Any-Latin/UNGEGN

cs-ja

Gurmukhi-Tamil

Latin-Cyrillic

Oriya-Telugu

XSampa-IPA

Any-Lower

cs-ko

Gurmukhi-Telugu

Latin-Devanagari

Pashto-Latin/BGN

zh_Latn_PINYIN-ru

Any-Malayalam

Cyrillic-Latin

Halfwidth-Fullwidth

Latin-Georgian

Persian-Latin/BGN


LOOKUP

This plug-in performs a data lookup on a mapping table.

Semarchy Lookup Enricher

Plug-in ID

Semarchy Lookup Enricher - com.semarchy.engine.plugins.convergence.text

Description

This enricher performs a data lookup on a mapping table accessed via a JDBC datasource.

The mapping table is located in a datasource provided using the Datasource parameter, which defaults to the data location’s datasource. The mapping table is declared to the enricher:

  • By giving a Mapping Table as well as a Lookup Column and an Output Column from this table. The input lookup value is searched in the Lookup Column and the corresponding value from the Output Column is returned.

  • By giving a Custom SQL select statement in the form: select <lookup_column> LOOKUP_COLUMN, <output_column> OUTPUT_COLUMN from <mapping_table> where …. This statement is executed on the datasource and must return two columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN, which are used as the lookup and output columns.


You must either set Mapping Table, Lookup Column and Output Column, or only set Custom SQL. The Mapping Table, Lookup Column, and Output Column parameters are mandatory unless the Custom SQL parameter is set instead.

The lookup is performed on the mapping table with an optional memory cache configured with the Cache Lookup Data parameter.

When a null value is passed as the Lookup Value, or when the lookup finds no matching value in the lookup column, the enricher returns the Fallback Value or the Lookup Value, depending on the Fallback Behavior parameter.
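The rules above (query a two-column lookup dataset, optionally through a memory cache, and fall back when nothing matches) can be modeled in a few lines of Python. This sketch assumes an in-memory SQLite database in place of the JDBC datasource, and the class, table, and column names (LookupSketch, COUNTRY_MAP, CODE, LABEL) are hypothetical. It mirrors the documented behavior only and is not the plug-in's implementation; for instance, it caches only successful lookups in LOAD_ON_DEMAND mode.

```python
import sqlite3
from typing import Dict, Optional

_MISSING = object()  # sentinel: "no row matched", distinct from a NULL output value


class LookupSketch:
    """Hypothetical model of the lookup, cache, and fallback rules."""

    def __init__(self, conn: sqlite3.Connection, dataset_sql: str,
                 cache: str = "LOAD_ON_START",
                 fallback_behavior: str = "USE_FALLBACK",
                 fallback_value: Optional[str] = None) -> None:
        self.conn = conn
        # Same shape as the Custom SQL form: LOOKUP_COLUMN and OUTPUT_COLUMN aliases.
        self.dataset_sql = dataset_sql
        self.cache = cache
        self.fallback_behavior = fallback_behavior
        self.fallback_value = fallback_value
        self.memory: Dict[str, Optional[str]] = {}
        if cache == "LOAD_ON_START":
            # Load every lookup/output pair into memory once, at initialization.
            rows = conn.execute(
                f"select LOOKUP_COLUMN, OUTPUT_COLUMN from ({dataset_sql})").fetchall()
            self.memory = dict(rows)

    def _query(self, value: str):
        row = self.conn.execute(
            f"select OUTPUT_COLUMN from ({self.dataset_sql}) where LOOKUP_COLUMN = ?",
            (value,)).fetchone()
        return _MISSING if row is None else row[0]

    def lookup(self, value: Optional[str]) -> Optional[str]:
        result = _MISSING
        if value is not None:
            if self.cache == "LOAD_ON_START":
                result = self.memory.get(value, _MISSING)
            elif self.cache == "LOAD_ON_DEMAND" and value in self.memory:
                result = self.memory[value]
            else:
                result = self._query(value)  # NO_CACHE, or a LOAD_ON_DEMAND miss
                if self.cache == "LOAD_ON_DEMAND" and result is not _MISSING:
                    self.memory[value] = result
        if result is _MISSING:
            # Null input or no match: apply the Fallback Behavior.
            if self.fallback_behavior == "USE_LOOKUP_VALUE":
                return value
            return self.fallback_value
        return result


conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY_MAP (CODE text, LABEL text)")
conn.executemany("insert into COUNTRY_MAP values (?, ?)",
                 [("FR", "France"), ("DE", "Germany")])
dataset = "select CODE LOOKUP_COLUMN, LABEL OUTPUT_COLUMN from COUNTRY_MAP"

enricher = LookupSketch(conn, dataset, fallback_value="Unknown")
print(enricher.lookup("FR"))  # France
print(enricher.lookup("XX"))  # Unknown
```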


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description
Cache Lookup Data | No | String | Optionally uses a memory cache for the lookup process. Possible values are:
    • NO_CACHE: no cache is used; the mapping table is queried for each lookup.
    • LOAD_ON_START (default): all lookup data is cached in memory at initialization, and all lookups are made using the memory cache.
    • LOAD_ON_DEMAND: data is cached after it has been looked up. Lookups are first attempted on the memory cache, then on the mapping table if the lookup value is not present in the cache.
Custom SQL | No | String | Leave this parameter empty to use a generated SQL query. Use this parameter instead of Mapping Table, Lookup Column and Output Column to define the lookup dataset with a select statement in the following form:
    select
        <lookup_column> LOOKUP_COLUMN,
        <output_column> OUTPUT_COLUMN
    from <mapping_table>
    where ...
    This query must return a dataset with two columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN. These columns are used instead of the Lookup Column and Output Column.
Datasource | No | String | JNDI name of the datasource containing the lookup data. If this parameter is not defined, the enricher uses the data location’s datasource. This parameter should contain the full path of the datasource, for example: java:comp/env/jdbc/SEMARCHY_STAGING.
Fallback Behavior | No | String | Behavior when the lookup value is not found in the lookup column. Possible values are:
    • USE_FALLBACK (default): returns the fallback value, or null if the fallback value is not specified.
    • USE_LOOKUP_VALUE: returns the lookup value.
Fallback Value | No | String | Value to return if the lookup value is not found in the lookup column. Default value: NULL.
Lookup Column | No | String | Physical name of the column containing the lookup values. Default value: NONE.
Mapping Table | No | String | Physical name of the mapping table containing the lookup and output columns. Default value: NONE.
Output Column | No | String | Physical name of the column containing the values returned by the enricher. Default value: NONE.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter Name | Mandatory | Type | Description
Lookup Value | Yes | String | Value to look for in the mapping table’s lookup column.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter Name | Type | Description
Output Value | String | Value returned by the lookup.

TRANSLATION

Google Translate Enricher

Plug-in ID

Google Translate Enricher - com.semarchy.engine.plugins.convergence.translate.v2

Description

This enricher translates an Input Text from a Source Language to a Target Language using the Google Translate service. The source language is automatically detected if unspecified. This enricher requires a valid Google Key.


This plug-in must be used in compliance with the Google Translate APIs Terms of Service.

This enricher uses the Google Translate service, which must be accessible from the Semarchy xDM application at the following URL: https://www.googleapis.com/language/translate/v2?<parameters>. Make sure this URL is accessible through your firewalls.
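When opening firewall rules or testing a key outside xDM, it can help to see what such a request URL looks like. The Python sketch below only builds a URL using the Translate v2 query parameters key, q, target, and source (omitting source lets the service detect the language); it sends nothing. The function name build_translate_url and the YOUR_API_KEY value are placeholders, and the exact request the plug-in sends is not documented here and may differ.

```python
from typing import Optional
from urllib.parse import urlencode

TRANSLATE_ENDPOINT = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(api_key: str, text: str, target: str,
                        source: Optional[str] = None) -> str:
    """Build a Translate v2 GET URL; source is omitted to let Google detect it."""
    params = {"key": api_key, "q": text, "target": target}
    if source:
        params["source"] = source
    return f"{TRANSLATE_ENDPOINT}?{urlencode(params)}"  # spaces are encoded as '+'


print(build_translate_url("YOUR_API_KEY", "Bonjour le monde", target="en"))
# https://www.googleapis.com/language/translate/v2?key=YOUR_API_KEY&q=Bonjour+le+monde&target=en
```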

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description
Application Name | Yes | String | Name of the client application accessing the Google Translate service. Application names should preferably use the format <company-id>_<app-name>_<app-version>. Google servers use this name to monitor the source of authentication.
Google Key | Yes | String | Google API key: a unique key that you generate using the Google API Console.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter Name | Mandatory | Type | Description
Input Text | Yes | String | Text to translate.
Source Language | No | String | Language of the input text. If it is unspecified, it is detected from the input text.
Target Language | Yes | String | Target language for the translation.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter Name | Type | Description
Translated Text | String | Translated text.

NAME PROCESSING

Semarchy Person Name Enricher

Plug-in ID

Semarchy Person Name Enricher - com.semarchy.engine.plugins.convergence.personname.PersonNameEnricher

Description

This enricher extracts the Given Name, Surname, and Gender from a person’s full name. It parses the Input Name and identifies a Given Name and a Surname, with a Names Parsing Score confidence percentage. The given name is then searched in a database of names for the Source Country Code provided in the input. If a given name is matched, a Gender and a Most Frequent Gender (if the given name is unisex) are returned.
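The real enricher scores its parsing and looks given names up in a per-country database, neither of which is reproduced here. As a rough illustration of how Surname Position and Case Transformation shape the outputs, the Python sketch below splits a name on whitespace and rebuilds the Full Name. All function names are hypothetical, the whitespace split is far simpler than the plug-in's parsing, and treating CAMEL_CASE as capitalizing each word is an assumption.

```python
from typing import Tuple


def split_person_name(full_name: str,
                      surname_position: str = "SURNAME_LAST") -> Tuple[str, str]:
    """Naively split a full name into (given name, surname) on whitespace."""
    parts = full_name.split()
    if len(parts) < 2:
        # Not enough tokens to separate the parts: treat the text as a given name.
        return " ".join(parts), ""
    if surname_position == "SURNAME_FIRST":
        return " ".join(parts[1:]), parts[0]
    return " ".join(parts[:-1]), parts[-1]


def apply_case(text: str, case_transformation: str) -> str:
    """Apply a Case Transformation value; CAMEL_CASE capitalizes each word here."""
    if case_transformation == "UPPER_CASE":
        return text.upper()
    if case_transformation == "LOWER_CASE":
        return text.lower()
    if case_transformation == "CAMEL_CASE":
        return text.title()
    return text  # NONE


def rebuild_full_name(given: str, surname: str, surname_position: str,
                      case_transformation: str) -> str:
    """Rebuild the Full Name with the surname placed per Surname Position."""
    ordered = [surname, given] if surname_position == "SURNAME_FIRST" else [given, surname]
    return apply_case(" ".join(part for part in ordered if part), case_transformation)


given, surname = split_person_name("DUPONT jean-paul", "SURNAME_FIRST")
print(rebuild_full_name(given, surname, "SURNAME_FIRST", "CAMEL_CASE"))  # Dupont Jean-Paul
```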


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description
Surname Position | Yes | String | Position of the surname. This parameter is used to parse the input name into given name and surname, and to generate the Full Name output. Possible values: SURNAME_LAST, SURNAME_FIRST.
Case Transformation | Yes | String | Case transformation for the name. Possible values: NONE, UPPER_CASE, LOWER_CASE and CAMEL_CASE.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter Name | Mandatory | Type | Description
Input Name | Yes | String | Full name of the person to enrich.
Source Country Code | Yes | String | Code of the country of origin for the name.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter Name | Type | Description
Full Name | String | The reconstructed full name, with the surname positioned according to the Surname Position parameter.
Gender | String | Gender of the Matched Given Name. One of MALE, FEMALE, UNISEX, UNKNOWN.
Gender Score | String | Confidence, from 0 to 100, with which the Most Frequent Gender can be used.
Given Name | String | Part of the input name identified as the given name.
Matched Given Name | String | Given name matched in the given name database.
Most Frequent Gender | String | Most frequent gender of the Matched Given Name for the given country. One of MALE, FEMALE, UNKNOWN.
Names Parsing Score | String | Name parsing confidence, from 0 to 100.
Surname | String | Part of the input name identified as the surname.
Surname Position | String | Position at which the surname was detected.

INTERNATIONAL PHONE NUMBERS PLUG-IN

The International Phone Numbers Plug-In for Semarchy xDM provides two features:

  • An enricher to standardize phone numbers and improve their formatting.

  • A validator to check the validity of phone numbers.

Semarchy Phone Enricher

Plug-in ID

Semarchy Phone Enricher - com.semarchy.engine.plugins.convergence.phone

Description

This enricher takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Region Code. It returns a standardized Enriched Phone Number in the Enriched Phone Format. Geocoding Data is also returned and includes (depending on the country) the country, the region/state and the city name.

If a phone number is not valid, the enricher returns the original phone value in the Enriched Phone Number, a Status Code as well as a Status Text describing the issue with the input phone number.


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

This plug-in does not use any parameter.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter Name | Mandatory | Type | Description
Input Phone Number | Yes | String | Input phone number.
Region Code | No | String | Two-letter region code of a national phone number, according to the ISO 3166-1 standard. If this input is left empty, the phone number provided in the Input Phone Number should include the international country calling code.
Enriched Phone Format | No | String | Format of the Enriched Phone Number. Possible values are INTERNATIONAL (default), NATIONAL, E164 and RFC3966. See Phone Formats for more information.
Region of Origin | No | String | Formats the phone output for international dialing from the country or region provided in this input, for example US, FR, GB, or DE. Use ZZ for an unknown region.

Phone Formats

The following standards are supported to format the enriched phone number:

Phone Format Examples:

  • E.123 - National Notation: (042) 123 4567

  • E.123 - International Notation: +31 42 123 4567

  • E.164 - International Notation: +31421234567 (equivalent to E.123 with no formatting)

  • RFC3966 - International Notation: +31-42-123-4567 (equivalent to E.123 with hyphens instead of spaces)
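These notations differ only in how the same digits are punctuated. Using the Dutch example number (country code 31, area code 42, subscriber number 123 4567), the Python sketch below renders an already-split number in each format. It is a hypothetical helper for illustration: real numbers need per-country rules to group the digits and to know the national trunk prefix, which this sketch simply takes as input.

```python
from typing import List


def format_phone(country_code: str, groups: List[str], fmt: str = "INTERNATIONAL",
                 trunk_prefix: str = "0") -> str:
    """Render an already-grouped number; groups[0] is the area code."""
    if fmt == "E164":
        # No spaces or punctuation at all.
        return "+" + country_code + "".join(groups)
    if fmt == "RFC3966":
        # Same groups as the international notation, joined with hyphens.
        return "+" + "-".join([country_code] + groups)
    if fmt == "NATIONAL":
        # Trunk prefix added to the area code, which is shown in parentheses.
        area, *rest = groups
        return f"({trunk_prefix}{area}) " + " ".join(rest)
    # INTERNATIONAL: country code first, groups separated by spaces.
    return "+" + " ".join([country_code] + groups)


for fmt in ("NATIONAL", "INTERNATIONAL", "E164", "RFC3966"):
    print(fmt, format_phone("31", ["42", "123", "4567"], fmt))
# NATIONAL (042) 123 4567
# INTERNATIONAL +31 42 123 4567
# E164 +31421234567
# RFC3966 +31-42-123-4567
```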

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter Name | Type | Description
Enriched Phone Number | String | Phone number returned by the enricher in the format specified in the Enriched Phone Format input. This string is null if the enricher was not able to process the input phone number; the Status Code and Status Text values help troubleshoot such issues.
Geocoding Data | String | Geocoding data computed for a given number and country. Depending on the country and phone number, this value includes the country, region/state, and city information. This string is null if the enricher was not able to process the input phone number; the Status Code and Status Text values help troubleshoot such issues.
Status Code | String | Return code for the phone number processing. See Status Codes for more details.
Status Text | String | Text explaining the status code.
International Phone Prefix | String | International phone prefix for worldwide dialing.
National Number | String | National number part of a phone number in international format. It is often the international number without the country prefix.
Extension | String | Extension part of the phone number.
Country Code Source | String | Explains how the country code was retrieved. Possible values are FROM_NUMBER_WITH_PLUS_SIGN, FROM_NUMBER_WITH_IDD, FROM_NUMBER_WITHOUT_PLUS_SIGN and FROM_DEFAULT_COUNTRY.
Leading Zero | String | Returns 0 or 1 to indicate whether a leading zero is mandatory for foreign calls.
Possible Phone Number | String | Returns 0 or 1 to indicate whether a phone number is a possible number, and the region where the number could be dialed from.
Possible Phone Number Reason | String | Detailed explanation of why a phone number is a possible number or not. Possible values are INVALID_COUNTRY_CODE, IS_POSSIBLE, TOO_LONG and TOO_SHORT.
Valid Phone Number | String | Returns 0 or 1 to indicate whether a phone number matches a valid pattern.
Valid Phone Number For Region | String | Returns 0 or 1 to indicate whether a phone number is valid for the specified Region Code.
Phone Line Type | String | Line type of the phone number. Possible values are FIXED_LINE, FIXED_LINE_OR_MOBILE, MOBILE, PAGER, PERSONAL_NUMBER, PREMIUM_RATE, SHARED_COST, TOLL_FREE, UAN, UNKNOWN and VOIP.
Region Code | String | Region code of the phone number.
International Phone Number | String | Phone number formatted for international dialing.
Time Zones | String | List of time zones corresponding to the number, for example Europe/Paris. If the time zone is unknown, Etc/Unknown is returned.
First Time Zone | String | First time zone from the list of time zones corresponding to the number.
Carrier Name | String | Name of the carrier for the phone number.

Status Codes

The following status codes are returned by the enricher:

  • 0 - OK: Optimal execution. No error detected.

  • 1 - INPUT_WAS_NULL: Input phone number was not set.

  • 2 - PARSING FAILED: The string supplied did not seem to be a phone number. Review the Status Text for more information.

Semarchy Phone Extractor

Plug-in ID

Semarchy Phone Extractor - com.semarchy.engine.plugins.convergence.phone.extractor

Description

This enricher extracts a list of phone numbers from an Input Text and returns them as a Phone List, in a given Extraction Format.


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

Matching Leniency

No

String

Defines the phone number extraction leniency. Possible values are POSSIBLE (default), VALID_FOR_REGION (according to the Accepted Region) and VALID.

Extraction Format

No

String

Format of the extracted phone numbers. Possible values are RAW (default), INTERNATIONAL , NATIONAL , E164 and RFC3966 .

List Separator

No

String

Defines the separator character used in the extracted phone list.

Maximum Invalid Numbers

No

String

Maximum number of invalid numbers allowed before the plug-in stops processing the text. This covers cases where the text contains many false positives.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Input Text

Yes

String

Input text to search for phone numbers.

Accepted Region

No

String

Defines the region used when Matching Leniency is set to VALID_FOR_REGION.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter NameTypeDescription

Extracted Phone List

String

List of phone numbers extracted.

Phone 1 to Phone 5

String

First to fifth phone numbers in the extracted list.
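The Phone 1 to Phone 5 outputs only expose the first five numbers. If you post-process the Extracted Phone List yourself (for example, in an integration job that reads the enriched data), you can split it on the List Separator. The following JavaScript sketch is illustrative only: the function name splitPhoneList is hypothetical, the ";" separator is an example value, and it assumes entries are separated by exactly the configured separator string.

```javascript
// Hypothetical helper (not part of the plug-in): splits an Extracted Phone
// List on the separator configured in the List Separator parameter and
// returns all numbers plus the first five, like Phone 1 to Phone 5.
function splitPhoneList(extractedPhoneList, separator = ",") {
  const all = (extractedPhoneList ?? "")
    .split(separator)
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  // Pad with null so that unused Phone slots stay empty.
  const phones = Array.from({ length: 5 }, (_, i) => all[i] ?? null);
  return { all, phones };
}

// Example: two numbers extracted with ";" as the List Separator.
const result = splitPhoneList("+33 1 23 45 67 89;+1 202 555 0100", ";");
console.log(result.phones);
```

Numbers beyond the fifth remain available in the all array, which mirrors the full Extracted Phone List output.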

Semarchy Phone Validator

Plug-in ID

Semarchy Phone Validator - com.semarchy.engine.plugins.convergence.phone

Description

This validator takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Country Code. The validator checks whether this phone number is a valid international or national phone number.


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

Validation Leniency

No

String

Specifies the validation leniency for phone numbers. Possible values are VALID (default), POSSIBLE, and VALID_FOR_REGION.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Input Phone Number

Yes

String

Phone number to validate.

Country Code

No

String

Two-letter country code for a national phone number, according to the ISO 3166-1 standard. If this input is left empty, the Input Phone Number must include the international country calling code.

EMAIL PLUG-IN

The Email Plug-In for Semarchy xDM provides an enricher to improve the quality of email addresses and a validator to check email validity.

Semarchy Email Enricher

Plug-in ID

Semarchy Email Enricher - com.semarchy.engine.plugins.convergence.email

Description

This enricher takes an Input Email Address and splits this address into the local-part (user name) and the domain name. Both these parts are checked syntactically and syntax errors are fixed automatically. The domain name validity is also checked using MX records lookup. The plug-in uses a Domain Name Cache for faster checks and automated fixes on domain names.


This plug-in is thread-safe and supports parallel execution.

Domain Name Cache

The plug-in uses several mechanisms for faster checks and automated fixes on domain names:

  • Domain names already checked as valid (MX record lookup) are persisted in a domain name cache stored in a JDBC Datasource. This avoids repeating MX lookup.

  • A list of known domains (e.g.: hotmail.com, gmail.com, etc.) is automatically seeded in the host name validation cache.

  • Common domain mistakes are fixed using a seeded replace list. For example, gmai.com is automatically fixed to gmail.com using the cache.

  • Invalid domains are automatically fixed to similar valid domains already present in the cache. For example, semarcyh.com is fixed to semarchy.com as semarchy.com was previously checked as a valid domain name.
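The following JavaScript sketch illustrates one plausible order in which these mechanisms can be applied to a domain name. It is not the plug-in's implementation: the cache is modeled as an in-memory Map, the replace list and the lookupMx callback are hypothetical, and the closest-domain search uses a plain Levenshtein edit distance with an assumed limit of 2 edits, which may differ from the plug-in's built-in algorithm.

```javascript
// Levenshtein edit distance between two strings. Assumption: used here as a
// stand-in for the plug-in's own similarity algorithm, which may differ.
function editDistance(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

// Sketch of the cache mechanisms. cache maps a domain to true (valid) or
// false (invalid); replaceList maps common mistakes to fixes; lookupMx is a
// hypothetical callback returning true when an MX record is found.
function resolveDomain(domain, cache, replaceList, lookupMx, offline = false) {
  let candidate = domain.toLowerCase();
  // 1. Fix common mistakes using the seeded replace list.
  if (replaceList.has(candidate)) candidate = replaceList.get(candidate);
  // 2. Use the cache first, so that MX lookups are not repeated.
  if (cache.has(candidate)) {
    if (cache.get(candidate)) return { domain: candidate, valid: true };
  } else if (offline) {
    // Offline Mode: no MX lookup, so the validity stays unknown.
    return { domain: candidate, valid: null };
  } else {
    // 3. Check the domain with an MX lookup and cache the result.
    const valid = lookupMx(candidate);
    cache.set(candidate, valid);
    if (valid) return { domain: candidate, valid: true };
  }
  // 4. Invalid domain: fall back to the closest valid domain in the cache.
  let best = null;
  let bestDistance = 3; // assumption: accept at most 2 edits
  for (const [known, valid] of cache) {
    if (!valid) continue;
    const distance = editDistance(candidate, known);
    if (distance < bestDistance) {
      best = known;
      bestDistance = distance;
    }
  }
  return best ? { domain: best, valid: true } : { domain: candidate, valid: false };
}

const cache = new Map([["gmail.com", true], ["semarchy.com", true]]);
const replaceList = new Map([["gmai.com", "gmail.com"]]);
const noMxRecord = () => false; // hypothetical lookup that always fails
console.log(resolveDomain("gmai.com", cache, replaceList, noMxRecord));
console.log(resolveDomain("semarcyh.com", cache, replaceList, noMxRecord));
```

With this sample cache, gmai.com is fixed through the replace list, and semarcyh.com is fixed to semarchy.com because both differ by two character substitutions.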

See Appendix A: Semarchy Email Enricher Domain Name Cache for more information about the domain name cache.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

Datasource

No

String

Full name of the JDBC datasource used to store the host name validation cache, for example java:comp/env/jdbc/email_cache.
If no datasource is specified, the data location’s datasource is used.

Lowercase User Name

No

String

Set to 1 to transform the local-part (user name) to lowercase in the cleansed email address.

Offline Mode

No

String

Set to 1 to query only the local domain cache. In this mode, the plug-in does not perform MX record lookups.

Processing Mode

No

String

Processing mode: DATABASE (default) or MEMORY. MEMORY mode is faster but requires more memory, as it loads the entire host name validation cache into memory.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Input Email Address

Yes

String

Input email address to cleanse.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter NameTypeDescription

Cleansed Email Address

String

Cleansed email address returned by the enricher. This address may or may not be valid. The syntactic validity and domain name validity of the email address are indicated in the other plug-in outputs.

Valid Domain

String

Flag (0 or 1) indicating whether the domain name in the cleansed email address is valid, based on syntax and MX record lookup. In Offline Mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid, respectively. It returns null if the domain name is not in the cache and no MX lookup was issued.

Valid Domain Syntax

String

Flag (0 or 1) indicating whether the domain name syntax is valid or not in the cleansed email address.

Valid Email Syntax

String

Flag (0 or 1) indicating whether the cleansed email address is syntactically valid or not.

Valid Username Syntax

String

Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the cleansed email address.

Valid Input Domain

String

Flag (0 or 1) indicating whether the domain name in the input email address is valid, based on syntax and MX record lookup. In Offline Mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid, respectively. It returns null if the domain name is not in the cache and no MX lookup was issued.

Valid Input Domain Syntax

String

Flag (0 or 1) indicating whether the domain name syntax is valid or not in the input email address.

Valid Input Email Syntax

String

Flag (0 or 1) indicating whether the input email address is syntactically valid or not.

Valid Input Username Syntax

String

Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the input email address.

Semarchy Email Validator

Plug-in ID

Semarchy Email Validator - com.semarchy.engine.plugins.convergence.email

Description

This validator takes an Input Email Address and checks its syntactic validity. Optionally, the domain name validity is also checked using an MX record lookup.

The plug-in uses the same mechanisms as the Semarchy Email Enricher for checking the email validity, except that it does not modify the incoming email.


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

Accepted Domains

No

String

Defines which email domains are accepted. Possible values:

  • ALL_DOMAINS accepts all syntactically valid domains.

  • VALID_DOMAINS accepts only domains that are known to be valid (found in the local cache as valid, or for which the MX lookup was successful).

  • VALID_AND_UNKNOWN is used in Offline Mode to accept or reject records based on their status (valid or invalid) in the local cache. Unknown domains (not found in the local cache) are accepted.

Syntax checking is always performed: an email address with invalid syntax is always rejected, whatever the Accepted Domains value.
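The Accepted Domains rules can be summarized as a decision function. This is a minimal sketch, assuming the domain status has already been determined (from the local cache or an MX lookup) and is represented as true, false, or null for unknown; the function name isEmailAccepted is hypothetical and does not reflect the plug-in's internal code.

```javascript
// Hypothetical sketch of the Accepted Domains rules (not the plug-in code).
// domainStatus is true (known valid), false (known invalid), or null when
// the domain is unknown, e.g. not found in the local cache in Offline Mode.
function isEmailAccepted(syntaxValid, domainStatus, acceptedDomains) {
  // Syntax checking always applies: invalid syntax is always rejected.
  if (!syntaxValid) return false;
  switch (acceptedDomains) {
    case "ALL_DOMAINS":
      return true;
    case "VALID_DOMAINS":
      return domainStatus === true;
    case "VALID_AND_UNKNOWN":
      // Only domains known to be invalid are rejected.
      return domainStatus !== false;
    default:
      throw new Error("Unsupported Accepted Domains value: " + acceptedDomains);
  }
}

console.log(isEmailAccepted(true, null, "VALID_AND_UNKNOWN"));
console.log(isEmailAccepted(true, null, "VALID_DOMAINS"));
```

The two calls show the practical difference in Offline Mode: an unknown domain passes with VALID_AND_UNKNOWN but is rejected with VALID_DOMAINS.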

Offline Mode

No

String

Set to 1 to query only the local domain cache. In this mode, the plug-in does not perform MX record lookups.

Processing Mode

No

String

Processing mode: DATABASE (default) or MEMORY. MEMORY mode is faster but requires more memory, as it loads the entire host name validation cache into memory.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Input Email Address

Yes

String

Input email address to check.

GBGROUP MATCHCODE GLOBAL PLUG-IN

The Matchcode Global Plug-in for Semarchy xDM uses GBGroup Matchcode Global to provide an enricher for international postal addresses. This enricher cleanses, standardizes, and enriches postal addresses with geocoding and time zone information.

Matchcode Global Enricher

Plug-in ID

Matchcode Global Enricher - com.semarchy.engine.plugins.convergence.address

Description

This enricher takes an input postal address, then enriches and validates it.


This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

On-Premise Host

No

String

Host name or IP address of the Pool Manager server for an on-premise installation.

On-Premise Port

No

String

Port of the Pool Manager service for an on-premise installation. The default value is 27920.

On-Demand URL

No

String

URL of the on-demand service. To use this service, the ON_DEMAND pool must be specified in the Pool Names List, either as a plug-in parameter or as a plug-in input.

Data Elements Format

No

String

Format used for the data elements returned by the enricher. Possible values are UPPERCASE, TITLECASE, LOWERCASE, NONE (default).

Pool Names List

No

String

Comma-separated list of pool names to query. Use the ON_DEMAND pool name to explicitly use the on-demand service. The Pool Names List specified as a plug-in input overrides this value for specific records.

Pools and On-Demand/On-Premise Configuration

A pool represents a set of databases to search addresses in an on-premise setup of Matchcode Global. Pools (identified by their Pool Name) are defined and managed by the Pool Manager server. The plug-in connects to this server using the On-Premise Host and On-Premise Port parameters and queries the pools specified in the Pool Names List.


For more information about pools configuration and the pool manager, see the Capscan Pool Manager Documentation provided with your installation of Matchcode Global.

When performing an address query, the plug-in uses the Pool Names List (either provided as an input or parameter). The query is launched on each pool in the list until a pool is able to process the address.

In the Pool Names List, a specific pool called ON_DEMAND switches processing to the on-demand service. When this pool name appears in the list, the On-Demand URL is used to query the on-demand service. If ON_DEMAND is the only pool in the list, the On-Premise Host and On-Premise Port parameters are not used.


When configured to use the on-demand service, this enricher uses a geocoding server that must be accessible from the Semarchy xDM application at the URL specified in the On-Demand URL parameter. Make sure this URL is accessible through your firewalls.
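The pool resolution described above can be sketched as follows. This is a minimal JavaScript illustration, not the plug-in's code: queryPool and queryOnDemand are hypothetical callbacks standing in for the Pool Manager and the on-demand service, the pool names GBR and FRA are made up, and it assumes a pool that cannot process an address returns null.

```javascript
// Hypothetical sketch: the Pool Names List input, when set, overrides the
// Pool Names List parameter for the current record.
function effectivePoolNames(inputList, parameterList) {
  return inputList && inputList.trim() !== "" ? inputList : parameterList;
}

// Queries each pool of the list in order until one can process the address.
// queryPool (on-premise Pool Manager, via On-Premise Host and Port) and
// queryOnDemand (On-Demand URL) are hypothetical callbacks that return null
// when they cannot process the address.
function queryPools(poolNamesList, address, queryPool, queryOnDemand) {
  const pools = (poolNamesList ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  for (const pool of pools) {
    const result =
      pool === "ON_DEMAND" ? queryOnDemand(address) : queryPool(pool, address);
    if (result !== null) return { pool, result };
  }
  return { pool: null, result: null };
}

// Example: GBR cannot process the address, FRA can, so ON_DEMAND is never used.
const calls = [];
const onPremise = (pool, address) => {
  calls.push(pool);
  return pool === "FRA" ? "match in " + pool : null;
};
const onDemand = (address) => {
  calls.push("ON_DEMAND");
  return "on-demand match";
};
console.log(queryPools(effectivePoolNames("", "GBR,FRA,ON_DEMAND"), "1 Rue X", onPremise, onDemand));
```

Placing ON_DEMAND last in the list, as in this example, makes the on-demand service a fallback that is only called when no on-premise pool can process the address.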

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Address Line

Yes

String

Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.

Pool Names List

No

String

Comma-separated list of pool names to query. Use the ON_DEMAND pool name to explicitly use the on-demand service. This list overrides the Pool Names List plug-in parameter for this record only.


The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter NameTypeDescription

Formatted Address
These outputs contain address information formatted and suitable for mailing purposes.

Address

String

Comma-separated list of address lines. This output contains the full formatted address.

Address Key

String

UK Address Key as defined by the Royal Mail.

Ambiguity
These outputs contain information for ambiguous matches.

Ambiguity List

String

Comma-separated list of address elements and postal codes of the form: Address Elements;Postal Code. This list is provided if the search result is ambiguous.

Ambiguity List Count

Integer

Count of entries in the ambiguity list.

Address Items
These outputs contain address tokens.

Organization

String

Organization Name.

Building Name

String

Name of the building.

Building Number

String

Number of the building.

Sub-Building

String

Sub-building information. Post office box (PO Box) information appears in this field.

Street

String

Street of the address.

Dependent Street

String

Street on which this address’s street depends.

Locality

String

Locality of the address.

Dependent Locality

String

Locality on which this address’s locality depends.

County

String

Name of the county or province.

Postal Code

String

Postal code.

Postal Town

String

Town or City.

Country

String

Country of the address.

Country Code

String

Country code.

Error Management
These outputs contain result and error codes for the processing of the address.

Result Code

Integer

Result code for the address search. See Error Management Outputs for more information.

Error Code

String

Error code returned by the server.

Error Text

String

Error message returned by the server.

Address Quality
These outputs contain information about the quality of the output address and match process.

Field Status

String

Eight-character string. Each character represents how the corresponding address element was matched. See Address Quality Outputs for more information.

Match Score

Integer

Percentage score describing the quality of the address match.

Match Level

Integer

Address element to which the address is matched. See Address Quality Outputs for more information.

Output Status

String

Status of the address match: Verified, Corrected, Parsed, or Not Matched. See Address Quality Outputs for more information.

Postal Code Change Level

String

The level at which the matched postal code differs from the input postal code. See Address Quality Outputs for more information.

Input Postal Code Level

Integer

Level of postal code input: 0 - No post code, 4 - Postal code.

Output Postal Code Level

Integer

Level of postal code match: 0 - No post code, 4 - Postal code.

Geocoding
These outputs contain information about address geocoding.

Latitude

Float

GPS (WGS84) latitude in decimal degrees.

Longitude

Float

GPS (WGS84) longitude in decimal degrees.

Geocoding Level

Integer

Geocoding level for this address. See Geocoding Outputs for more information.

Geocoding Status

String

Geocoding status for this address. See Geocoding Outputs for more information.

Address Quality Outputs

The Match Score is the first output to consider to assess the quality of the address returned by the plug-in. In addition to this value:

  • The Match Level can be used to assess the level at which matching was made, and the Field Status can be used to assess the details of the matched elements.

  • The Output Status can be used to assess the quality of the input address and its processing.

  • The Postal Code Change Level can be used to assess the quality and changes done on the postal code provided as an input.

The following values are returned in the Match Level output:

ValueDescription

0

No Match.

1

Town, City, Locality.

2

Street.

3

Premise.

4

Organization.

The following values are returned in the Output Status output:

ValueDescription

V

Verified. The input address is verified as mailable without change.

C

Corrected. The input address has been corrected in matching to the reference data.

P

Parsed. The input address has been parsed but there is no matching reference data.

N

Not matched. The input address cannot be matched or parsed.

The Field Status output contains 8 characters. Each character is a value that represents how each address element was matched.

Character positions in the Field Status output:

PositionAddress Element

0

Organization

1

PO Box

2

Building name, Building number

3

Street

4

Locality

5

City

6

Administrative area

7

Postal code

Character values in the Field Status output:

ValueDescription

0

Element Correct (no change)

1

Element Corrected (minor change)

2

Element Corrected (major change)

3

Element Not checked (no data)

4

Element Not found

5

Element Not provided
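The two tables above are enough to decode a Field Status value. The following JavaScript sketch, with a hypothetical function name and a made-up sample value, maps each character position to its address element and each character value to its meaning.

```javascript
// Address elements, in Field Status character order (positions 0 to 7).
const FIELD_STATUS_ELEMENTS = [
  "Organization",
  "PO Box",
  "Building name, Building number",
  "Street",
  "Locality",
  "City",
  "Administrative area",
  "Postal code",
];

// Meaning of each character value.
const FIELD_STATUS_VALUES = {
  0: "Element Correct (no change)",
  1: "Element Corrected (minor change)",
  2: "Element Corrected (major change)",
  3: "Element Not checked (no data)",
  4: "Element Not found",
  5: "Element Not provided",
};

// Decodes a Field Status string into one entry per address element.
function decodeFieldStatus(fieldStatus) {
  if (typeof fieldStatus !== "string" || fieldStatus.length !== 8) {
    throw new Error("Field Status must be an 8-character string");
  }
  return FIELD_STATUS_ELEMENTS.map((element, position) => {
    const code = fieldStatus[position];
    return { element, code, meaning: FIELD_STATUS_VALUES[code] ?? "Unknown value" };
  });
}

// Hypothetical value: organization and PO Box not provided, locality
// slightly corrected, all other elements correct.
for (const entry of decodeFieldStatus("55001000")) {
  console.log(entry.element + ": " + entry.meaning);
}
```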

The following values are returned in the Postal Code Change Level output. This value reflects changes done on the postal code:

ValueDescription

K

No postal code/ZIP code.

L

Input postal code, no output postal code.

M

Output postal code, no input postal code.

N

No change.

P

Postal code change.

Geocoding Outputs

Geocoding information is returned in the Latitude and Longitude outputs.
The quality of the geocoding information is exposed in the Geocoding Level and Geocoding Status outputs.

The following values are returned in the Geocoding Level output:

ValueDescription

5

Delivery Point (PostBox or SubBuilding).

4

Premise (Premise or Building).

3

Thoroughfare.

2

Locality.

1

Administrative Area.

0

None.

The following values are returned in the Geocoding Status output:

ValueDescription

P

Point: A single geocode was found matching the input address.

I

Interpolated: A geocode was interpolated from the input address’s location within a range.

A

Average: Multiple candidate geocodes were found to match the input address, and their average was returned.

U

Unable to geocode: No geocode could be generated for the input address.

Error Management Outputs

The following values are returned in the Result Code field.

ValueDescription

0

An internal error occurred. See the Error Code and Error Text outputs for details.

1

The address was successfully matched or parsed.

2

No hits were found for this address.

3

Insufficient input details were provided for processing.

4

Ambiguous results. Refer to the Ambiguity List output for details.

GOOGLE MAPS PLUG-IN

The Google Maps Plug-in for Semarchy xDM provides an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal addresses with geocoding information.

Google Maps Enricher

Plug-in ID

Google Maps Enricher - com.semarchy.integration.rowTransformers.googleMapsEnricher

Description

This enricher takes an input postal address, then enriches and validates it using the Google Geocoding Service.


This plug-in must be used in compliance with the Google Maps/Google Earth APIs Terms of Service.

This enricher uses the Google Geocoding Service, which must be accessible from the Semarchy xDM application at the following URL: http://maps.googleapis.com/maps/api/geocode/json?<parameters>. Make sure this URL is accessible through your firewalls.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

Client ID or API Key

No

String

This parameter contains either an API Key (for Standard API usage) or a Client ID (for Premium usage), both provided by Google. A Client ID should begin with the gme- prefix. When you provide a Client ID, the Private Key parameter (used for the signature) is also required.

Private Key

No

String

Cryptographic signature key provided by Google with the Client ID.

Default Language

No

String

Code of the default language used for the returned results. For example, the same address is returned as "Rue Mathieu Misery" in French and as "Mathieu Misery Street" in English. The Language plug-in input overrides this code for a given record. See the list of supported domain languages for more information.


You can use the Google Maps service with one of the following authentication methods:

  • With no API Key or Client ID, with a limited number of queries per day (typically, 2500). Above this limit, the queries fail.

  • With an API Key, with the Standard Usage Limits and a pay-as-you-go plan above the limits.

  • With a Client ID and a Signature (private key) with a Google Maps Premium Plan.
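With a Premium Plan, each request URL carries a signature computed from the Private Key. Presumably the plug-in performs this signing itself when both the Client ID and the Private Key parameters are set; the following Node.js sketch is only meant to help test a Client ID and Private Key pair outside Semarchy xDM. It assumes Google's documented HMAC-SHA1 URL signing scheme (sign the path and query, with the URL-safe Base64 key, and append a URL-safe Base64 signature parameter). The client ID and key values shown are hypothetical placeholders.

```javascript
const crypto = require("crypto");

// Signs a Google Maps web service URL for a Premium Plan client, following
// Google's HMAC-SHA1 URL signing scheme. requestUrl must already be
// URL-encoded and include the client parameter (gme-...).
function signUrl(requestUrl, privateKey) {
  const url = new URL(requestUrl);
  // Only the path and query string are signed, not the protocol or host.
  const pathAndQuery = url.pathname + url.search;
  // The private key is URL-safe Base64: convert it to standard Base64 first.
  const key = Buffer.from(privateKey.replace(/-/g, "+").replace(/_/g, "/"), "base64");
  const signature = crypto
    .createHmac("sha1", key)
    .update(pathAndQuery)
    .digest("base64")
    // Convert the signature back to URL-safe Base64.
    .replace(/\+/g, "-")
    .replace(/\//g, "_");
  return requestUrl + "&signature=" + signature;
}

// Hypothetical client ID and key, for illustration only.
const unsigned =
  "https://maps.googleapis.com/maps/api/geocode/json?address=" +
  encodeURIComponent("Rue Mathieu Misery, Paris") +
  "&client=gme-exampleclient";
console.log(signUrl(unsigned, "dGhpcy1pcy1hLWR1bW15LWtleQ=="));
```

If Google rejects a URL signed this way with your real credentials, the Client ID and Private Key pair is the likely cause rather than the plug-in configuration.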

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Address Line

Yes

String

Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.

Language

No

String

Code of the language for the returned result for this record. This language overrides the Default Language parameter. See the list of supported domain languages for more information.


The state, region or province information can be passed in the City input, concatenated with the city name. For example: Address.City || ' ' || Address.State

The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.

Plug-in Outputs

The following table lists the plug-in outputs. Outputs marked with an * appear in a Full and a Short form in the output list.

Parameter NameTypeDescription

Address Types

String

Comma-separated list of address types (see Address Types for more information).

Administrative Area Level 1*

String

First-order civil entity below the country level. Within the United States, these administrative levels are states. Not all countries exhibit these administrative levels.

Administrative Area Level 2*

String

Second-order civil entity below the country level. Within the United States, these administrative levels are counties. Not all countries exhibit these administrative levels.

Administrative Area Level 3*

String

Third-order civil entity below the country level. Not all countries exhibit these administrative levels.

Airport

String

Indicates an airport. NOTE: This output is deprecated.

Country*

String

The national political entity.

East Bound Longitude

String

Bounding box eastern limit.

Floor*

String

Indicates the floor of a building address.

Formatted Address

String

Human-readable version of the geocoded address.

Intersection

String

Major intersection, usually of two major roads. NOTE: This output is deprecated.

Latitude

String

Latitude of the address.

Locality*

String

Incorporated city or town political entity.

Longitude

String

Longitude of the address.

Natural Feature*

String

Prominent natural feature.

Neighborhood*

String

Named neighborhood.

North Bound Latitude

String

Bounding box northern limit.

Park*

String

Named park.

Point of Interest*

String

Named point of interest.

Post Box*

String

Specific postal box.

Postal Code*

String

Postal code as used to address postal mail within the country.

Premise*

String

Named location, usually a building or collection of buildings with a common name.

Quality

String

Granularity of the location described by the address, expressed as a value between 0 and 100 (100 being the best quality).

Room*

String

The room of a building address.

Route*

String

Named route (such as US 401).

South Bound Latitude

String

Bounding box southern limit.

Status

String

Status of the request. OK indicates that no error occurred and the address was geocoded. ZERO_RESULTS indicates that no error occurred but the address was not geocoded. See the API documentation for the list of status and error codes.

Street Address

String

Precise street address. NOTE: This output is deprecated.

Street Number*

String

Precise street number.

Sub-Locality*

String

First-order civil entity below a locality.

Sub-Premise*

String

First-order entity below a named location, usually a singular building within a collection of buildings with a common name.

West Bound Longitude

String

Bounding box western limit.

Embedding a Google Map in a Form

Data from the Google Geocoding service must be used in conjunction with maps rendered by the Google Maps service.

You can display such a map in a Semarchy xDM form by embedding generated HTML and JavaScript.

  1. Create a new form field with the SemQL expression given below.

  2. In the SemQL expression, modify the following line to concatenate your address information:

var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";
  3. If you are a Google Maps API for Work customer, modify the Google Maps service URL in the code to include your Google Client ID. Note that the embedded map stops working once you add the client ID, unless you register authorized URLs with Google by following the instructions given on the Google Maps API for Work site:

<script src="https://maps.googleapis.com/maps/api/js?client=YOUR_CLIENT_ID&v=3.20&sensor=false"></script>
  4. Edit the field:

    • In the Display Properties, set the Component Type to Object, and in Data, set the Source Type to Content.
      This configuration tells Semarchy xDM to interpret this code as HTML and JavaScript in the browser.

'<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <script src="https://maps.googleapis.com/maps/api/js?sensor=false"></script>
    <script>

/* Modify the line below */
var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";

/* Map display settings */
var zoom = 18;
var mapType = google.maps.MapTypeId.ROADMAP;
var useMarker = true;
var map;

/* Geocodes the address, then displays the map centered on the first result */
function initialize() {
        var geocoder = new google.maps.Geocoder();
        geocoder.geocode( { "address": address}, function(results, status) {
         if (status == google.maps.GeocoderStatus.OK) {
                 displayMap(results[0].geometry.location);
         } else {
                 document.getElementById("map_canvas").innerHTML = "Unable to locate this address (" + status + ")";
         }
        });
        window.onresize = resize;
}

/* Creates the map, with an optional marker, centered on the given location */
function displayMap(latlng) {
        var mapOptions = { zoom: zoom, center: latlng, mapTypeId: mapType };
        map = new google.maps.Map(document.getElementById("map_canvas"), mapOptions);
        if (useMarker) {
                new google.maps.Marker({ map: map, position: latlng });
        }
        resize("");
}

/* Fits the map to the window and keeps it centered */
function resize(e) {
        /* Ignore resize events received before the map is created */
        if (!map) { return; }
        var center = map.getCenter();
        map.getDiv().style.height = window.innerHeight +"px";
        map.getDiv().style.width = window.innerWidth +"px";
        google.maps.event.trigger(map, ''resize'');
        map.setCenter(center);
}

google.maps.event.addDomListener(window, "load", initialize);
    </script>
  </head>
  <body style="margin:0px;">
    <div id="map_canvas" style="margin:0px;"></div>
  </body>
</html>'

OPEN STREET MAP PLUG-IN

The OpenStreetMap Plug-in for Semarchy xDM uses the OpenStreetMap API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address.

OpenStreetMap Enricher

Plug-in ID

OpenStreetMap Enricher - com.semarchy.engine.plugins.openstreetmap

Description

This enricher takes an input postal address, then enriches and validates it using the OpenStreetMap Service.


This enricher uses the OpenStreetMap Service, which must be accessible from the Semarchy xDM application at the URL specified in the OpenStreetMap URL parameter. Make sure this URL is accessible through your firewalls.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

OpenStreetMap URL

Yes

String

URL used to query the OpenStreetMap API. Typically: http://nominatim.openstreetmap.org/

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Address Line

Yes

String

Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.


The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.
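To check from the Semarchy xDM server that the OpenStreetMap URL is reachable and returns results for your addresses, you can build a test query by hand. The following JavaScript sketch assumes the public Nominatim search endpoint and its format, addressdetails, q, street, city, postalcode, and country parameters; the function name is hypothetical, and the plug-in may build its requests differently.

```javascript
// Builds a Nominatim search URL from the enricher inputs (sketch only).
// baseUrl must end with "/" so that "search" is appended to it.
function buildNominatimUrl(baseUrl, { addressLine, postalCode, city, country }) {
  const url = new URL("search", baseUrl);
  const params = url.searchParams;
  params.set("format", "json");
  params.set("addressdetails", "1");
  if (postalCode || city || country) {
    // Structured query: each input maps to its own parameter.
    if (addressLine) params.set("street", addressLine);
    if (postalCode) params.set("postalcode", postalCode);
    if (city) params.set("city", city);
    if (country) params.set("country", country);
  } else {
    // Only an Address Line (possibly the whole address): free-form query.
    params.set("q", addressLine);
  }
  return url.toString();
}

console.log(
  buildNominatimUrl("http://nominatim.openstreetmap.org/", {
    addressLine: "10 Downing Street",
    city: "London",
    country: "United Kingdom",
  })
);
```

Nominatim does not combine the free-form q parameter with the structured parameters, which is why the sketch uses one form or the other.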

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter NameTypeDescription

Address

String

Complete address of the location.

City

String

City of the location.

Country

String

Country of the location.

Country Code

String

Country code of the location.

County

String

County of the location.

Latitude

String

Latitude of the location.

Longitude

String

Longitude of the location.

Postal Code

String

Postal code of the location.

Process Code

String

Code that indicates the result status of the address processing.

State

String

State of the location.

Street Number

String

Street number of the location.

Street Name

String

Street name of the location.

MICROSOFT BING MAPS PLUG-IN

The Microsoft Bing Maps Plug-in for Semarchy xDM uses the Bing Location API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address with geocoding information.

Bing Maps Enricher

Plug-in ID

Bing Maps Enricher - com.semarchy.engine.plugins.bing.address

Description

This enricher takes an input postal address, then enriches and validates it using the Bing Maps Service.


This plug-in must be used in compliance with the Microsoft Bing Maps APIs Terms of Service.

This enricher uses the Bing Maps Service, which must be accessible from the Semarchy xDM application at the URL specified in the Bing Location URL parameter. Make sure this URL is accessible through your firewalls.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter NameMandatoryTypeDescription

Bing Maps Key

Yes

String

Bing Maps Key used to access the Bing Maps services. A Bing Maps Key is required to use these services.

Bing Location URL

Yes

String

URL used to query the Bing Location API.

Plug-in Inputs

The following table lists the plug-in inputs.

Parameter NameMandatoryTypeDescription

Address Line

Yes

String

Address line to process.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.


The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.

Plug-in Outputs

The following table lists the plug-in outputs.

Parameter Name | Type | Description
Administrative District | String | The subdivision name within the country or region for an address, such as the abbreviation of a US state.
Administrative District 2 | String | The subdivision name within the administrative district for an address.
Confidence | String | Defines the confidence of the location match found by the geocoding service. Possible values: High, Medium, Low.
Country or Region | String | The country or region name of the address.
Formatted Address | String | The complete address as a single string. This address may not include the country or region.
Status Code | String | The HTTP status code for the request.
Status Description | String | A description of the HTTP status code.
Latitude | String | Latitude of the location.
Locality | String | The locality, such as the primary city, that corresponds to the address.
Longitude | String | Longitude of the location.
Match Code | String | Defines the geocoding level of the location match found by the geocoder. One or more of the following values: Good, Ambiguous, UpHierarchy.
Postal Code | String | The postal code of the address.
Process Code | String | Code that indicates the result status of the process.
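To illustrate how these outputs relate to a geocoding response, the following hypothetical sketch maps a Bing Locations API JSON response (already parsed into a Python dict) to the output names in the table. The JSON field names (resourceSets, resources, address, point.coordinates, confidence, matchCodes) are assumed from the public Bing Maps REST API; the function name extract_outputs, the comma-joined Match Code, and the handling of empty results are illustrative choices, not the plug-in's documented behavior. Process Code is omitted because its values are not documented here.

```python
from typing import Dict


def extract_outputs(response: dict) -> Dict[str, str]:
    """Map a parsed Bing Locations response to plug-in style outputs (sketch)."""
    outputs = {
        "Status Code": str(response.get("statusCode", "")),
        "Status Description": response.get("statusDescription", ""),
    }
    resource_sets = response.get("resourceSets") or [{}]
    resources = resource_sets[0].get("resources") or []
    if not resources:
        return outputs  # no location match: only the status is reported
    best = resources[0]  # assume the first resource is the best match
    address = best.get("address", {})
    # Bing returns coordinates as [latitude, longitude].
    latitude, longitude = best["point"]["coordinates"]
    outputs.update({
        "Administrative District": address.get("adminDistrict", ""),
        "Administrative District 2": address.get("adminDistrict2", ""),
        "Country or Region": address.get("countryRegion", ""),
        "Formatted Address": address.get("formattedAddress", ""),
        "Locality": address.get("locality", ""),
        "Postal Code": address.get("postalCode", ""),
        "Confidence": best.get("confidence", ""),
        "Match Code": ",".join(best.get("matchCodes", [])),
        "Latitude": str(latitude),
        "Longitude": str(longitude),
    })
    return outputs
```

All values are returned as strings, matching the String type listed for every output.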

APPENDICES

Appendix A: Semarchy Email Enricher Domain Name Cache

The Semarchy Email Enricher uses a local cache to avoid repeating the MX record lookups that check the validity of an email domain.
The enricher checks this domain name cache first: if a record is found in the cache, the enricher uses the locally available information and does not issue an MX record lookup.

The plug-in stores the cache in a table named EXT_EMAIL_DOMAINS. This table is created the first time the enricher runs, by default in the data location served by the enricher. To store this table in a different datasource, set the Datasource enricher parameter.
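A minimal sketch of this cache-first behavior, assuming an in-memory dict in place of the EXT_EMAIL_DOMAINS table. check_domain and the injected mx_lookup callable are hypothetical names; a real MX lookup requires a DNS library and is out of scope here.

```python
from typing import Callable, Dict


def check_domain(host_name: str,
                 cache: Dict[str, dict],
                 mx_lookup: Callable[[str], bool]) -> bool:
    """Return the validity of host_name, consulting the cache first."""
    record = cache.get(host_name)
    if record is None:
        # Cache miss: only now is an (expensive) MX record lookup issued.
        record = {"VALID": mx_lookup(host_name), "HIT_COUNT": 0}
        cache[host_name] = record
    # Cache hit or freshly cached: count this host name as processed.
    record["HIT_COUNT"] += 1
    return record["VALID"]
```

Processing the same domain twice triggers a single lookup and leaves HIT_COUNT at 2.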

Domain Name Cache Table Structure

The structure of the EXT_EMAIL_DOMAINS table is the following:

Column Name | Description
HOST_NAME | Domain name, for example "gmail.com".
PREFIX | First two letters of the domain name, for example "gm".
SUFFIX | Last two letters of the domain name, for example "om".
HIT_COUNT | Number of times this host name was processed by the enricher. The enricher increments this value automatically.
SEED_DATA | Indicates whether this record was part of the seeded data or created by the enricher. The value is 1 for seeded data, 0 otherwise.
VALID | Indicates whether the domain name is valid (1) or invalid (0). The value is N/A if the validity is unknown (for example, when a new domain is added to the cache in offline mode).
SUGGESTION | Latest correction found for an invalid domain.
FIRST_INVALID_DATE, LAST_INVALID_DATE, LAST_VALID_DATE | Additional dates used to reconsider a domain's validity after a certain period of time.
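For readers who prefer a typed view, a cache row can be modeled as a record, assuming the Python types noted in the comments (the actual database column types are not documented here). EmailDomainRecord and for_host are hypothetical names; for_host derives PREFIX and SUFFIX the same way as the examples in the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EmailDomainRecord:
    """One EXT_EMAIL_DOMAINS row; the Python types are assumptions."""
    host_name: str
    prefix: str
    suffix: str
    hit_count: int = 0
    seed_data: bool = False
    valid: Optional[bool] = None  # None stands for the N/A (unknown) state
    suggestion: Optional[str] = None
    first_invalid_date: Optional[datetime] = None
    last_invalid_date: Optional[datetime] = None
    last_valid_date: Optional[datetime] = None

    @classmethod
    def for_host(cls, host_name: str) -> "EmailDomainRecord":
        # PREFIX and SUFFIX are the first and last two letters of the host name.
        return cls(host_name=host_name,
                   prefix=host_name[:2],
                   suffix=host_name[-2:])
```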

Fixing Domain Names

The enricher automatically fixes invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:

  • The Edit Distance between the invalid domain and cached domain.

  • The hit count of the cached domain.

A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
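The exact scoring used by the enricher is built in and not documented. The sketch below is one plausible way to combine the two criteria: it computes the Levenshtein edit distance, keeps cached domains within a hypothetical max_distance threshold, and prefers the closest domain, breaking ties with the higher hit count. The function names and the threshold are assumptions.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest_fix(invalid_domain: str, cached_domains: Dict[str, int],
                max_distance: int = 2) -> Optional[str]:
    """Pick a replacement among valid cached domains (hypothetical scoring).

    cached_domains maps a valid host name to its hit count.
    """
    candidates = [
        (edit_distance(invalid_domain, domain), -hits, domain)
        for domain, hits in cached_domains.items()
    ]
    candidates = [c for c in candidates if c[0] <= max_distance]
    # Smallest distance wins; among equals, the highest hit count wins.
    return min(candidates)[2] if candidates else None
```

With a cache containing gmail.com, hotmail.com, and ymail.com, suggest_fix("gmial.com", cache) returns "gmail.com": the two transposed letters give an edit distance of 2, while the other domains are further away.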

Adding Records to the Cache

You can force the creation of new records in the cache, for example to add new fix suggestions.

To manually insert a domain correction <host_name_replacement> for an invalid domain <invalid_host_name>, use the following sample query:

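INSERT INTO EXT_EMAIL_DOMAINS (
        HOST_NAME,
        PREFIX,
        SUFFIX,
        HIT_COUNT,
        SEED_DATA,
        VALID,
        SUGGESTION,
        FIRST_INVALID_DATE,
        LAST_INVALID_DATE
        )
VALUES (
        '<invalid_host_name>',                -- invalid domain, e.g. 'gmial.com'
        SUBSTR('<invalid_host_name>', 1, 2),  -- PREFIX: first two letters
        SUBSTR('<invalid_host_name>', -2, 2), -- SUFFIX: last two letters (Oracle syntax; on PostgreSQL use RIGHT('<invalid_host_name>', 2))
        0,                                    -- HIT_COUNT
        '1',                                  -- SEED_DATA
        '0',                                  -- VALID: marked invalid
        '<host_name_replacement>',            -- SUGGESTION, e.g. 'gmail.com'
        CURRENT_TIMESTAMP,
        CURRENT_TIMESTAMP
        );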

Cache Refresh

The Email Enricher refreshes cached records after three months. This duration is not configurable. The cache stores date information for each record, and once a record is older than this period, the enricher performs a new MX record lookup to refresh it.

If there is good evidence that the cache is wrong about a domain's validity, or if business users are certain they want to override the cache's decision, the developer can manually set the VALID column to 1 or 0. To prevent the enricher from overriding this manual change, it is also important to set the date fields to NULL so that the enricher does not refresh the cache for that domain.
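The refresh rule and the manual-override convention can be sketched as a single check. This illustration rests on assumptions: needs_refresh and add_months are hypothetical helpers, "three months" is interpreted as three calendar months from the most recent LAST_VALID_DATE or LAST_INVALID_DATE, and a record whose dates are all NULL is treated as manually overridden and never refreshed.

```python
import calendar
from datetime import datetime
from typing import Optional

REFRESH_MONTHS = 3  # fixed, per this guide (not configurable)


def add_months(moment: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def needs_refresh(last_valid_date: Optional[datetime],
                  last_invalid_date: Optional[datetime],
                  now: datetime) -> bool:
    """Return True if a cached record should get a new MX record lookup."""
    dates = [d for d in (last_valid_date, last_invalid_date) if d is not None]
    if not dates:
        # All dates NULL: treated here as a manual override, never refreshed.
        return False
    return now >= add_months(max(dates), REFRESH_MONTHS)
```

Clamping the day keeps the arithmetic valid at month ends: a record last checked on November 30 becomes due on February 28 of the following (non-leap) year.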

Developers can safely purge the cache table periodically if they want the cache to refresh its results sooner than the automatic three-month refresh. They can either drop the table entirely, or delete only the records they no longer want while keeping the seeded data and any important domains they have manually overridden.
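The following sketch runs a manual override followed by a selective purge against an in-memory SQLite database so the statements can be tried safely. The table is a simplified, hypothetical subset of EXT_EMAIL_DOMAINS (column types are assumptions), and treating rows whose date fields are all NULL as manual overrides follows the convention described above. Such a filter would also keep any other rows whose dates happen to be NULL, so adapt the SQL to your database and data before running it on the real cache.

```python
import sqlite3

# Simplified, hypothetical EXT_EMAIL_DOMAINS layout for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EXT_EMAIL_DOMAINS (
        HOST_NAME TEXT PRIMARY KEY,
        SEED_DATA TEXT,
        VALID TEXT,
        FIRST_INVALID_DATE TEXT,
        LAST_INVALID_DATE TEXT,
        LAST_VALID_DATE TEXT
    )""")
conn.executemany(
    "INSERT INTO EXT_EMAIL_DOMAINS VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("gmail.com", "1", "1", None, None, "2018-01-10"),          # seeded
        ("example.org", "0", "0", "2018-02-01", "2018-02-01", None),
        ("example.net", "0", "1", None, None, "2018-03-05"),
    ],
)

# Manual override: force a domain to valid and clear its dates so that
# the enricher does not refresh it.
conn.execute("""
    UPDATE EXT_EMAIL_DOMAINS
       SET VALID = '1', FIRST_INVALID_DATE = NULL,
           LAST_INVALID_DATE = NULL, LAST_VALID_DATE = NULL
     WHERE HOST_NAME = ?""", ("example.org",))

# Purge: delete non-seeded rows, keeping overridden rows (all dates NULL).
conn.execute("""
    DELETE FROM EXT_EMAIL_DOMAINS
     WHERE SEED_DATA = '0'
       AND (FIRST_INVALID_DATE IS NOT NULL
            OR LAST_INVALID_DATE IS NOT NULL
            OR LAST_VALID_DATE IS NOT NULL)""")
conn.commit()

remaining = [row[0] for row in conn.execute(
    "SELECT HOST_NAME FROM EXT_EMAIL_DOMAINS ORDER BY HOST_NAME")]
```

After the purge, only the seeded gmail.com row and the overridden example.org row remain.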

Version 4.4 Rev 1
Last updated 2018-06-14 11:52:52 UTC