Semarchy xDM plug-in reference guide
This article is a copy of the official guide, provided so that its content can be found through the knowledge base full-text search.
For further investigation of your problem, refer to the official Semarchy xDM documentation.
Welcome to Semarchy xDM.
This guide provides reference information about the plug-ins delivered with the Semarchy xDM Platform.
PREFACE
Audience
If you want to learn about MDM or discover Semarchy xDM, you can watch our tutorials.
The Semarchy xDM Documentation Library, including the development, administration and installation guides, is available online.
Document Conventions
This document uses the following formatting conventions:
Convention | Meaning |
---|---|
boldface | Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept. |
italic | Italic type indicates special emphasis or placeholder variable that you need to provide. |
monospace | Monospace type indicates a code example, text, or commands that you enter. |
Other Semarchy Resources
In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.
Obtaining Help
There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.
Feedback
We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.
Overview
This guide provides reference information about the plug-ins delivered with the Semarchy xDM Platform.
Using this guide, you will learn how to use these plug-ins in your MDM projects.
INTRODUCTION
What is Semarchy xDM?
Semarchy xDM is designed to support any kind of Enterprise Master Data Management initiative. It brings extreme flexibility for defining and implementing master data models and releasing them to production. The platform can be used as the target deployment point for all master data of your enterprise or in conjunction with existing data hubs to contribute to data transparency and quality with federated governance processes. Its powerful and intuitive environment covers all use cases for setting up a successful master data governance strategy.
Semarchy xDM is based on a coherent set of features for all Master Data Management projects.
Semarchy xDM Plug-ins Architecture
Semarchy xDM implements plug-ins that use external services or information systems to contribute to the master data processing and enrichment.
Plug-ins are used in Semarchy xDM in:
Enrichers: By adding new enrichers, you can perform record-level enrichment to update, augment or standardize existing attribute values, or create content in new attributes. For example, you can connect to an external web service to retrieve stock ticker symbols from company names.
Validations: By adding new validations, you can perform record-level checks, that is, check the values of attributes in a record against complex rules. For example, you can connect to an external provider to check whether a billing or shipping address is valid or not.
INFO: Using Plug-ins is explained in the Semarchy xDM Developer’s Guide, in the Certification Process Design chapter. Installing plug-ins to your Semarchy xDM instance is explained in the Semarchy xDM Administration Guide, in the Managing the Platform chapter.
The plug-ins are designed using the Open Plug-In Architecture. Plug-in design is covered in the Semarchy xDM Plug-in Development Guide.
TEXT NORMALIZATION AND TRANSLITERATION
This plug-in applies normalization, transliteration and phonetic transformations to text strings.
Semarchy Text Enricher
Plug-in ID
Semarchy Text Enricher - com.semarchy.engine.plugins.convergence.text
Description
This enricher applies normalization, transliteration and phonetic transformations to text strings. It takes an Input Text and applies an Input Filter to this text, for example to remove all characters but letters. Then it applies a series of transformations defined in the Transformation parameter and returns a Transformed Text.
This plug-in is thread-safe and supports parallel execution.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Filter | No | String | Filter applied to the input text before the transformation. Valid values are NONE, LETTERS, and STANDARD. See the Input Filters section for a description of each filter. |
Transformation | Yes | String | A pipe-separated sequence of transformation definitions. Transformations include NORMALIZE, PHONETIC, TRANSLITERATE, BEIDERMORSE, and DOUBLEMETAPHONE. See the Transformations section for a detailed description of each transformation. |
Synonyms Separator | No | String | Separator used between the synonyms returned by the enricher. Default value is a pipe (|). |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Text to transform. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Transformed Text | String | Filtered and transformed text. |
Secondary Transformed Text | String | Secondary transformed text. This text may contain transformation resulting from a Beidermorse or Double Metaphone transformation. See Other Transformations for more information. |
Input Filters
The following input filters are supported by the enricher:
NONE: No filter is applied to the input text.
LETTERS: Removes all non-letter characters from the input string.
STANDARD: Breaks words in the input text according to the rules of the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
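As a rough illustration of the NONE and LETTERS filters, the following Python sketch approximates their effect. It is not the plug-in's implementation: the function name `apply_input_filter` is hypothetical, a "letter" is assumed to be any character for which `str.isalpha()` is true, and the STANDARD filter is left out because Unicode Text Segmentation requires a dedicated library.

```python
def apply_input_filter(text: str, input_filter: str = "NONE") -> str:
    """Approximate the NONE and LETTERS input filters (illustrative only)."""
    if input_filter == "NONE":
        return text  # no filter is applied
    if input_filter == "LETTERS":
        # Assumption: a "letter" is any character Python classifies as alphabetic.
        return "".join(ch for ch in text if ch.isalpha())
    raise ValueError(f"Filter not covered by this sketch: {input_filter}")
```

Under these assumptions, the LETTERS filter also removes spaces, digits and punctuation, so word boundaries are lost before the transformations run.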
Transformations
The following transformation definitions are supported by the enricher:
Normalization
NORMALIZE: Performs a Normalization.
Phonetic Transformation
PHONETIC [SOUNDEX | REFINEDSOUNDEX | METAPHONE [<max_code_length>] | DOUBLEMETAPHONE [<max_code_length>] | CAVERPHONE | CAVERPHONE1 | NYSIIS | MRA | COLOGNE | BEIDERMORSE ]: Applies a Phonetic Transformation.
Other Transformations
BEIDERMORSE [<split>] [<rule_type>] [<max_phonems>] [<name_type>]
DOUBLEMETAPHONE [<max_code_length>] [<split>]
Transliteration
TRANSLITERATE [<ID>]: Applies a Transliteration transformation to the string. The transliteration is identified by an ID. If no ID is provided, the Any-Latin transliteration is used.
It is possible to sequence transformations. Successive transformations are separated by a pipe (|) sign.
Examples of transformations:
Normalize and apply Phonetic Soundex:
NORMALIZE | SOUNDEX
Normalize and then transliterate to Latin script:
NORMALIZE | TRANSLITERATE Any-Latin
Normalize, transliterate to Latin script and then apply Metaphone with a maximum resulting length of 5 characters:
NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5
Perform a BEIDERMORSE transformation for family names with an approximate transformation on generic name types:
BEIDERMORSE APPROX 10 FALSE GENERIC
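To make the structure of a transformation sequence concrete, the sketch below splits a Transformation value into its successive steps. It only illustrates the pipe-separated syntax described above. The function name `parse_transformations` is hypothetical, and the way the plug-in actually parses or validates the parameter is not documented here.

```python
def parse_transformations(sequence: str) -> list[tuple[str, list[str]]]:
    """Split a pipe-separated sequence into (keyword, arguments) steps."""
    steps = []
    for definition in sequence.split("|"):
        tokens = definition.split()
        if not tokens:
            continue  # ignore empty segments such as a trailing pipe
        steps.append((tokens[0].upper(), tokens[1:]))
    return steps


steps = parse_transformations(
    "NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5"
)
for keyword, arguments in steps:
    print(keyword, arguments)
```

Each step is applied to the output of the previous one, which is why the order of the definitions matters.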
Normalization
The NORMALIZE transformation normalizes the string by applying a series of transformations, which map similar characters to a common target, to ignore certain distinctions between similar characters. This includes accent removal, case folding, etc.
Example of transformations:
Original Text | Normalized Text | Comments |
---|---|---|
‒ – — ― | - - - - | 4 different dashes converted to 4 similar dashes. |
AbSoLuteLy TRUE | absolutely true | CaseFolding |
… | ... | convert [dotdotdot] to [dot dot dot] |
½ Tsp | 1/2 tsp | Symbol folding |
Æsop | aesop | |
Äsop | asop | |
Dürst | durst | |
Encyclopædia | encyclopaedia | |
œuvre | oeuvre | |
poſt | post | |
résumé français | resume francais | Accent removal and case folding |
Straße | strasse | |
٣ is a magic number | 3 is a magic number | Native Digital folding |
The complete list of transformations is given below:
Accent removal | Hebrew Alternates folding | Overline folding | Suzhou Numeral folding |
Case folding | Jamo folding | Positional forms folding | Symbol folding |
Canonical duplicates folding | Letterforms folding | Small forms folding | Underline folding |
Dashes folding | Math symbol folding | Space folding | Vertical forms folding |
Diacritic removal (including stroke, hook, descender) | Multigraph Expansions: All | Spacing Accents folding | Width folding |
Greek letterforms folding | Native digit folding | Subscript folding | Han Radical folding |
For more information about these transformations, see UTR #30: Character Foldings.
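Two of the foldings listed above, accent removal and case folding, can be approximated with the Python standard library. This is a minimal sketch and not the plug-in's implementation: the function name `fold_accents_and_case` is hypothetical, and foldings such as dashes, multigraph expansion (Æ to ae) or native digits are not covered.

```python
import unicodedata


def fold_accents_and_case(text: str) -> str:
    """Approximate accent removal and case folding (illustrative only)."""
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base characters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case folding is a stronger form of lowercasing (for example ß becomes ss).
    return without_marks.casefold()


print(fold_accents_and_case("résumé français"))  # resume francais
print(fold_accents_and_case("Straße"))           # strasse
```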
Phonetic Transformations
A phonetic transformation applied to the string transforms it to a string corresponding to its pronunciation. The default phonetic transformation is PHONETIC METAPHONE.
Phonetic transformations include:
PHONETIC SOUNDEX and PHONETIC REFINEDSOUNDEX: Phonetic algorithms for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. More information about Soundex.
PHONETIC METAPHONE and PHONETIC DOUBLEMETAPHONE: Algorithms for indexing words by their English pronunciation. They are suitable for use with most English words, not just names. Double Metaphone can return both a primary and a secondary code for an input string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. These algorithms support a Max Code Length parameter, which defines the maximum length of the encoded result. This value defaults to 4. More details about Metaphone.
PHONETIC CAVERPHONE and PHONETIC CAVERPHONE1: Algorithms for data matching for electoral rolls, optimized for accents present in parts of New Zealand. More details about Caverphone and Caverphone 1.
PHONETIC NYSIIS: New York State Identification and Intelligence System (NYSIIS), which maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding. More details about NYSIIS.
PHONETIC MRA: Match Rating Approach developed by Western Airlines. This algorithm has an encoding and range comparison technique. More details about MRA.
PHONETIC COLOGNE: Phonetic algorithm optimized for the German language. See Kölner Phonetik.
PHONETIC BEIDERMORSE: Phonetic algorithm supporting greater accuracy in matching Slavic and Yiddish surnames with similar pronunciation but differences in spelling. It returns a list of tokens (separated by the string specified in the Synonyms Separator parameter): first the transformed input text, then the transformed synonyms of the input text. More information about Beidermorse.
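To illustrate how a phonetic code lets differently spelled homophones match, here is a minimal sketch of the classic American Soundex encoding in Python. It follows the commonly published rules (first letter kept, consonants mapped to digits, H and W transparent, result padded to four characters). The plug-in's own implementation may differ in edge cases, and the function name `soundex` is only illustrative.

```python
# Consonant groups of the classic American Soundex algorithm.
_CODES: dict[str, str] = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Encode a name with American Soundex (illustrative sketch)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters sharing a code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Because both names share the same code, a match rule comparing encoded values would treat them as candidates for the same person.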
Other Transformations
These other transformations return a list of tokens which can be split into the Transformed Text and Secondary Transformed Text outputs.
These transformations should preferably be used at the end of the transformation sequence, as their secondary transformed text is not processed in subsequent transformations in the sequence.
Other transformations include:
BEIDERMORSE [<split>] [<rule_type>] [<max_phonems>] [<name_type>]
The Beidermorse transformation returns a list of tokens: first the transformed input text, then the transformed synonyms of the input text. Beidermorse supports the following parameters:
split: If this parameter is set to true, all synonyms after the first one are concatenated in the Secondary Transformed Text output. If this parameter is set to false (default value), all synonyms are appended to the first token in the Transformed Text output.
rule_type: EXACT for exact or APPROX for approximate phonetic transformation.
max_phonems: The maximum number of synonyms returned. Default is 20.
name_type: Default value is GENERIC. Use ASHKENAZI or SEPHARDIC if you specifically want phonetic encodings optimized for Ashkenazi or Sephardic Jewish family names.
DOUBLEMETAPHONE [<max_code_length>] [<split>]
This transformation encodes the input string with the Double Metaphone algorithm and returns a primary code and a secondary code. If split is set to true, then the secondary code is pushed to the Secondary Transformed Text output. Otherwise, it is concatenated to the primary code in the Transformed Text output.
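The effect of the split parameter on the two outputs can be sketched as follows. This is an illustration under assumptions, not the plug-in's code: the function name `route_tokens` is hypothetical, the tokens are assumed to be joined with the Synonyms Separator (a pipe by default), and the sample tokens are made up rather than real Beidermorse output.

```python
from typing import List, Optional, Tuple


def route_tokens(tokens: List[str], split: bool,
                 separator: str = "|") -> Tuple[str, Optional[str]]:
    """Return (Transformed Text, Secondary Transformed Text)."""
    if not tokens:
        return "", None
    if split:
        # First token goes to the primary output, the rest to the secondary.
        secondary = separator.join(tokens[1:]) or None
        return tokens[0], secondary
    # Without split, every token ends up in the primary output.
    return separator.join(tokens), None


tokens = ["smit", "smid", "schmit"]  # made-up tokens for illustration
print(route_tokens(tokens, split=True))   # ('smit', 'smid|schmit')
print(route_tokens(tokens, split=False))  # ('smit|smid|schmit', None)
```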
Transliteration
The TRANSLITERATE transformation transforms a text from one character script to another. For example, Traditional to Simplified Chinese, Japanese Hiragana to Katakana, Cyrillic to Latin script.
Each source/target transliteration is identified by an ID. The list of supported transliteration IDs is provided in the list below. If no ID is provided, the Any-Latin transliteration is used.
Each ID represents a transliteration from one script/language to another. For example: Katakana-Latin, Latin-Thai, etc. The special tag Any stands for any script/language. For example, Any-Latin converts any input script to Latin script.
Accents-Any | Any-Name | Devanagari-Bengali | Han-Latin | Latin-Greek | Pinyin-NumericPinyin |
Amharic-Latin/BGN | Any-NFC | Devanagari-Gujarati | Han-Latin/Names | Latin-Greek/UNGEGN | pl_FONIPA-ja |
Any-Accents | Any-NFD | Devanagari-Gurmukhi | Hangul-Latin | Latin-Gujarati | pl-ja |
Any-am | Any-NFKC | Devanagari-Kannada | Hans-Hant | Latin-Gurmukhi | pl-pl_FONIPA |
Any-Arabic | Any-NFKD | Devanagari-Latin | Hant-Hans | Latin-Han | Publishing-Any |
Any-Armenian | Any-Null | Devanagari-Malayalam | Hebrew-Latin | Latin-Hangul | ro_FONIPA-ja |
Any-Bengali | Any-Oriya | Devanagari-Oriya | Hebrew-Latin/BGN | Latin-Hebrew | ro-ja |
Any-Bopomofo | Any-pl_FONIPA | Devanagari-Tamil | Hex-Any | Latin-Hiragana | ro-ro_FONIPA |
Any-CaseFold | Any-Publishing | Devanagari-Telugu | Hex-Any/C | Latin-Jamo | ru-ja |
Any-cs_FONIPA | Any-Remove | Digit-Tone | Hex-Any/Java | Latin-Kannada | ru-zh |
Any-Cyrillic | Any-ro_FONIPA | es_419-ja | Hex-Any/Perl | Latin-Katakana | Russian-Latin/BGN |
Any-Devanagari | Any-ru | es_419-zh | Hex-Any/Unicode | Latin-Malayalam | Serbian-Latin/BGN |
Any-es_419_FONIPA | Any-sk_FONIPA | es_FONIPA-am | Hex-Any/XML | Latin-NumericPinyin | Simplified-Traditional |
Any-es_FONIPA | Any-Syriac | es_FONIPA-es_419_FONIPA | Hex-Any/XML10 | Latin-Oriya | sk_FONIPA-ja |
Any-FCC | Any-Tamil | es_FONIPA-ja | Hiragana-Katakana | Latin-Syriac | sk-ja |
Any-FCD | Any-Telugu | es_FONIPA-zh | Hiragana-Latin | Latin-Tamil | sk-sk_FONIPA |
Any-Georgian | Any-Thaana | es-am | IPA-XSampa | Latin-Telugu | Syriac-Latin |
Any-Greek | Any-Thai | es-es_FONIPA | it-am | Latin-Thaana | Tamil-Bengali |
Any-Greek/UNGEGN | Any-Title | es-ja | it-ja | Latin-Thai | Tamil-Devanagari |
Any-Gujarati | Any-Upper | es-zh | ja_Latn-ko | Macedonian-Latin/BGN | Tamil-Gujarati |
Any-Gurmukhi | Any-zh | Fullwidth-Halfwidth | ja_Latn-ru | Malayalam-Bengali | Tamil-Gurmukhi |
Any-Han | Arabic-Latin | Georgian-Latin | Jamo-Latin | Malayalam-Devanagari | Tamil-Kannada |
Any-Hangul | Arabic-Latin/BGN | Georgian-Latin/BGN | JapaneseKana-Latin/BGN | Malayalam-Gujarati | Tamil-Latin |
Any-Hans | Armenian-Latin | Greek-Latin | Kannada-Bengali | Malayalam-Gurmukhi | Tamil-Malayalam |
Any-Hant | Armenian-Latin/BGN | Greek-Latin/BGN | Kannada-Devanagari | Malayalam-Kannada | Tamil-Oriya |
Any-Hebrew | ASCII-Latin | Greek-Latin/UNGEGN | Kannada-Gujarati | Malayalam-Latin | Tamil-Telugu |
Any-Hex | Azerbaijani-Latin/BGN | Gujarati-Bengali | Kannada-Gurmukhi | Malayalam-Oriya | Telugu-Bengali |
Any-Hex/C | Belarusian-Latin/BGN | Gujarati-Devanagari | Kannada-Latin | Malayalam-Tamil | Telugu-Devanagari |
Any-Hex/Java | Bengali-Devanagari | Gujarati-Gurmukhi | Kannada-Malayalam | Malayalam-Telugu | Telugu-Gujarati |
Any-Hex/Perl | Bengali-Gujarati | Gujarati-Kannada | Kannada-Oriya | Maldivian-Latin/BGN | Telugu-Gurmukhi |
Any-Hex/Plain | Bengali-Gurmukhi | Gujarati-Latin | Kannada-Tamil | Mongolian-Latin/BGN | Telugu-Kannada |
Any-Hex/Unicode | Bengali-Kannada | Gujarati-Malayalam | Kannada-Telugu | Name-Any | Telugu-Latin |
Any-Hex/XML | Bengali-Latin | Gujarati-Oriya | Katakana-Hiragana | NumericPinyin-Latin | Telugu-Malayalam |
Any-Hex/XML10 | Bengali-Malayalam | Gujarati-Tamil | Katakana-Latin | NumericPinyin-Pinyin | Telugu-Oriya |
Any-Hiragana | Bengali-Oriya | Gujarati-Telugu | Kazakh-Latin/BGN | Oriya-Bengali | Telugu-Tamil |
Any-ja | Bengali-Tamil | Gurmukhi-Bengali | Kirghiz-Latin/BGN | Oriya-Devanagari | Thaana-Latin |
Any-Kannada | Bengali-Telugu | Gurmukhi-Devanagari | Korean-Latin/BGN | Oriya-Gujarati | Thai-Latin |
Any-Katakana | Bopomofo-Latin | Gurmukhi-Gujarati | Latin-Arabic | Oriya-Gurmukhi | Tone-Digit |
Any-ko | Bulgarian-Latin/BGN | Gurmukhi-Kannada | Latin-Armenian | Oriya-Kannada | Traditional-Simplified |
Any-Latin (default) | cs_FONIPA-ja | Gurmukhi-Latin | Latin-ASCII | Oriya-Latin | Turkmen-Latin/BGN |
Any-Latin/BGN | cs_FONIPA-ko | Gurmukhi-Malayalam | Latin-Bengali | Oriya-Malayalam | Ukrainian-Latin/BGN |
Any-Latin/Names | cs-cs_FONIPA | Gurmukhi-Oriya | Latin-Bopomofo | Oriya-Tamil | Uzbek-Latin/BGN |
Any-Latin/UNGEGN | cs-ja | Gurmukhi-Tamil | Latin-Cyrillic | Oriya-Telugu | XSampa-IPA |
Any-Lower | cs-ko | Gurmukhi-Telugu | Latin-Devanagari | Pashto-Latin/BGN | zh_Latn_PINYIN-ru |
Any-Malayalam | Cyrillic-Latin | Halfwidth-Fullwidth | Latin-Georgian | Persian-Latin/BGN |
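The IDs listed above appear to follow a Source-Target pattern with an optional /Variant suffix (for example Greek-Latin/UNGEGN). The following sketch splits an ID into those parts. The pattern is inferred from the list rather than stated by this guide, and the names `TransliterationId` and `parse_transliteration_id` are hypothetical.

```python
from typing import NamedTuple, Optional


class TransliterationId(NamedTuple):
    source: str
    target: str
    variant: Optional[str]


def parse_transliteration_id(identifier: str = "Any-Latin") -> TransliterationId:
    """Split an ID such as 'Greek-Latin/UNGEGN' into its parts."""
    pair, _, variant = identifier.partition("/")
    source, dash, target = pair.partition("-")
    if not dash or not source or not target:
        raise ValueError(f"Unexpected transliteration ID: {identifier!r}")
    return TransliterationId(source, target, variant or None)


print(parse_transliteration_id("Greek-Latin/UNGEGN"))
print(parse_transliteration_id())  # falls back to the default Any-Latin
```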
LOOKUP
This plug-in performs a data lookup on a mapping table.
Semarchy Lookup Enricher
Plug-in ID
Semarchy Lookup Enricher - com.semarchy.engine.plugins.convergence.text
Description
This enricher performs a data lookup on a mapping table accessed via a JDBC datasource.
The mapping table is located in a datasource provided using the Datasource parameter, which defaults to the data location’s datasource. The mapping table is declared to the enricher in one of the following ways:
By giving a Mapping Table as well as a Lookup Column and an Output Column from this table. The input lookup value is searched in the Lookup Column and the corresponding value from the Output Column is returned.
By giving a Custom SQL select statement in the form: select <lookup_column> LOOKUP_COLUMN, <output_column> OUTPUT_COLUMN from <mapping_table> where … This statement is executed on the datasource, and must return two columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN. These columns are used as the lookup and output columns.
You must set either Mapping Table, Lookup Column, and Output Column together, or Custom SQL alone.
The lookup is performed on the mapping table with an optional memory cache configured with the Cache Lookup Data parameter.
When a null value is passed as the Lookup Value or when the lookup finds no matching value in the lookup column, the enricher returns the Fallback Value or the Lookup Value, depending on the Fallback Behavior parameter.
This plug-in is thread-safe and supports parallel execution.
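The lookup logic described above (a two-column dataset, an optional memory cache, and a fallback when nothing matches) can be sketched in Python with an in-memory SQLite database standing in for the JDBC datasource. Everything named here is an assumption made for illustration: the `COUNTRY_MAP` table, the `build_lookup` function, and the `return_input_on_miss` flag, which stands in for the Fallback Behavior parameter.

```python
import sqlite3


def build_lookup(connection, custom_sql, fallback_value=None,
                 return_input_on_miss=False):
    """Load the mapping once (memory cache) and return a lookup function."""
    # The query must return two columns: the lookup value and the output value.
    mapping = {key: value for key, value in connection.execute(custom_sql)}

    def lookup(value):
        if value is not None and value in mapping:
            return mapping[value]
        # Null input or no match: apply the fallback behavior.
        return value if return_input_on_miss else fallback_value

    return lookup


conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY_MAP (CODE text, NAME text)")  # hypothetical table
conn.executemany("insert into COUNTRY_MAP values (?, ?)",
                 [("FR", "France"), ("NL", "Netherlands")])

sql = "select CODE LOOKUP_COLUMN, NAME OUTPUT_COLUMN from COUNTRY_MAP"
lookup = build_lookup(conn, sql, fallback_value="UNKNOWN")
print(lookup("FR"), lookup("XX"))  # France UNKNOWN
```

Caching the whole mapping trades memory for speed. For a very large mapping table, a per-value query may be preferable.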
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Cache Lookup Data | No | String | Use this parameter to optionally use a memory cache for the lookup process. Possible values are: |
Custom SQL | No | String | Leave this parameter empty to use a generated SQL query. Use this parameter instead of Mapping Table, Lookup Column and Output Column to define the lookup dataset with a select statement in the following form: select <lookup_column> LOOKUP_COLUMN, <output_column> OUTPUT_COLUMN from <mapping_table> where ... Note that this query must return a dataset with two columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN. |
Datasource | No | String | JNDI name of datasource containing the lookup data. If this parameter is not defined, the enricher uses the data location datasource. Note that this parameter should contain the full path of the datasource, for example: |
Fallback Behavior | No | String | Behavior when the lookup value is not found in the lookup column. Possible values are: |
Fallback Value | No | String | Value to return if the lookup value is not found in the lookup column. Default value: |
Lookup Column | No | String | Physical name of the column containing the lookup values. Default value: |
Mapping Table | No | String | Physical name of the mapping table containing the lookup and output columns. Default value: |
Output Column | No | String | Physical name of the column containing the values returned by the enricher. Default value: |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Lookup Value | Yes | String | Value to look for in the mapping table’s lookup column. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Output Value | String | Value returned by the lookup. |
TRANSLATION
Google Translate Enricher
Plug-in ID
Google Translate Enricher - com.semarchy.engine.plugins.convergence.translate.v2
Description
This enricher translates an Input Text from a Source Language to a Target Language using the Google Translate service. The source language is automatically detected if unspecified. This enricher requires a valid Google Key.
This plug-in must be used in compliance with the Google Translate APIs Terms of Service.
This enricher uses the Google Translate Service, which must be accessible from the Semarchy xDM Application at the following URL: https://www.googleapis.com/language/translate/v2?<parameters>. Make sure this URL is accessible through your firewalls.
This plug-in is thread-safe and supports parallel execution.
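To check firewall access or to understand what the <parameters> part of the URL looks like, the sketch below builds a request URL using the query parameters of the Google Translate v2 REST API (`key`, `q`, `target` and the optional `source`). How the plug-in itself builds its requests is not documented here. The function name `build_translate_url` and the key value are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(api_key: str, text: str, target: str,
                        source: Optional[str] = None) -> str:
    """Build a Translate v2 request URL (no request is sent)."""
    params = {"key": api_key, "q": text, "target": target}
    if source:
        # When omitted, the service detects the source language.
        params["source"] = source
    return f"{BASE_URL}?{urlencode(params)}"


print(build_translate_url("MY_KEY", "Bonjour le monde", "en"))
```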
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Application Name | Yes | String | Name of the client application accessing the Google Translate service. Application names should preferably have the format |
Google Key | Yes | String | Google API Key. It is a unique key that you generate using the Google API Console. |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Text to translate. |
Source Language | No | String | Language of the input text. If it is unspecified, it is detected from the input text. |
Target Language | Yes | String | Target language for the translation. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Translated Text | String | Translated Text. |
NAME PROCESSING
Semarchy Person Name Enricher
Plug-in ID
Semarchy Person Name Enricher - com.semarchy.engine.plugins.convergence.personname.PersonNameEnricher
Description
This enricher extracts from a person’s full name his/her Given Name, Surname and Gender. It parses the Input Name and identifies a Given Name and Surname (with a Name Parsing Score confidence percentage). Then the given name is searched in a database of names for the source country code provided in the input. If a given name is matched, a Gender and a Most Frequent Gender (if the given name is unisex) are returned.
This plug-in is thread-safe and supports parallel execution.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Surname Position | Yes | String | Position of the Surname. This parameter is used for parsing the input name to detect the first and last names, and for generating the Full Name output. Possible values ( |
Case Transformation | Yes | String | Case transformation for the name. Possible values: |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Name | Yes | String | Person full name to enrich. |
Source Country Code | Yes | String | Code of the country of origin for the name. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Full Name | String | The reconstructed full name, with the surname positioned according to the Surname Position parameter. |
Gender | String | The gender of the Matched Given Name. One of MALE, FEMALE, UNISEX, UNKNOWN. |
Gender Score | String | Confidence with which the Most Frequent Gender can be used [0-100]. |
Given Name | String | The part identified as Given Name in the input name. |
Matched Given Name | String | Given name matched in the given name database. |
Most Frequent Gender | String | The most frequent gender of the Matched Given Name for the given country. One of MALE, FEMALE, UNKNOWN. |
Names Parsing Score | String | Names Parsing confidence [0-100]. |
Surname | String | The part identified as Surname in the input name. |
Surname Position | String | Position at which the surname was detected. |
INTERNATIONAL PHONE NUMBERS PLUG-IN
The International Phone Numbers Plug-In for Semarchy xDM provides two features:
An enricher to standardize and improve phone number formatting.
A validator to check the validity of phone numbers.
Semarchy Phone Enricher
Plug-in ID
Semarchy Phone Enricher - com.semarchy.engine.plugins.convergence.phone
Description
This enricher takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Region Code. It returns a standardized Enriched Phone Number in the Enriched Phone Format. Geocoding Data is also returned and includes (depending on the country) the country, the region/state and the city name.
If a phone number is not valid, the enricher returns the original phone value in the Enriched Phone Number, a Status Code as well as a Status Text describing the issue with the input phone number.
This plug-in is thread-safe and supports parallel execution.
Plug-in Parameters
This plug-in does not use any parameters.
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Phone Number | Yes | String | Input Phone Number. |
Region Code | No | String | Two-letter region code for a national phone number, according to the ISO 3166-1 standard. If this parameter is left empty, the phone number provided in the Input Phone Number should include the international country calling code. |
Enriched Phone Format | No | String | Format of the Enriched Phone Number. Possible values are |
Region of Origin | No | String | Formats the phone output for international dialing from the country or region provided in this input. E.g.: |
Phone Formats
The following standards are supported to format the enriched phone number:
INTERNATIONAL and NATIONAL refer to the ITU-T Recommendation E.123 for national and international phone numbers.
E164 refers to the ITU-T Recommendation E.164.
RFC3966 refers to IETF RFC 3966.
Phone Format Examples:
E.123 - National Notation: (042) 123 4594
E.123 - International Notation: +31 42 123 4567
E.164 - International Notation: +31421234567 (equivalent to E.123 with no formatting)
RFC3966 - International Notation: +31-42-123-4567 (equivalent to E.123 with hyphens instead of spaces)
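The relationship between the three international notations in the examples above can be shown with a small sketch: the same digit groups are joined with spaces, with nothing, or with hyphens. This is only an illustration of the notations. The function name `format_phone` is hypothetical, the digit grouping is supplied by the caller, and the NATIONAL format is omitted because it depends on country-specific prefix rules.

```python
from typing import List


def format_phone(country_code: str, groups: List[str], phone_format: str) -> str:
    """Render a number, given as digit groups, in one international notation."""
    if phone_format == "INTERNATIONAL":
        return "+" + " ".join([country_code, *groups])
    if phone_format == "E164":
        return "+" + country_code + "".join(groups)
    if phone_format == "RFC3966":
        return "+" + "-".join([country_code, *groups])
    raise ValueError(f"Format not covered by this sketch: {phone_format}")


for fmt in ("INTERNATIONAL", "E164", "RFC3966"):
    print(fmt, format_phone("31", ["42", "123", "4567"], fmt))
```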
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Enriched Phone Number | String | Phone number returned by the enricher in the format specified in the Enriched Phone Format input. This string is null if the enricher was not able to process the input phone number. The Status Code and Status Text values help troubleshoot such issues. |
Geocoding Data | String | Geocoding data computed for a given number and country. Depending on the country and phone number, this value includes the country, region/state and city information. This string is null if the enricher was not able to process the input phone number. The Status Code and Status Text values help troubleshoot such issues. |
Status Code | String | Return code for the phone number processing. More details about the Status Codes. |
Status Text | String | Text explaining the status code. |
International Phone Prefix | String | International Phone Prefix for worldwide dialing. |
National Number | String | National number part of a phone number in International format. It is often the International number without the Country Prefix. |
Extension | String | Extension part of the phone number. |
Country Code Source | String | Explains how the Country Code was retrieved. Possible values are |
Leading Zero | String | Returns 0 or 1 to specify if leading zero is mandatory for foreign calls. |
Possible Phone Number | String | Returns 0 or 1 to indicate whether a phone number is a possible number, and the region where the number could be dialed from. |
Possible Phone Number Reason | String | Detailed explanation of why a phone number is a possible number or not. Possible values are |
Valid Phone Number | String | Returns 0 or 1 to indicate whether a phone number matches a valid pattern. |
Valid Phone Number For Region | String | Returns 0 or 1 to indicate that a phone number is valid for the specified Region Code. |
Phone Line Type | String | Provides the line type of a phone number. Possible values are : |
Region Code | String | Returns the region code for the Phone Number. See this link for the list of codes. |
International Phone Number | String | Phone number formatted for international dialing. |
Time Zones | String | List of corresponding time zones for a given number. For example: |
First Time Zone | String | First time zone from the list of corresponding time zones for a given number. |
Carrier Name | String | Name of the carrier for the phone number. |
Status Codes
The following status codes are returned by the enricher:
0 - OK: Optimal execution. No error detected.
1 - INPUT_WAS_NULL: Input phone number was not set.
2 - PARSING FAILED: The string supplied did not seem to be a phone number. Review the Status Text for more information.
Semarchy Phone Extractor
Plug-in ID
Semarchy Phone Extractor - com.semarchy.engine.plugins.convergence.phone.extractor
Description
This enricher extracts a list of phone numbers from an Input Text and returns them as a Phone List, in a given Extraction Format.
This plug-in is thread-safe and supports parallel execution.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Matching Leniency | No | String | Defines the phone number extraction leniency. Possible values are |
Extraction Format | No | String | Format of the extracted phone numbers. Possible values are |
List Separator | No | String | Defines the separator character used in the extracted phones list. |
Maximum Invalid Numbers | No | String | Maximum number of invalid numbers allowed before the enricher stops processing the text. This covers cases where the text contains a lot of false positives. |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Input text to search for phone numbers. |
Accepted Region | No | String | Defines the region used when Matching Leniency is set to |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Extracted Phone List | String | List of phone numbers extracted. |
Phone 1 to Phone 5 | String | First, second… extracted phone number in the list. |
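The shape of the extractor's outputs (one separated list plus up to five individual values) can be sketched with a simple regular expression. This is a deliberately naive stand-in: the plug-in's real matcher, its leniency levels and its output formats are not reproduced. The pattern, the `extract_phones` name and the `;` separator are assumptions made for illustration.

```python
import re

# Naive candidate pattern: optional "+", then digits mixed with common separators.
_CANDIDATE = re.compile(r"\+?\(?\d[\d\s().-]{5,}\d")


def extract_phones(text: str, separator: str = ";", max_outputs: int = 5):
    """Return (phone list as one string, list of up to max_outputs phones)."""
    found = [match.group().strip() for match in _CANDIDATE.finditer(text)]
    phone_list = separator.join(found)
    # Pad or truncate so that there is one slot per "Phone N" output.
    slots = (found + [None] * max_outputs)[:max_outputs]
    return phone_list, slots


phone_list, slots = extract_phones(
    "Call +31 42 123 4567 or (042) 123 4594 today."
)
print(phone_list)
print(slots)
```

A pattern this loose also matches long digit runs that are not phone numbers, which is the kind of false positive the Maximum Invalid Numbers parameter is meant to bound.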
Semarchy Phone Validator
Plug-in ID
Semarchy Phone Validator - com.semarchy.engine.plugins.convergence.phone
Description
This validator takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Country Code. The validator checks whether this phone number is a valid international or national phone number.
This plug-in is thread-safe and supports parallel execution.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Validation Leniency | No | String | Precise validation leniency for possible phone numbers. Value may be |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Phone Number | Yes | String | Input Phone Number. |
Country Code | No | String | Two-letter country code for a national phone number, according to the ISO 3166-1 standard. If this parameter is left empty, the phone number provided in the Input Phone Number should include the international country calling code. |
EMAIL PLUG-IN
The Email Plug-In for Semarchy xDM provides an enricher to improve the quality of email addresses and a validator to check email validity.
Semarchy Email Enricher
Plug-in ID
Semarchy Email Enricher - com.semarchy.engine.plugins.convergence.email
Description
This enricher takes an Input Email Address and splits this address into the local-part (user name) and the domain name. Both these parts are checked syntactically and syntax errors are fixed automatically. The domain name validity is also checked using MX records lookup. The plug-in uses a Domain Name Cache for faster checks and automated fixes on domain names.
This plug-in is thread-safe and supports parallel execution. |
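The split described above can be sketched as follows. This is a minimal illustration, assuming the address is split on its last "@" character; the function name is hypothetical, and the optional lowercasing only mimics the intent of the Lowercase User Name parameter. The syntax fixes and the MX records lookup performed by the enricher are not reproduced here.

```python
from typing import Tuple


def split_email(address: str, lowercase_user_name: bool = False) -> Tuple[str, str]:
    """Split an email address into (local_part, domain_name).

    Hypothetical helper: splits on the last "@" and performs no
    syntax correction or domain validation.
    """
    local_part, at_sign, domain_name = address.strip().rpartition("@")
    if not at_sign:
        raise ValueError("no '@' found in the address")
    if lowercase_user_name:
        local_part = local_part.lower()
    return local_part, domain_name


local_part, domain_name = split_email(" John.Doe@example.com ", lowercase_user_name=True)
```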
Domain Name Cache
The plug-in uses several mechanisms for faster checks and automated fixes on domain names:
Domain names already checked as valid (MX record lookup) are persisted in a domain name cache stored in a JDBC Datasource. This avoids repeating MX lookup.
A list of known domains (e.g.: hotmail.com, gmail.com, etc.) is automatically seeded in the host name validation cache.
Common domain mistakes are fixed using a seeded replace list. For example, gmai.com is automatically fixed to gmail.com using the cache.
Invalid domains are automatically fixed to similar valid domains already present in the cache. For example, semarcyh.com is fixed to semarchy.com, as semarchy.com was previously checked as a valid domain name.
See Appendix A: Semarchy Email Enricher Domain Name Cache for more information about the domain name cache.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Datasource | No | String | Full name of the JDBC Datasource used to store the host name validation cache. |
Lowercase User Name | No | String | Set to 1 to transform the local-part (user name) to lowercase in the cleansed email address. |
Offline Mode | No | String | Set to 1 to query only the local domain cache. The plug-in does not perform the MX Record Lookup. |
Processing Mode | No | String | Processing mode: |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Email Address | Yes | String | Input email address to cleanse. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Cleansed Email Address | String | Cleansed email address returned by the enricher. This address may be valid or not. The syntactic validity or domain name validity of the email address is indicated in the other plug-in outputs. |
Valid Domain | String | Flag (0 or 1) indicating whether the domain name is valid or not (based on syntax and MX records lookup) in the cleansed email address. In Offline mode, this parameter returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns |
Valid Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid or not in the cleansed email address. |
Valid Email Syntax | String | Flag (0 or 1) indicating whether the cleansed email address is syntactically valid or not. |
Valid Username Syntax | String | Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the cleansed email address. |
Valid Input Domain | String | Flag (0 or 1) indicating whether the domain name is valid or not (based on syntax and MX records lookup) in the input email address. In Offline mode, this parameter returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns |
Valid Input Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid or not in the input email address. |
Valid Input Email Syntax | String | Flag (0 or 1) indicating whether the input email address is syntactically valid or not. |
Valid Input Username Syntax | String | Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the input email address. |
Semarchy Email Validator
Plug-in ID
Semarchy Email Validator - com.semarchy.engine.plugins.convergence.email
Description
This validator takes an Input Email Address and checks its syntactic validity. The domain name validity is optionally also checked using MX records lookup.
The plug-in uses the same mechanisms as the Semarchy Email Enricher for checking the email validity, except that it does not modify the incoming email.
This plug-in is thread-safe and supports parallel execution. |
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Accepted Domains | No | String | Value tolerated for the email domain. Possible values: |
Offline Mode | No | String | Set to 1 to query only the local domain cache. The plug-in does not perform the MX Record Lookup. |
Processing Mode | No | String | Processing mode: |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Email Address | Yes | String | Input email address to check. |
GBGROUP MATCHCODE GLOBAL PLUG-IN
The Matchcode Global Plug-in for Semarchy xDM uses GBGroup Matchcode Global to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal addresses with geocoding and timezone information.
Matchcode Global Enricher
Plug-in ID
Matchcode Global Enricher - com.semarchy.engine.plugins.convergence.address
Description
This enricher takes an input address, then enriches and validates this postal address.
This plug-in is thread-safe and supports parallel execution. |
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
On-Premise Host | No | String | Host name or IP address of the Pool Manager server for an on-premise installation. |
On-Premise Port | No | String | Port of the Pool Manager service for an on-premise installation. The default value is 27920. |
On-Demand URL | No | String | URL of the on-demand service. To use this service, the |
Data Elements Format | No | String | Format used for the data elements returned by the enricher. Possible values are |
Pool Names List | No | String | Comma-separated list of pool names to query. Use the |
Pools and On-Demand/On-Premise Configuration
A pool represents a set of databases to search addresses in an on-premise setup of Matchcode Global. Pools (identified by their Pool Name) are defined and managed by the Pool Manager server. The plug-in connects to this server using the On-Premise Host and On-Premise Port parameters and queries the pools specified in the Pool Names List.
For more information about pool configuration and the Pool Manager, see the Capscan Pool Manager Documentation provided with your installation of Matchcode Global. |
When performing an address query, the plug-in uses the Pool Names List (either provided as an input or parameter). The query is launched on each pool in the list until a pool is able to process the address.
In the Pool Names List, a specific pool called ON_DEMAND allows switching to on-demand processing. When this pool name appears in the list, the On-Demand URL is used to query the on-demand service. If ON_DEMAND is the only pool in the Pool Names List, the On-Premise Host and On-Premise Port parameters are unused.
When configured to use the On-Demand service, this enricher uses a geocoding server which must be accessible from the Semarchy xDM Application at the URL specified in the On-Demand URL parameter. Make sure this URL is accessible through your firewalls. |
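The routing rules described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function name, the sample pool name UK_POOL, the host name and the URL are all hypothetical, and the real plug-in stops at the first pool able to process the address, which the sketch does not simulate.

```python
from typing import List, Tuple

ON_DEMAND = "ON_DEMAND"  # special pool name described in this section


def plan_queries(pool_names_list: str, host: str, port: int, on_demand_url: str) -> List[Tuple[str, str]]:
    """Return the ordered (pool name, endpoint) pairs that would be tried.

    Hypothetical helper: ON_DEMAND maps to the On-Demand URL, any other
    pool maps to the Pool Manager at On-Premise Host and On-Premise Port.
    """
    plan = []
    for pool in (name.strip() for name in pool_names_list.split(",")):
        if not pool:
            continue  # ignore empty entries such as a trailing comma
        if pool == ON_DEMAND:
            plan.append((pool, on_demand_url))
        else:
            plan.append((pool, f"{host}:{port}"))
    return plan


# Hypothetical configuration values.
plan = plan_queries("UK_POOL, ON_DEMAND", "poolmgr.example.com", 27920, "https://ondemand.example.com/")
```

With a list containing only ON_DEMAND, no entry refers to the host and port, which matches the statement that the on-premise parameters are then unused.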
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
Pool Names List | No | String | Comma-separated list of pool names to query. Use the |
The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Formatted Address | ||
Address | String | Comma-separated list of address lines. This output contains the full formatted address. |
Address Key | String | UK Address Key as defined by the Royal Mail. |
Ambiguity | ||
Ambiguity List | String | Comma-separated list of address elements and postal codes of the form: |
Ambiguity List Count | Integer | Count of entries in the ambiguity list. |
Address Items | ||
Organization | String | Organization Name. |
Building Name | String | Name of the building. |
Building Number | String | Number of the building. |
Sub-Building | String | Sub-building information. Postal box (PO Box) information appears in this field. |
Street | String | Street of the address. |
Dependent Street | String | Street on which this address's street depends. |
Locality | String | Locality of the address. |
Dependent Locality | String | Locality on which this address's locality depends. |
County | String | Name of the county or province. |
Postal Code | String | Postal code. |
Postal Town | String | Town or City. |
Country | String | Country of the address. |
Country Code | String | Country code. |
Error Management | ||
Result Code | Integer | Result code for the address search. See Error Management Outputs for more information. |
Error Code | String | Error code returned by the server. |
Error Text | String | Error message returned by the server. |
Address Quality | ||
Field Status | String | 8 character string. Each character represents how each address element was matched. See Address Quality Outputs for more information. |
Match Score | Integer | Percentage score describing the quality of the address match. |
Match Level | Integer | Address element to which the address is matched. See Address Quality Outputs for more information. |
Output Status | String | Status of the address match: Verified, Corrected, Parsed or Not Matched. See Address Quality Outputs for more information. |
Postal Code Change Level | String | The level at which the matched postal code differs from the input postal code. See Address Quality Outputs for more information. |
Input Postal Code Level | Integer | Level of postal code input: 0 - No post code, 4 - Postal code. |
Output Postal Code Level | Integer | Level of postal code match: 0 - No post code, 4 - Postal code. |
Geocoding | ||
Latitude | Float | GPS (WGS84) latitude in decimal degrees. |
Longitude | Float | GPS (WGS84) longitude in decimal degrees. |
Geocoding Level | Integer | Geocoding level for this address. See Geocoding Outputs for more information. |
Geocoding Status | String | Geocoding status for this address. See Geocoding Outputs for more information. |
Address Quality Outputs
The Match Score is the first output to consider when assessing the quality of the address returned by the plug-in. In addition to this value:
The Match Level can be used to assess the level at which matching was made, and the Field Status can be used to assess the details of the matched elements.
The Output Status can be used to assess the quality of the input address and its processing.
The Postal Code Change Level can be used to assess the quality of the postal code provided as an input and the changes made to it.
The following values are returned in the Match Level output:
Value | Description |
---|---|
0 | No Match. |
1 | Town, City, Locality. |
2 | Street. |
3 | Premise. |
4 | Organization. |
The following values are returned in the Output Status output:
Value | Description |
---|---|
V | Verified. The input address is verified as mailable without change. |
C | Corrected. The input address has been corrected to match the reference data. |
P | Parsed. The input address has been parsed but there is no matching reference data. |
N | Not matched. The input address cannot be matched or parsed. |
The Field Status output contains 8 characters. Each character is a value that represents how each address element was matched.
Character positions in the Field Status output:
Position | Address Element |
---|---|
0 | Organization |
1 | PO Box |
2 | Building name, Building number |
3 | Street |
4 | Locality |
5 | City |
6 | Administrative area |
7 | Postal code |
Character values in the Field Status output:
Value | Description |
---|---|
0 | Element Correct (no change) |
1 | Element Corrected (minor change) |
2 | Element Corrected (major change) |
3 | Element Not checked (no data) |
4 | Element Not found |
5 | Element Not provided |
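The two tables above can be combined to decode a Field Status value. The sketch below is a minimal illustration: the function name and the sample value are hypothetical, while the positions and character values are taken from the tables.

```python
from typing import Dict

# Address elements by character position (0 to 7), from the positions table.
ELEMENTS = [
    "Organization",
    "PO Box",
    "Building name, Building number",
    "Street",
    "Locality",
    "City",
    "Administrative area",
    "Postal code",
]

# Meaning of each character value, from the values table.
STATUS = {
    "0": "Element Correct (no change)",
    "1": "Element Corrected (minor change)",
    "2": "Element Corrected (major change)",
    "3": "Element Not checked (no data)",
    "4": "Element Not found",
    "5": "Element Not provided",
}


def decode_field_status(field_status: str) -> Dict[str, str]:
    """Map each address element to the description of its status character."""
    if len(field_status) != len(ELEMENTS):
        raise ValueError("Field Status must contain exactly 8 characters")
    return {
        element: STATUS.get(char, "Unknown value")
        for element, char in zip(ELEMENTS, field_status)
    }


# Hypothetical sample value, not taken from a real address match.
decoded = decode_field_status("55013005")
```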
The following values are returned in the Postal Code Change Level output. This value reflects the changes made to the postal code:
Value | Description |
---|---|
K | No postal code/ZIP code. |
L | Input postal code, no output postal code. |
M | Output postal code, no input postal code. |
N | No change. |
P | Postal code change. |
Geocoding Outputs
Geocoding information is returned in the Latitude and Longitude outputs.
The quality of the geocoding information is exposed in the Geocoding Level and Geocoding Status outputs.
The following values are returned in the Geocoding Level output:
Value | Description |
---|---|
5 | Delivery Point (PostBox or SubBuilding). |
4 | Premise (Premise or Building). |
3 | Thoroughfare. |
2 | Locality. |
1 | Administrative Area. |
0 | None. |
The following values are returned in the Geocoding Status output:
Value | Description |
---|---|
P | Point: A single geocode was found matching the input address. |
I | Interpolated: A geocode was interpolated from the location of the input address within a range. |
A | Average: Multiple candidate geocodes were found to match the input address, and an average of these was returned. |
U | Unable to geocode: A geocode could not be generated for the input address. |
Error Management Outputs
The following values are returned in the Result Code field.
Value | Description |
---|---|
0 | An internal error occurred, see the Error Code and Error Text output for details. |
1 | The address was successfully matched or parsed. |
2 | No hits were found for this address. |
3 | Insufficient input details were provided for processing. |
4 | Ambiguous results. Refer to the Ambiguity List output for details. |
GOOGLE MAPS PLUG-IN
The Google Maps Plug-in for Semarchy xDM provides an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal addresses with geocoding information.
Google Maps Enricher
Plug-in ID
Google Maps Enricher - com.semarchy.integration.rowTransformers.googleMapsEnricher
Description
This enricher takes an input address, then enriches and validates this postal address using the Google Geocoding Service.
This plug-in must be used in compliance with the Google Maps/Google Earth APIs Terms of Service. |
This enricher uses the Google Geocoding Service, which must be accessible from the Semarchy xDM Application at the following URL: http://maps.googleapis.com/maps/api/geocode/json?<parameters>. Make sure this URL is accessible through your firewalls. |
This plug-in is thread-safe and supports parallel execution. |
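To check firewall access, it can help to know what a request to this URL looks like. The sketch below builds one using the address, language and key request parameters of the Google Geocoding service. It is an illustration only: the function name is hypothetical, the plug-in builds its own requests internally, and the Client ID with signature flow is not shown.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint quoted in this guide.
GEOCODE_ENDPOINT = "http://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, language: Optional[str] = None, api_key: Optional[str] = None) -> str:
    """Build a geocoding request URL (hypothetical helper for connectivity tests)."""
    params = {"address": address}
    if language:
        params["language"] = language
    if api_key:
        params["key"] = api_key
    # urlencode percent-encodes the values and joins them with "&".
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


url = build_geocode_url("10 Downing Street, London", language="en")
```

The resulting URL can be requested from the server hosting the Semarchy xDM Application to confirm that the service is reachable.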
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Client ID or API Key | No | String | This parameter may contain either an API Key (for Standard API usage) or the Client ID (for Premium Usage), both provided by Google. The Client ID should begin with the |
Private Key | No | String | Cryptographic signature key provided by Google with the Client ID. |
Default Language | No | String | Code of the default language used for the returned results. For example, for the same address, "Rue Mathieu Misery" would appear in French and "Mathieu Misery Street" in English. This code can be overridden by the Language plug-in input. See the list of supported domain languages for more information. |
You can use the Google Maps service with one of the following authentication methods:
|
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
Language | No | String | Code of the language for the returned result for this record. This language overrides the Default Language parameter. See the list of supported domain languages for more information. |
The state, region or province information can be passed in the City input, concatenated with the city name. For example: Address.City || ' ' || Address.State |
The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input. |
Plug-in Outputs
The following table lists the plug-in outputs. Outputs marked with an * appear in a Full and a Short form in the output list.
Parameter Name | Type | Description |
---|---|---|
Address Types | String | Comma-separated list of address types (See Address Types for more information.). |
Administrative Area Level 1* | String | First-order civil entity below the country level. Within the United States, these administrative levels are states. Not all countries exhibit these administrative levels. |
Administrative Area Level 2* | String | Second-order civil entity below the country level. Within the United States, these administrative levels are counties. Not all countries exhibit these administrative levels. |
Administrative Area Level 3* | String | Third-order civil entity below the country level. Not all countries exhibit these administrative levels. |
Airport | String | Indicates an airport. NOTE: This output is deprecated. |
Country* | String | The national political entity. |
East Bound Longitude | String | Bounding box eastern limit. |
Floor* | String | Indicates the floor of a building address. |
Formatted Address | String | Human-readable version of the geocoded address. |
Intersection | String | Major intersection, usually of two major roads. NOTE: This output is deprecated. |
Latitude | String | Latitude of the address. |
Locality* | String | Incorporated city or town political entity. |
Longitude | String | Longitude of the address. |
Natural Feature* | String | Prominent natural feature. |
Neighborhood* | String | Named neighborhood. |
North Bound Latitude | String | Bounding box northern limit. |
Park* | String | Named park. |
Point of Interest* | String | Named point of interest. |
Post Box* | String | Specific postal box. |
Postal Code* | String | Postal code as used to address postal mail within the country. |
Premise* | String | Named location, usually a building or collection of buildings with a common name. |
Quality | String | Address quality, which reflects the granularity of the location described by the address. The value ranges from 0 to 100 (100 being the best quality). |
Room* | String | The room of a building address. |
Route* | String | Named route (such as |
South Bound Latitude | String | Bounding box southern limit. |
Status | String | Status of the request. |
Street Address | String | Precise street address. NOTE: This output is deprecated. |
Street Number* | String | Precise street number. |
Sub-Locality* | String | First-order civil entity below a locality. |
Sub-Premise* | String | First-order entity below a named location, usually a singular building within a collection of buildings with a common name. |
West Bound Longitude | String | Bounding box western limit. |
Embedding a Google Map in a Form
The Google Geocoding service data must be used to display maps rendered with the Google Maps service.
You can display such a map in Semarchy xDM in a form, by embedding generated HTML and JavaScript.
Create a new form field with the SemQL expression given below.
In the SemQL expression, modify the following line to concatenate your address information:
var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";
If you are a Google Maps API for Work customer, modify the URL of the Google Maps service in the code to include your Google Client ID. Note that the embedded map stops working after you add the client ID until the authorized URLs are registered with Google. To register them, follow the instructions given on the Google Maps API for Work site:
<script src="https://maps.googleapis.com/maps/api/js?client=YOUR_CLIENT_ID&v=3.20&sensor=false"></script>
Edit the field:
In the Display Properties, set the Component Type to Object, and in Data, set the Source Type to Content.
This configuration tells Semarchy xDM to interpret this code as HTML and JavaScript in the browser.
'<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://maps.googleapis.com/maps/api/js?sensor=false"></script>
  <script>
    /* Modify the line below */
    var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";
    var zoom = 18;
    var mapType = google.maps.MapTypeId.ROADMAP;
    var useMarker = true;
    var map;
    /* Geocode the address, then display the map centered on the first result */
    function initialize() {
      var geocoder = new google.maps.Geocoder();
      geocoder.geocode( { "address": address}, function(results, status) {
        if (status == google.maps.GeocoderStatus.OK) { displayMap(results[0].geometry.location); }
      });
      window.onresize = resize;
    }
    function displayMap(latlng) {
      var mapOptions = { zoom: zoom, center: latlng, mapTypeId: mapType };
      map = new google.maps.Map(document.getElementById("map_canvas"), mapOptions);
      if (useMarker) {
        var marker = new google.maps.Marker({ map: map, position: latlng});
      }
      resize("");
    }
    /* Fit the map to the window and keep it centered */
    function resize(e) {
      /* Nothing to resize until the geocoder has returned and the map exists */
      if (!map) { return; }
      var center = map.getCenter();
      map.getDiv().style.height = window.innerHeight +"px";
      map.getDiv().style.width = window.innerWidth +"px";
      google.maps.event.trigger(map, ''resize'');
      map.setCenter(center);
    }
    google.maps.event.addDomListener(window, "load", initialize);
  </script>
</head>
<body style="margin:0px;">
  <div id="map_canvas" style="margin:0px;"></div>
</body>
</html>'
OPEN STREET MAP PLUG-IN
The OpenStreetMap Plug-in for Semarchy xDM uses the OpenStreetMap API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address.
OpenStreetMap Enricher
Plug-in ID
OpenStreetMap Enricher - com.semarchy.engine.plugins.openstreetmap
Description
This enricher takes an input address, then enriches and validates this postal address using the OpenStreetMap Service.
This enricher uses the OpenStreetMap Service, which must be accessible from the Semarchy xDM Application at the URL specified in the OpenStreetMap URL parameter. Make sure this URL is accessible through your firewalls. |
This plug-in is thread-safe and supports parallel execution. |
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
OpenStreetMap URL | Yes | String | URL used to query OpenStreetMap API. Typically |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Address | String | Complete address of the location. |
City | String | City of the location. |
Country | String | Country of the location. |
Country Code | String | Country code of the location. |
County | String | County of the location. |
Latitude | String | Latitude of the location. |
Longitude | String | Longitude of the location. |
Postal Code | String | Postal code of the location. |
Process Code | String | Code that indicates the result status of the address processing. |
State | String | State of the location. |
Street Number | String | Street number of the location. |
Street Name | String | Street name of the location. |
MICROSOFT BING MAPS PLUG-IN
The Microsoft Bing Maps Plug-in for Semarchy xDM uses the Bing Location API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address with geocoding information.
Bing Maps Enricher
Plug-in ID
Bing Maps Enricher - com.semarchy.engine.plugins.bing.address
Description
This enricher takes an input address, then enriches and validates this postal address using the Bing Maps Service.
This plug-in must be used in compliance with the Microsoft Bing Maps APIs Terms of Service. |
This enricher uses the Bing Maps Service, which must be accessible from the Semarchy xDM Application at the URL specified in the Bing Location URL parameter. Make sure this URL is accessible through your firewalls. |
This plug-in is thread-safe and supports parallel execution. |
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Bing Maps Key | Yes | String | To use the Bing Maps Services, you must have a Bing Maps Key. |
Bing Location URL | Yes | String | URL used to query the Bing Location API. |
Plug-in Inputs
The following table lists the plug-in inputs.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input. |
Plug-in Outputs
The following table lists the plug-in outputs.
Parameter Name | Type | Description |
---|---|---|
Administrative District | String | The subdivision name within the country or region for an address, such as the abbreviation of a US state. |
Administrative District 2 | String | The subdivision name within the administrative district for an address. |
Confidence | String | Defines the confidence of the location match found by the geocoding service. Possible values: High, Medium, Low. |
Country or Region | String | The country or region name of the address. |
Formatted Address | String | A string specifying the complete address. This address may not include the country or region. |
Status Code | String | The HTTP Status code for the request. |
Status Description | String | A description of the HTTP status code. |
Latitude | String | Latitude of the location. |
Locality | String | The locality, such as the primary city, that corresponds to an address. |
Longitude | String | Longitude of the address. |
Match Code | String | Defines the geocoding level of the location match found by the geocoder. One or more of the following values: Good, Ambiguous, UpHierarchy |
Postal Code | String | The post code, postal code, or ZIP code of the address. |
Process Code | String | Code that indicates the result status of the process. |
APPENDICES
Appendix A: Semarchy Email Enricher Domain Name Cache
The Semarchy Email Enricher uses a local cache to avoid repeating MX record lookups to check the validity of an email domain.
This domain name cache takes priority: if a record is found in the cache, the enricher uses the information available locally and does not issue an MX record lookup.
The plug-in stores the cache in a table named EXT_EMAIL_DOMAINS. This table is created at the first run of the enricher, by default in the data location served by the enricher. To store this table in a different location, specify a datasource in the Datasource enricher parameter.
Domain Name Cache Table Structure
The structure of the EXT_EMAIL_DOMAINS table is the following:
Column Name | Description |
---|---|
HOST_NAME | Domain name. e.g. "gmail.com" |
PREFIX | 2 first letters of the domain name. e.g. "gm" |
SUFFIX | 2 last letters of the domain name. e.g. "om" |
HIT_COUNT | Number of times this host name was processed by the enricher. This value is automatically incremented by the enricher. |
SEED_DATA | Indicates whether this record was part of the seeded data, or created by the enricher. The value is |
VALID | Indicates whether the domain name is valid |
SUGGESTION | Latest correction found for an invalid domain. |
FIRST_INVALID_DATE, LAST_INVALID_DATE | Additional date information used to reconsider a domain validity after a certain period of time. |
Fixing Domain Names
The enricher automatically fixes invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:
The Edit Distance between the invalid domain and cached domain.
The hit count of the cached domain.
A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
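The built-in algorithm is not documented in detail, so the following is only a sketch of the general idea, under stated assumptions: it uses the Levenshtein edit distance, prefers the smallest distance, breaks ties with the highest hit count, and ignores candidates beyond a maximum distance of 2. The function names, the threshold and the sample hit counts are hypothetical.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest_domain(invalid: str, valid_hits: Dict[str, int], max_distance: int = 2) -> Optional[str]:
    """Pick the closest valid cached domain, preferring frequently seen ones.

    Assumption: candidates are ranked by (distance, -hit count); the
    plug-in's actual weighting of the two criteria is not documented.
    """
    candidates = [
        (edit_distance(invalid, domain), -hits, domain)
        for domain, hits in valid_hits.items()
    ]
    candidates = [c for c in candidates if c[0] <= max_distance]
    return min(candidates)[2] if candidates else None


# Hypothetical cache content: valid domain -> hit count.
cache = {"semarchy.com": 12, "gmail.com": 950, "hotmail.com": 400}
fixed = suggest_domain("semarcyh.com", cache)
```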
Adding Records to the Cache
It is possible to force the creation of new records in the cache, for example to create new fix suggestions.
To manually insert a correction <host_name_replacement> for an invalid domain <invalid_host_name>, use the following sample query:
INSERT INTO EXT_EMAIL_DOMAINS (
  HOST_NAME,
  PREFIX,
  SUFFIX,
  HIT_COUNT,
  SEED_DATA,
  VALID,
  SUGGESTION,
  FIRST_INVALID_DATE,
  LAST_INVALID_DATE
)
VALUES (
  <invalid_host_name>,
  SUBSTR(<invalid_host_name>, 1, 2),   -- 2 first letters (string positions start at 1)
  SUBSTR(<invalid_host_name>, -2, 2),  -- 2 last letters
  0,                                   -- initial hit count
  '1',
  '0',                                 -- the domain is recorded as invalid
  <host_name_replacement>,
  CURRENT_TIMESTAMP,
  CURRENT_TIMESTAMP
);
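When such inserts are scripted, bound parameters avoid quoting problems with the placeholder values. The sketch below replays the sample query against an in-memory SQLite table, purely so that the example is self-contained and runnable. The column types, the SQLite database and the function name are assumptions; the real cache table lives in the datasource used by the enricher, and the prefix and suffix are computed in Python here because SUBSTR behaves differently across databases.

```python
import sqlite3


def add_domain_correction(conn: sqlite3.Connection, invalid_host_name: str, host_name_replacement: str) -> None:
    """Insert a manual correction record (hypothetical helper)."""
    conn.execute(
        """
        INSERT INTO EXT_EMAIL_DOMAINS (
            HOST_NAME, PREFIX, SUFFIX, HIT_COUNT, SEED_DATA,
            VALID, SUGGESTION, FIRST_INVALID_DATE, LAST_INVALID_DATE
        ) VALUES (?, ?, ?, 0, '1', '0', ?, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
        """,
        (
            invalid_host_name,
            invalid_host_name[:2],   # 2 first letters
            invalid_host_name[-2:],  # 2 last letters
            host_name_replacement,
        ),
    )


# Stand-in table for the example only; the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EXT_EMAIL_DOMAINS (
        HOST_NAME TEXT PRIMARY KEY, PREFIX TEXT, SUFFIX TEXT,
        HIT_COUNT INTEGER, SEED_DATA TEXT, VALID TEXT, SUGGESTION TEXT,
        FIRST_INVALID_DATE TEXT, LAST_INVALID_DATE TEXT
    )
    """
)
add_domain_correction(conn, "semarcyh.com", "semarchy.com")
row = conn.execute(
    "SELECT PREFIX, SUFFIX, HIT_COUNT, VALID, SUGGESTION FROM EXT_EMAIL_DOMAINS WHERE HOST_NAME = ?",
    ("semarcyh.com",),
).fetchone()
```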
Cache Refresh
The Email enricher refreshes the local cache records after 3 months. This duration is not configurable. The cache records date information for each domain, and the enricher makes a new call to the MX server to refresh a record that is due.
If there is good evidence that the cache is wrong about a domain's validity, or if business users are certain they want to override the cache's decision, the developer can manually set the Valid flag to 0 or 1. To prevent the cache from overriding this manual change, it is also important to set the date field to NULL so that the email enricher does not refresh the cache for that domain.
Developers can safely truncate the cache table periodically if they want the cache to refresh its results sooner than the automatic 3-month refresh. They can either drop the table entirely, or delete only the records they do not want, keeping the seeded data and any crucial domains they have manually overridden.
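The refresh rule described above can be sketched as a simple check. This is an illustration under stated assumptions: "3 months" is approximated as 90 days, the function and parameter names are hypothetical, and a missing date stands for the NULL value used to protect a manual override.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the 3-month period is approximated as 90 days.
REFRESH_AFTER = timedelta(days=90)


def needs_refresh(record_date: Optional[datetime], now: datetime) -> bool:
    """Tell whether a cache record would be due for a new MX lookup."""
    if record_date is None:
        # NULL date: manually overridden record, never refreshed.
        return False
    return now - record_date >= REFRESH_AFTER


now = datetime(2018, 6, 14)
stale = needs_refresh(datetime(2018, 1, 1), now)
fresh = needs_refresh(datetime(2018, 6, 1), now)
overridden = needs_refresh(None, now)
```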
Last updated 2018-06-14 11:52:52 UTC