Export Semarchy xDM Integration Batch Logs

Problem

Client is experiencing problems with the throughput of an integration batch job. The certification process is taking too long. While investigating performance problems with the certification process, it can be useful to have an offline reviewable copy of all of the steps in the job with their details like duration and executed code.

Solution

This article describes how to capture log details for integration batches in order to identify the duration of the individual steps in the certification process. The query provided works for both Semarchy xDM (v4.x) and Semarchy Convergence for MDM 3.x. The query is attached as a text file at the end of the article.


Summary

The steps to query the log details for a specified batch are as follows:

  1. Identify an integration batch for investigation
  2. Edit the name of the Semarchy Repository schema in the provided script
  3. Run the script using Oracle SQL Developer (or pgAdmin for PostgreSQL)
  4. Save/Export the results (to Excel, for example)


Description of Steps

Step 1: Identify an integration batch for investigation

First, identify the batch to be analyzed. Go into the workbench and choose 'Data Locations'. In the Data Locations tree on the left, open the 'Latest Integration Batches' folder in the appropriate data location. For example, the BATCHID for the integration batch which ran on Thu Jan 25 is 1347


Here is a video to guide you on how to find the load IDs and batch IDs in your integration logs:


Step 2: Edit the name of the Semarchy Repository schema in the provided script

In the provided script, the semarchy repository schema is called SEMARCHY_REPO. Change all 11 occurrences of this to the schema name for your repository.


Step 3: Run the script using Oracle SQL Developer (or pgAdmin for PostgreSQL)

For Oracle, SQL Developer prompts for a value for BATCHID. Enter the value identified in Step 1, or enter NULL to return ALL logs (beware this can be a very large result set).

For PostgreSQL, bind variables are not supported by pgAdmin4, so you must edit the query before executing. Comment out line 48, replace '<batchid>  in line 49 with your specified batchid, and uncomment line 49.


Step 4: Save/Export the results (to Excel, for example)


Use this query to return logging. Make sure to export ALL the rows (estimated about 2000 rows as of April 2020) because SQL Developer may only present the first 200 rows based on your Preferences.

Integration Batch Logs SQL query for Oracle
-- Query to return logging details in Semarchy xDM (and Semarchy Convergence for MDM 3.x)
-- This query returns details of an integration batch for a specified Batch ID.
-- Note the single bind variable: :BATCHID
-- SQL*Developer prompts for a bound variable value when you execute this query.
-- pgAdmin4 and other tools do not support bind variables, so you must edit the query before executing:
--   Replace ":BATCHID" in line 41 with your specified batchid
 
with LOGS as (
    select
        BATCHID
        ,JI.NAME JOB_INSTANCE_NAME
        ,JL.START_DATE JOB_START_DATE
        ,JL.END_DATE JOB_END_DATE
        ,JL.ROW_COUNT JOB_ROW_COUNT
        ,JL.ERR_COUNT JOB_REJECTS_COUNT
        ,JL.STATUS JOB_STATUS /* RUNNING, DONE, WARNING, ERROR */
        ,JL.MESSAGE JOB_MESSAGE
        ,TL1.NAME TASK_NAME1
        ,TL2.NAME TASK_NAME2
        ,TL3.NAME TASK_NAME3
        ,TL4.NAME TASK_NAME4
        ,TL1.MESSAGE TASK_MESSAGE1
        ,coalesce(TL4.ROW_COUNT, TL3.ROW_COUNT, TL2.ROW_COUNT, TL1.ROW_COUNT) TASK_ROW_COUNT
        ,coalesce(TL4.START_DATE, TL3.START_DATE, TL2.START_DATE, TL1.START_DATE) TASK_START_DATE
        ,coalesce(TL4.END_DATE, TL3.END_DATE, TL2.END_DATE, TL1.END_DATE) TASK_END_DATE
        ,coalesce(TL4.STATUS, TL3.STATUS, TL2.STATUS, TL1.STATUS) TASK_STATUS
        ,coalesce(TL4.MESSAGE, TL3.MESSAGE, TL2.MESSAGE, TL1.MESSAGE) TASK_MESSAGE
        ,coalesce(TD4.SQL, TD3.SQL, TD2.SQL, TD1.SQL) TASK_SQL
    from
        SEMARCHY_REPO.MTA_INTEG_BATCH B
        inner join SEMARCHY_REPO.MTA_JOB_INSTANCE JI on ( B.R_JOBINSTANCE = JI.UUID)
        inner join SEMARCHY_REPO.MTA_JOB_LOG JL on ( JI.UUID = JL.R_JOBINSTANCE)
        inner join SEMARCHY_REPO.MTA_TASK_LOG TL1 on ( JL.UUID = TL1.O_JOBLOG and TL1.R_TASKLOG is null)
        left join SEMARCHY_REPO.MTA_TASK_LOG TL2 on ( TL1.UUID = TL2.R_TASKLOG )
        left join SEMARCHY_REPO.MTA_TASK_LOG TL3 on ( TL2.UUID = TL3.R_TASKLOG )
        left join SEMARCHY_REPO.MTA_TASK_LOG TL4 on ( TL3.UUID = TL4.R_TASKLOG )
        left join SEMARCHY_REPO.MTA_TASK_DEF TD1 on ( TD1.UUID = TL1.R_TASKDEF )
        left join SEMARCHY_REPO.MTA_TASK_DEF TD2 on ( TD2.UUID = TL2.R_TASKDEF )
        left join SEMARCHY_REPO.MTA_TASK_DEF TD3 on ( TD3.UUID = TL3.R_TASKDEF )
        left join SEMARCHY_REPO.MTA_TASK_DEF TD4 on ( TD4.UUID = TL4.R_TASKDEF )
    where B.BATCHID = :BATCHID /* SQLDeveloper will prompt for a batch ID. In other tools, replace :BATCHID with your batch ID before executing. */
)
select
    BATCHID, JOB_INSTANCE_NAME, JOB_START_DATE, JOB_END_DATE,
    JOB_ROW_COUNT, JOB_REJECTS_COUNT, JOB_STATUS, JOB_MESSAGE,
    TASK_NAME1, TASK_NAME2, TASK_NAME3, TASK_NAME4,
    TASK_ROW_COUNT, TASK_START_DATE, TASK_END_DATE,
    to_char(extract(hour from TASK_END_DATE - TASK_START_DATE), 'fm00') || ':' ||
    to_char(extract(minute from TASK_END_DATE - TASK_START_DATE), 'fm00') || ':' ||
    to_char(extract(second from TASK_END_DATE - TASK_START_DATE), 'fm00.000') TASK_DURATION,
    TASK_STATUS,
    TASK_MESSAGE,
    TASK_SQL
from LOGS
order by TASK_START_DATE ;



Use this query for PostgreSQL. Make sure to export ALL the rows (not the first 200, depending on your SQL Client Preferences)

Integration Batch Logs SQL query for PostgreSQL
-- Query to return logging details in Semarchy xDM (and Semarchy Convergence for MDM 3.x)
-- This query returns details of an integration batch for a specified Batch ID.
-- Note the commented placeholder: --BatchIDPlaceholder
-- You must edit the query with your specified batchid before executing
-- This query works only with PostgreSQL. Oracle users should use the Oracle version of this query.

with recursive LOGS as (
  select 
    1::int as LVL, 
    TD.NAME as TASK_GROUP1, 
    NULL as TASK_GROUP2, 
    NULL as TASK_GROUP3,
    B.BATCHID, 
    JL.START_DATE as JOB_START_DATE, 
    JL.END_DATE as JOB_END_DATE, 
    JL.ROW_COUNT as JOB_ROW_COUNT, 
    JL.ERR_COUNT as JOB_REJECTS_COUNT, 
    JL.STATUS as JOB_STATUS, 
    JL.MESSAGE as JOB_MESSAGE,
    JD.NAME as JOB_NAME, 
    TD.NAME as TASK_NAME, 
    TD.CLASSNAME, 
    TD.SQL as TASK_SQL,
    TL.START_DATE as TASK_START_DATE, 
    TL.END_DATE as TASK_END_DATE, 
    TL.ROW_COUNT as TASK_ROW_COUNT, 
    TL.UUID, 
    TL.R_TASKLOG, 
    TL.ORD_POS, 
    TL.MESSAGE as TASK_MESSAGE, 
    TL.STATUS as TASK_STATUS
  from         MTA_TASK_LOG     TL
    inner join MTA_TASK_DEF     TD on ( TD.UUID = TL.R_TASKDEF )
    inner join MTA_JOB_DEF      JD on ( JD.UUID = TD.O_JOBDEF)
    inner join MTA_JOB_LOG      JL on ( JL.UUID = TL.O_JOBLOG )
    inner join MTA_JOB_INSTANCE JI on ( JI.UUID = JL.R_JOBINSTANCE )
    inner join MTA_INTEG_BATCH   B on ( B.R_JOBINSTANCE  = JI.UUID )  
  where B.BATCHID = --BatchIDPlaceholder
    and R_TASKLOG is null /* equivalent to "start with" part in Oracle connect_by query */
union all
  select 
    (parent_task.LVL + 1), 
    parent_task.TASK_GROUP1, 
    case when parent_task.LVL = 1 then TD.NAME when parent_task.LVL > 1 then parent_task.TASK_GROUP2 else NULL end::text, 
    case when parent_task.LVL = 2 then TD.NAME when parent_task.LVL > 2 then parent_task.TASK_GROUP3 else NULL end::text,
    B.BATCHID, 
    JL.START_DATE as JOB_START_DATE, JL.END_DATE as JOB_END_DATE, JL.ROW_COUNT as JOB_ROW_COUNT, JL.ERR_COUNT as JOB_REJECTS_COUNT, JL.STATUS as JOB_STATUS, JL.MESSAGE as JOB_MESSAGE,
    JD.NAME as JOB_NAME, 
    TD.NAME as TASK_NAME, TD.CLASSNAME, TD.SQL as TASK_SQL,
    TL.START_DATE, TL.END_DATE, TL.ROW_COUNT, TL.UUID, TL.R_TASKLOG, TL.ORD_POS, TL.MESSAGE, TL.STATUS
  from         MTA_TASK_LOG     TL
    inner join MTA_TASK_DEF     TD on ( TD.UUID = TL.R_TASKDEF )
    inner join MTA_JOB_DEF      JD on ( JD.UUID = TD.O_JOBDEF)
    inner join MTA_JOB_LOG      JL on ( JL.UUID = TL.O_JOBLOG )
    inner join MTA_JOB_INSTANCE JI on ( JI.UUID = JL.R_JOBINSTANCE )
    inner join MTA_INTEG_BATCH   B on ( B.R_JOBINSTANCE  = JI.UUID )  
    inner join LOGS parent_task on ( parent_task.UUID = TL.R_TASKLOG ) /* equivalent to "connect by" part in Oracle connect_by query */
)
select 
  BATCHID, JOB_NAME, JOB_START_DATE, JOB_END_DATE, JOB_ROW_COUNT, JOB_REJECTS_COUNT, JOB_STATUS, JOB_MESSAGE,
  LVL,
  TASK_GROUP1, TASK_GROUP2, TASK_GROUP3, 
  TASK_NAME, TASK_SQL,
  EXTRACT( EPOCH FROM ( TASK_END_DATE - TASK_START_DATE ) ) DUR_SEC,
  to_char(extract(hour from TASK_END_DATE - TASK_START_DATE), 'fm00') || ':' || 
  to_char(extract(minute from TASK_END_DATE - TASK_START_DATE), 'fm00') || ':' || 
  to_char(extract(second from TASK_END_DATE - TASK_START_DATE), 'fm00.000') DURATION,
  TASK_ROW_COUNT,
  TASK_STATUS,
  TASK_START_DATE,
  TASK_END_DATE
from LOGS
where CLASSNAME not like 'Group%' /* Task Groups are not interesting here. You can remove this line to see all logs in the xDM GUI (even ones with no executed SQL). */
order by TASK_START_DATE
;



Related articles