This document gives the requirements and best practices for changing the technology of your backend database from Oracle to PostgreSQL.

Requirements & Warning

Information Not Migrated

This migration method does not preserve:

The entire repository contents, and more specifically the model history. With this method, the model editions other than the exported one are lost.
Load, batches, job, workflow and stepper currently running instances and completed instances. Make sure not to have running loads, batches, jobs, workflows and steppers for the data location.

A full migration of the entire repository and data location data across database backends is not supported.

Phased Upgrade and Migration

Do not plan a backend migration along with an upgrade of Semarchy xDM. Upgrade first your Semarchy xDM infrastructure to the latest xDM release, make sure that xDM as well as your models are fully functional with this release and then perform a backend database migration.

xDM Versions

Cross version migration is not supported. You cannot export a model from an xDM version and import it into another version.

It is not recommended to mix database backend technologies. The database technology used for the repository and the data locations should be the same.

Model Migration Process

To migrate from Oracle to PostgreSQL:

Create a new repository in PostgreSQL.
Manually re-create the platform artifacts: roles, plugins, notification servers, variable value providers, image library, etc in the PostgreSQL repository.
Export and import the latest model edition from the Oracle repository to the new PostgreSQL repository as a new model.
Change the model's Target Database to PostgreSQL.
Fix the model
1. Review User-Defined Functions
  1. Oracle uses the notion of package which does not exist in PostgreSQL. Remove the Package for all functions.
2. Use model validation to raise immediate syntactic issues (different functions or syntax elements)
  1. Fix syntactic issues raised by the validation.
3. Review all SemQL expressions:
  1. Built-in functions behaviour may differ (e.g: SEM_EDIT_DISTANCE).
  2. Implicit type conversion: Oracle does implicit type conversion, PostgreSQL does not.
4. Review possible SQL expressions:
  1. SQL Hooks defined into Jobs.
  2. Model variables using a database value providers
  3. Lookup Plugin select statement
Deploy this model in a new data location
Transfer the data from the Oracle schema to the PostgreSQL database using you ETL or middleware.
1. Load all data from the Oracle data location tables (SD, MD, .... and DL_ tables). Make sure to exclude the DL_DATA_LOC table from this process.
  For this phase, pay attention to the Datatype Mapping differences.
2. Set the SEQ_* SSQ_* sequences in the PostgreSQL data location to the current Oracle values.
3. Set the SEQ_BATCHID and SEQ_LOADID in the PostgreSQL repository to the values from the Oracle repository.
4. Perform adjustments
  1. PL/SQL Functions used by the model should be converted to PL/pgSQL into the equivalent PostgreSQL location (PostgreSQL data location for example).
Rewrite or review ETL jobs to point to the PostgreSQL hub.
Test the entire data hub:
1. Inbound integration using the SQL and REST API, as well as data certification process results.
2. Application behavior when browsing, authoring data and managing duplicates.
3. Outbound Integration using SQL or REST API.

Noticeable Database Differences

This section lists differences in the database that you should review for the migration process.

Datatype Mapping

The following table lists the datatype mapping and differences between Oracle and PostgreSQL.
Datatypes not listed below have the same behaviour in Oracle and PostgreSQL.

Semarchy Datatype	Oracle Datatype	PostgreSQL Datatype	Comments
Boolean	CHAR(1)	BOOLEAN
String	VARCHAR2(n CHAR)	VARCHAR	See the Null Handling note below.
UUID	RAW(16)	UUID
Binary	BLOB	BYTEA
LongText	CLOB	TEXT
Date	DATE	DATE	Date in Oracle stores a time component which is automatically removed by the integration job.

Null Handling

For Oracle, an empty string is equal to a null. PostgreSQL makes the difference between a null string and an empty string.
The integration job takes care of empty strings and convert them to null values. However, null or empty strings may be created by other means (steppers, PL/SQL triggers, etc).
SemQL expressions, PL/SQL functions or SQL Hooks handling strings assuming that empty string is implicitly null should handle this explicitly.

Examples of NULL handling

Oracle

Postgres

SemQL (Oracle, PostgreSQL)

SELECT 'Something plus' || NULL FROM dual;

Something plus

SELECT 'Something plus' || NULL;

[NULL]

select concat('Something plus', null);

Something plus

CONCAT( 'Something plus', null )

Something plus

Reference Information

The following link provides reference Information to convert Oracle to PostgreSQL.

Oracle to PostgreSQL Conversion

Customer feedback

A French customer decided to move from Oracle to Postgres in their AWS environment.

Migrating the model from Oracle to Postgres was easy, they only had a couple of validation errors and few PL/SQL functions.

The challenge came from the data to transfer from PROD Oracle RDS to new PROD Postgre RDS.

The whole data location represents 65 million rows (counting all tables).

Our official answer on data transfer is to use an ETL and write mappings for all tables but in this case we used DI/Stambia and a template written by Hilaire Godinot (Deactivated) that generates the insert / select from an oracle schema to a postgres schema, applying correct semarchy type casts (for booleans, UUIDs...) and also update sequences.

After tuning and optimizing the template and the servers, the actual transfer time was 40 minutes (all tables, sequences and indexes rebuild).

This is the first metric we have for an oracle to postgres migration, on a real customer project with a decent amount of data.

Technical details below

Instances

ec2 t3.2xlarge (8CPU/32GRAM)

postgres RDS r5.2xlarge (8CPU/64GRAM)

oracle RDS db.m5.2xlarge (8CPU/32GRAM)

The DI runtime was installed on the ec2 instance where Semarchy runs, in the same region as both RDS source and target instances with only 4G of RAM.

Last RUN parameters : batch/fetch size 30k + disable all indexes except pk + 7 parallel threads
Data transfer = 36 minutes
Deploy model to recreate and rebuild indexes = 3mins
Total time 39mins

Production migration scenario

Make sure all servers use the same semarchy version
Stop Semarchy (tomcat) on postgres
Make sure postgre_mdm schema contains table structures and rows only in DL_ tables (drop / recreate schema and deploy model if needed)
Manually drop indexes before initial load (prepared script)
Stop continuous load / execution engine on mdm_oracle
Run DI template to load data from oracle to postgres (data + sequences dataloc)
Stop Semarchy on oracle
Align repository sequences from oracle to postgres
Start Semarchy on postgres and deploy model to recreate and rebuild indexes
Restore delta from oracle to postgres (SD to SD with prepared ETL script, group by load)
Stop RDS / ec2 instances for oracle and reuse elastic ip / name for postgres instances or edit DNS

Semarchy Knowledge Base

Migrating Your Backend Database from Oracle to PostgreSQL

Requirements & Warning

Model Migration Process

Noticeable Database Differences

Datatype Mapping

Null Handling

Examples of NULL handling

Reference Information

Customer feedback

Technical details below

Related content