Migrating Your Backend Database from Oracle to PostgreSQL
This document describes the requirements and best practices for migrating your backend database technology from Oracle to PostgreSQL.
Requirements & Warning
Information Not Migrated
This migration method does not preserve:
- The entire repository contents, and more specifically the model history: with this method, the model editions other than the exported one are lost.
- Running and completed instances of loads, batches, jobs, workflows, and steppers. Make sure no loads, batches, jobs, workflows, or steppers are running for the data location.
A full migration of the entire repository and data location data across database backends is not supported.
Phased Upgrade and Migration
Do not plan a backend migration along with an upgrade of Semarchy xDM. First upgrade your Semarchy xDM infrastructure to the latest xDM release, make sure that xDM as well as your models are fully functional with this release, and then perform the backend database migration.
xDM Versions
Cross-version migration is not supported: you cannot export a model from one xDM version and import it into another version.
Mixing database backend technologies is not recommended: the repository and the data locations should use the same database technology.
Model Migration Process
To migrate from Oracle to PostgreSQL:
- Create a new repository in PostgreSQL.
- Manually re-create the platform artifacts (roles, plugins, notification servers, variable value providers, image library, etc.) in the PostgreSQL repository.
- Export and import the latest model edition from the Oracle repository to the new PostgreSQL repository as a new model.
- Change the model's Target Database to PostgreSQL.
- Fix the model:
- Review user-defined functions: Oracle uses the notion of packages, which does not exist in PostgreSQL. Remove the package from all function definitions.
- Use model validation to surface immediate syntactic issues (functions or syntax elements that differ between the two databases).
- Fix the syntactic issues raised by the validation.
- Review all SemQL expressions:
- Built-in function behaviour may differ (e.g., SEM_EDIT_DISTANCE).
- Implicit type conversion: Oracle performs implicit type conversions; PostgreSQL does not.
- Review all SQL expressions:
- SQL hooks defined in jobs.
- Model variables using database value providers.
- Lookup plugin SELECT statements.
- Deploy this model in a new data location
- Transfer the data from the Oracle schema to the PostgreSQL database using your ETL or middleware.
- Load all data from the Oracle data location tables (SD, MD, ..., and DL_ tables). Make sure to exclude the DL_DATA_LOC table from this process. For this phase, pay attention to the Datatype Mapping differences.
- Set the SEQ_* and SSQ_* sequences in the PostgreSQL data location to the current Oracle values.
- Set the SEQ_BATCHID and SEQ_LOADID in the PostgreSQL repository to the values from the Oracle repository.
- Perform adjustments
- Convert the PL/SQL functions used by the model to PL/pgSQL, and create them in the equivalent PostgreSQL location (for example, the PostgreSQL data location schema).
- Rewrite or review ETL jobs to point to the PostgreSQL hub.
- Test the entire data hub:
- Inbound integration using the SQL and REST APIs, as well as data certification process results.
- Application behavior when browsing, authoring data, and managing duplicates.
- Outbound integration using the SQL or REST APIs.
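The sequence-alignment steps above can be sketched in SQL. This is a minimal sketch, not a definitive script: SEQ_GOLDEN_ID and the numeric values are hypothetical, and the actual sequence names come from your data location and repository schemas.

```sql
-- On Oracle: read the current position of a data location sequence
-- (SEQ_GOLDEN_ID is a hypothetical sequence name).
SELECT last_number FROM user_sequences WHERE sequence_name = 'SEQ_GOLDEN_ID';

-- On PostgreSQL: align the matching sequence to the Oracle value,
-- e.g. if the Oracle sequence last returned 123456:
SELECT setval('seq_golden_id', 123456);

-- Repeat for the repository sequences (values are illustrative):
SELECT setval('seq_batchid', 98765);
SELECT setval('seq_loadid', 43210);
```

Repeat the `setval` call for every SEQ_* and SSQ_* sequence in the data location, so that new identifiers generated in PostgreSQL do not collide with the rows transferred from Oracle.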
Noticeable Database Differences
This section lists differences in the database that you should review for the migration process.
Datatype Mapping
The following table lists the datatype mapping and differences between Oracle and PostgreSQL.
Datatypes not listed below have the same behaviour in Oracle and PostgreSQL.
Semarchy Datatype | Oracle Datatype | PostgreSQL Datatype | Comments |
---|---|---|---|
Boolean | CHAR(1) | BOOLEAN | |
String | VARCHAR2(n CHAR) | VARCHAR | See the Null Handling note below. |
UUID | RAW(16) | UUID | |
Binary | BLOB | BYTEA | |
LongText | CLOB | TEXT | |
Date | DATE | DATE | Date in Oracle stores a time component which is automatically removed by the integration job. |
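When transferring rows, the mapping above typically translates into explicit casts in the ETL's INSERT ... SELECT statements. The sketch below assumes hypothetical table and column names, and a CHAR(1) boolean encoded as '1'/'0' (check the encoding actually used in your Oracle schema); the source rows are assumed to be exposed to PostgreSQL by the ETL or a foreign data wrapper.

```sql
-- Hypothetical target table in the PostgreSQL data location schema;
-- adapt names, columns, and the boolean encoding to your model.
INSERT INTO postgres_mdm.md_customer (b_isactive, f_uid, f_photo)
SELECT
  CASE src.b_isactive WHEN '1' THEN TRUE WHEN '0' THEN FALSE END, -- CHAR(1) -> BOOLEAN
  CAST(src.f_uid AS uuid),                                       -- RAW(16) hex -> UUID
  src.f_photo                                                    -- BLOB -> BYTEA (handled by the ETL driver)
FROM oracle_staging.md_customer AS src;
```

The Date difference in the table above usually needs no cast, but remember that Oracle DATE values carry a time component that PostgreSQL DATE columns will silently truncate.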
Null Handling
In Oracle, an empty string is equal to null. PostgreSQL distinguishes between a null string and an empty string.
The integration job takes care of empty strings and converts them to null values. However, null or empty strings may be created by other means (steppers, PL/SQL triggers, etc.).
SemQL expressions, PL/SQL functions, or SQL hooks that handle strings assuming an empty string is implicitly null should handle this case explicitly.
Examples of NULL handling
Oracle | PostgreSQL (`||` operator) | PostgreSQL (`concat` function) | SemQL (Oracle, PostgreSQL) |
---|---|---|---|
SELECT 'Something plus' || NULL FROM dual; → Something plus | SELECT 'Something plus' || NULL; → [NULL] | SELECT concat('Something plus', null); → Something plus | CONCAT('Something plus', null) → Something plus |
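When porting expressions that rely on Oracle's empty-string-is-null behaviour, an explicit NULLIF normalizes the two cases in PostgreSQL. A sketch, assuming a hypothetical SD_CUSTOMER table with a MIDDLE_NAME column:

```sql
-- In Oracle, '' is stored as NULL, so this test finds both cases:
--   WHERE middle_name IS NULL
-- In PostgreSQL, normalize empty strings explicitly:
SELECT customer_id
FROM sd_customer
WHERE NULLIF(middle_name, '') IS NULL;  -- matches both NULL and ''
```

The same pattern applies when writing: wrapping an incoming value in `NULLIF(value, '')` prevents empty strings created by steppers or triggers from diverging from the nulls produced by the integration job.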
Reference Information
The following link provides reference information for converting Oracle to PostgreSQL.
Customer feedback
A French customer decided to move from Oracle to Postgres in their AWS environment.
Migrating the model from Oracle to Postgres was easy: they only had a couple of validation errors and a few PL/SQL functions to convert.
The challenge came from the data to transfer from the PROD Oracle RDS to the new PROD Postgres RDS.
The whole data location represents 65 million rows (counting all tables).
Our official answer on data transfer is to use an ETL and write mappings for all tables, but in this case we used DI/Stambia and a template written by Hilaire Godinot that generates the INSERT/SELECT statements from an Oracle schema to a Postgres schema, applies the correct Semarchy type casts (for booleans, UUIDs, etc.), and also updates the sequences.
After tuning and optimizing the template and the servers, the actual transfer time was 40 minutes (all tables, sequences, and index rebuilds).
This is the first metric we have for an Oracle to Postgres migration on a real customer project with a decent amount of data.
Technical details below
Instances
EC2 t3.2xlarge (8 CPU / 32 GB RAM)
Postgres RDS r5.2xlarge (8 CPU / 64 GB RAM)
Oracle RDS db.m5.2xlarge (8 CPU / 32 GB RAM)
The DI runtime was installed on the EC2 instance where Semarchy runs, in the same region as both the source and target RDS instances, with only 4 GB of RAM.
Last run parameters: batch/fetch size 30k + all indexes disabled except primary keys + 7 parallel threads
Data transfer = 36 minutes
Deploy model to recreate and rebuild indexes = 3 minutes
Total time: 39 minutes
Production migration scenario
- Make sure all servers use the same Semarchy version.
- Stop Semarchy (Tomcat) on the Postgres server.
- Make sure the postgre_mdm schema contains the table structures, with rows only in the DL_ tables (drop/recreate the schema and deploy the model if needed).
- Manually drop the indexes before the initial load (prepared script).
- Stop the continuous loads / execution engine on mdm_oracle.
- Run the DI template to load data from Oracle to Postgres (data + data location sequences).
- Stop Semarchy on Oracle.
- Align the repository sequences from Oracle to Postgres.
- Start Semarchy on Postgres and deploy the model to recreate and rebuild the indexes.
- Restore the delta from Oracle to Postgres (SD to SD with a prepared ETL script, grouped by load).
- Stop the RDS / EC2 instances for Oracle and reuse the Elastic IP / name for the Postgres instances, or edit the DNS.
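The "manually drop indexes" step can be prepared from the PostgreSQL catalog. The sketch below assumes the postgre_mdm schema name used above; review the generated statements before running them, since your prepared script may need to keep additional constraints.

```sql
-- Save the index definitions first, so they can be recreated later
-- (or simply redeploy the model, as done in the scenario above):
SELECT indexdef FROM pg_indexes WHERE schemaname = 'postgre_mdm';

-- Generate DROP statements for the non-primary-key indexes:
SELECT format('DROP INDEX %I.%I;', schemaname, indexname)
FROM pg_indexes
WHERE schemaname = 'postgre_mdm'
  AND indexname NOT IN (
    SELECT conname FROM pg_constraint WHERE contype = 'p'
  );
```

Dropping the secondary indexes before the bulk load and rebuilding them afterwards is what kept the transfer above in the 36-minute range; loading with indexes enabled is typically much slower.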