Refresh an MDM Development environment with Production Data

When a user wants to refresh his development environment with production data, the best practice is to dump the data location schema from production (PROD) and load it in the development (DEV) environment.


Warning Repository

This Refresh from Production to Development guide is NOT for the repository. Again, it is NOT for the repository. Do not copy your repository from Production to Development! 


Step-by-step guide

  1. Make sure that the model editions in DEV and PROD are compatible.
    (Model versions will never match, since dev is always ahead of prod. The requirements are that the physical tables are compatible (same columns, same data types, etc.) 
  2. Stop Tomcat. Do not submit any new loads. Stopping Tomcat is critical - not just a nice-to-have precaution. Keeping Tomcat running and dropping the DEV data location will cause a jdbc datasource not bound in context type of error and restarting Tomcat was not able to easily fix it. If this happens - you may have to delete and recreate the data location in the Workbench. So be warned!
  3. Backup the DEV data location schema.
  4. Make a note of the data location's UUID.

    Save the existing UUID
    select UUID from DEV_MDM.DL_DATA_LOC;
    /* Sample result: 09DE459BA8934FAE92A217B60A263EDD 
     * You will need this value in step #8
     */
  5. Backup the PROD data location schema to a file you will use in the next step.
    Consult your DBA for specifics to your implementation.
    In Oracle: this is likely to be done using Oracle's data pump, but there are other mechanisms for backup which could be equally good.
    In PostgreSQL: this is typically done with pg_dump.
    In Microsoft SQL Server: this is typically done using BACKUP DATABASE.
    Consult your DBA for specifics to your implementation.

  6. Drop and re-create DEV schema

    Drop and re-create DEV data location schema
    /* This is an Oracle example. Your exact commands will be different. But this sample highlights the key ideas. */
    DROP USER DEV_MDM CASCADE;
    CREATE USER DEV_MDM IDENTIFIED BY DEV_MDM DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
    GRANT CONNECT, RESOURCE TO DEV_MDM;
  7. Load the PROD schema backup into the DEV database schema.
    Note: Depending on the details of your environment, you may need to re-grant permissions on the objects in this schema. 

  8. Update DL_DATA_LOC to use the previous UUID value.

    Restore the UUID
    update DEV_MDM.DL_DATA_LOC set UUID = '09DE459BA8934FAE92A217B60A263EDD';
    /* Use the value you saved from step 4 for the UUID */
  9. Make sure that the loadId and batchid sequences in the repository are greater than the max loadid and batchid in DL_BATCH

    Confirm that the loadid and batchid sequences are large enough
    select
      DEV_REPO.SEQ_LOADID.nextval + 1 as next_dev_loadid,
      PROD_REPO.SEQ_LOADID.nextval + 1 as next_prod_loadid,
      DEV_REPO.SEQ_BATCHID.nextval + 1 as next_dev_batchid,
      PROD_REPO.SEQ_BATCHID.nextval + 1 as next_prod_batchid,
      CASE 
        WHEN (
              DEV_REPO.SEQ_LOADID.nextval  >= PROD_REPO.SEQ_LOADID.nextval
          AND DEV_REPO.SEQ_BATCHID.nextval >= PROD_REPO.SEQ_BATCHID.nextval
        ) THEN 'IDs are good. No action needed.' 
        ELSE 'You must manually increase the nextval of SEQ_LOADID or SEQ_BATCHID' 
      END as message
    from dual;
    /* In cases where your DEV and PROD environments are physically separated, you must modify this query to gather the DEV and PROD values separately. */


    If the sequence current value is less than the max in DL_BATCH then update it to the max. This will ensure that you don't start a load or a batch using an id that has already been reserved.

  10. Restart Tomcat. You may resume submitting data loads.

Conclusion

The DEV data location now contains your production data. It's ready for testing new matching rules or for other tests.


Additional Notes

Important Note

We only transferred the information stored in the data location schema, so nothing related to the repository schema was transferred. In your refreshed DEV environment you have not transferred the following from PROD (because these things are stored in the REPO):

  • Logs showing data loads 
  • Logs showing integration batches
  • Workflow statuses
Update Sequences
/* Option 1. Benefit of this version: simpler.  */
DROP SEQUENCE SEQ_LOADID;
CREATE SEQUENCE SEQ_LOADID START WITH <old_sequence_val noted in step 9>;
DROP SEQUENCE SEQ_BATCHID;
CREATE SEQUENCE SEQ_BATCHID START WITH <old_sequence_val noted in step 9>;
/* Do not forget to re-grant permissions as needed. */

 
/* Option 2. Benefit of this alternate version: the sequence does not get dropped, so its permissions remain unchanged. */ 
ALTER SEQUENCE REFRESH_DEV_REPOSITORY.SEQ_LOADID INCREMENT BY <old_sequence_val noted in step 9, by difference needed for DEV to exceed PROD's LoadID>;
select REFRESH_DEV_REPOSITORY.SEQ_LOADID.nextval from dual;
ALTER SEQUENCE REFRESH_DEV_REPOSITORY.SEQ_LOADID INCREMENT BY 1;
select REFRESH_DEV_REPOSITORY.SEQ_LOADID.nextval from dual;

ALTER SEQUENCE REFRESH_DEV_REPOSITORY.SEQ_BATCHID INCREMENT BY <old_sequence_val noted in step 9, by difference needed for DEV to exceed PROD's BatchID>;
select REFRESH_DEV_REPOSITORY.SEQ_BATCHID.nextval from dual;
ALTER SEQUENCE REFRESH_DEV_REPOSITORY.SEQ_BATCHID INCREMENT BY 1;
select REFRESH_DEV_REPOSITORY.SEQ_BATCHID.nextval from dual;
/* No need to re-grant permissions because the sequences were not dropped. */
 

The DEV data location is still a DEV data location (e.g. models can be updated in place), and the PROD data location is still a PROD data location (e.g. only closed models can be deployed) because this information is stored in the repository.

Confirm the data location type
select NAME, LABEL, LOCATION_TYPE from DEV_REPO.MTA_DATA_LOCATION;
select NAME, LABEL, LOCATION_TYPE from PROD_REPO.MTA_DATA_LOCATION;



Sample database commands
/* imp is old. impdp is more recent and better. But this sample may still be enlightening. */
exp REFRESH_DEV_DEMO_MDM/REFRESH_DEV_DEMO_MDM@//semdbinstance.cdm38ics3lso.us-west-1.rds.amazonaws.com:1521/semdb OWNER=REFRESH_DEV_DEMO_MDM file=d:\temp\expREFRESH_DEV_DEMO_MDM.dmp
exp REFRESH_PROD_MDM/REFRESH_PROD_MDM@//semdbinstance.cdm38ics3lso.us-west-1.rds.amazonaws.com:1521/semdb OWNER=REFRESH_PROD_MDM file=d:\temp\expREFRESH_PROD_MDM.dmp

imp REFRESH_DEV_DEMO_MDM/REFRESH_DEV_DEMO_MDM@//semdbinstance.cdm38ics3lso.us-west-1.rds.amazonaws.com:1521/semdb FROMUSER=REFRESH_PROD_MDM TOUSER=REFRESH_DEV_DEMO_MDM file=d:\temp\expREFRESH_PROD_MDM.dmp

/* Using imp, you may encounter the following error */
IMP-00017: following statement failed with ORACLE error 2304:
 "CREATE TYPE "SEM_SPLIT_TBL" TIMESTAMP '2015-05-29:21:56:29' OID '17408C6561"
 "9310D9E050007F0100410B'   as table of varchar2(32767 BYTE);"
IMP-00003: ORACLE error 2304 encountered
ORA-02304: invalid object identifier literal

/* The solution to error 2304 in this case is to manually re-create the TYPE. */
create or replace type SEM_SPLIT_TBL as table of varchar2(32767 BYTE);
 
/* When using impdp, the problem can be prevented using the following option. */
TRANSFORM=oid:n