Slow performance in data steward application
How to troubleshoot when end users complain that the data steward application is very slow: for example, taking more than 10 seconds to open a Business View (BV) or more than 1 minute to check out records in a duplicate management workflow.
Slow BVs / slow navigation through the data
Step 1: Collect information
- Turn on logging for the data steward application.
- Gather benchmark metrics to pinpoint what is slow. Time how long it takes to perform the task the customer reports as sluggish.
- For example, pick a BV and open only the golden data view. Check the error log to see how long that task took, and retrieve the SQL that ran.
- Run the SQL in SQL Developer. Does it take the same amount of time to execute as it does in Semarchy?
- Get an explain plan.
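A minimal sketch of the timing step, assuming Python's built-in `sqlite3` module as a stand-in for your real database driver and SQL client. The table `golden_customer` and the query are hypothetical placeholders for the SQL you retrieved from the log. The helper times one execution including the fetch, so the number can be compared with the time the application logged.

```python
import sqlite3
import time


def time_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> tuple[float, int]:
    """Run sql once and return (elapsed seconds, number of rows fetched).

    The fetch is included in the timing: drivers can return a cursor before
    all rows are produced, so timing execute() alone may under-report.
    """
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return elapsed, len(rows)


# Hypothetical stand-in data; in practice, connect to the real data location.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golden_customer (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO golden_customer (last_name) VALUES (?)",
    [(f"name{i % 997}",) for i in range(50_000)],
)

# Paste the SQL retrieved from the log here.
elapsed, n = time_query(
    conn, "SELECT id, last_name FROM golden_customer ORDER BY last_name LIMIT 50"
)
print(f"{n} rows in {elapsed:.3f} s")
```

If the database alone is fast but the application is slow, the bottleneck is likely outside the query; if both are slow, the explain plan is the next place to look.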
Top culprit for slow performance in a Collection: Sorting
Sorting is a prime suspect when performance is bad. An ORDER BY is an expensive operation that can slow the application down significantly, especially when no index supports it. An index on the sorted columns can have an enormous beneficial impact.
Oracle example: look at the cost of the "Sort order by stopkey" step. Is it noticeably higher than the cost of the other operations (such as joins) in the explain plan?
PostgreSQL and Microsoft SQL Server display explain plans differently, but the same concept applies.
To test, remove the ORDER BY clause and run the query again. If execution now takes much less than 10 seconds, you are probably experiencing a sorting issue.
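The same check can be illustrated with SQLite, used here only as a self-contained stand-in: its EXPLAIN QUERY PLAN output contains "USE TEMP B-TREE FOR ORDER BY" when rows must be sorted explicitly, roughly the analog of Oracle's sort step. The table and column names are hypothetical and this is not the SQL the application generates; the sketch only shows how removing the ORDER BY removes the sort step from the plan.

```python
import sqlite3


def needs_sort(conn: sqlite3.Connection, sql: str) -> bool:
    """True if SQLite's plan for sql contains an explicit sort (temp B-tree)."""
    # Each EXPLAIN QUERY PLAN row has the human-readable detail in column 3.
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("TEMP B-TREE" in d for d in details)


conn = sqlite3.connect(":memory:")
# Hypothetical golden-data table with no index on last_name.
conn.execute(
    "CREATE TABLE golden_customer (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)"
)

with_sort = "SELECT id, last_name, city FROM golden_customer ORDER BY last_name LIMIT 50"
without_sort = "SELECT id, last_name, city FROM golden_customer LIMIT 50"

print(needs_sort(conn, with_sort))     # True: explicit sort step planned
print(needs_sort(conn, without_sort))  # False: no sort step
```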
Top culprit for slow performance in a Form: Embedded Collections
Embedded collections require many queries.
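To see how embedded collections add up, here is a hypothetical model using Python's `sqlite3`: loading one form issues one query for the main record plus one query per embedded collection, and `Connection.set_trace_callback` counts the statements. The schema, the `load_form` helper, and this loading pattern are assumptions for illustration; the queries the application actually issues will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a customer form with two embedded collections.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, customer_id INTEGER, street TEXT);
    CREATE TABLE contact  (id INTEGER PRIMARY KEY, customer_id INTEGER, email TEXT);
    INSERT INTO customer VALUES (1, 'Acme');
""")

# Table names come from this fixed list, so the f-string below is safe here.
EMBEDDED_COLLECTIONS = ["address", "contact"]

statements: list[str] = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed


def load_form(customer_id: int) -> dict:
    """Hypothetical form load: the main record, then each embedded collection."""
    form = {
        "customer": conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
    }
    for table in EMBEDDED_COLLECTIONS:
        # One extra query per embedded collection.
        form[table] = conn.execute(
            f"SELECT * FROM {table} WHERE customer_id = ?", (customer_id,)
        ).fetchall()
    return form


form = load_form(1)
print(len(statements))  # one query for the record plus one per embedded collection
```

Each additional embedded collection adds at least one more round trip, which is why forms with many of them open slowly.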
Step 2: Troubleshooting in Semarchy
- Open the BV in the workbench. Under "Collection Configuration", look at Customized Sort. Are any attributes sorted? If so, remove them all. The BV then falls back on the default sort by ID, which is normally faster because the ID is indexed.
Alternative: if your users have an important business need for the Customized Sort, keep the sort and manually add an index in the database schema to give it good performance.
- Open the BV in the workbench. Under "Collection Configuration", look at "User-Defined Sort". Is user-defined sorting allowed? If so, users can set sorts that may perform poorly.
Option 1: disallow user-defined sorts.
Option 2: educate users that default sorting is fast but user-defined sorting may be slow, so they should use it only on filtered data.
- If performance is still bad, or a Null Pointer Exception (NPE) occurs when opening BVs, sort preferences might have been corrupted. Go to User Profile → Settings and use the 'clear' option to remove "Selected columns, sort order and type of view used". Getting rid of these saved preferences usually solves the problem.
- (Legacy 3.x reference) Under Preferences → Data Stewardship, uncheck Predictable Pagination.
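Continuing with SQLite as a stand-in (hypothetical table and index names; in a real data location the index would be created with Oracle, PostgreSQL, or SQL Server syntax), the sketch below compares three plans: a custom sort on an unindexed column, the default sort on the ID, and the same custom sort after an index is added. Only the first one needs an explicit sort step.

```python
import sqlite3


def needs_sort(conn: sqlite3.Connection, sql: str) -> bool:
    """True if SQLite's plan for sql contains an explicit sort (temp B-tree)."""
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("TEMP B-TREE" in d for d in details)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golden_customer (id INTEGER PRIMARY KEY, last_name TEXT)")

custom_sort = "SELECT id, last_name FROM golden_customer ORDER BY last_name LIMIT 50"
default_sort = "SELECT id, last_name FROM golden_customer ORDER BY id LIMIT 50"

before = needs_sort(conn, custom_sort)  # unindexed custom sort: explicit sort step
by_id = needs_sort(conn, default_sort)  # ID is the primary key: rows read in order

# Keep the custom sort but back it with an index (hypothetical index name).
conn.execute(
    "CREATE INDEX idx_golden_customer_last_name ON golden_customer (last_name)"
)
after = needs_sort(conn, custom_sort)   # index supplies the order: no sort step

print(before, by_id, after)  # True False False
```

This mirrors the two options above: drop the Customized Sort and rely on the ID, or keep it and add a supporting index.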
Dupes Mgmt Workflow takes a long time
Step 1: Troubleshooting in Semarchy
- Check that there is no Customized Sort.
- Best practice: do not check out more than 1K records in the basket for duplicate management. If a data steward is checking out more than 1K records only to confirm them blindly, tweak the match rules to auto-confirm those records instead.
- Remove records from the basket rather than just canceling out of workflows: select all and hit Delete. If you only cancel out of workflows without deleting the records, large amounts of data accumulate in the UG and UM tables, which can hurt performance.
- (Legacy 3.x reference) Under display settings, uncheck Autofit columns, Colorize consolidated Master values, Colors Golden IDs.
Stepper or workflow takes a long time to finish processing the integration job
You may have Calculate statistics turned on. Check out this Confluence article for details.