Problem
What should Semarchy customers do to ensure that xDM is always available? This is particularly important for customers who have a business requirement to have near real-time processing of data.
This is a two-fold problem.
1. High availability for end-users
How can we ensure that the data is always available for ETL and web services to consume as well as available for end users to browse in the data steward application?
2. High availability for processing batches
How can we ensure that the scheduler, batch poller and execution engine are always available to pick up a new job for processing data in xDM?
Solution
1. High availability for end-users
A load balancer can meet this requirement. One active application server handles job processing, while multiple passive application servers behind a load balancer serve data to end users and integration applications. If one of the passive application servers fails, the load balancer routes incoming requests to a passive server that is still working.
See this documentation for more information on the Reference Architecture for High-Availability:
https://www.semarchy.com/doc/semarchy-xdm/semng.html#high-availability-configuration
2. High availability for processing batches
Unfortunately, the HTTP load balancer can only address the front end for users and applications. xDM has no built-in solution for high availability of the components that manage and run the certification processes.
Customers should use a monitoring solution that checks the health of the active server and acts automatically when it is down, so that an administrator does not have to restart xDM/Tomcat manually.
This monitoring solution can take many forms, from a DIY application to enterprise-grade software.
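As an illustration of the DIY end of that range, here is a minimal watchdog sketch in Python. It is not a Semarchy-provided tool. The health URL, the restart command, the failure threshold and the polling interval are all assumptions chosen for the example and must be adapted to the actual deployment.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical values: adjust both to your own deployment.
HEALTH_URL = "http://localhost:8080/semarchy/"
RESTART_CMD = ["systemctl", "restart", "tomcat"]


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP 4xx/5xx responses.
        return False


def watch(
    probe: Callable[[], bool],
    restart: Callable[[], object],
    max_failures: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    max_checks: Optional[int] = None,
) -> int:
    """Poll `probe`; call `restart` after `max_failures` consecutive failures.

    Returns the number of restarts triggered. `max_checks=None` runs forever.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                # Reset so the server gets a grace period while it starts up.
                failures = 0
        sleep(interval)
    return restarts


if __name__ == "__main__":
    watch(
        probe=lambda: is_healthy(HEALTH_URL),
        restart=lambda: subprocess.run(RESTART_CMD, check=False),
    )
```

Requiring several consecutive failures before restarting avoids reacting to a single slow response. The watchdog should run somewhere that survives a failure of the application server process, for example as a separate service on the same host or on another machine.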
Enterprise Monitoring Software
Some corporate customers use an internal monitoring service, such as New Relic, AppDynamics, Ruxit and others.
AWS world
Customers can build a DIY solution with a combination of Auto Scaling groups, CloudWatch custom metrics, and homegrown scripts and alarms.
For example, VIP uses a fairly manual process to monitor the status check from AWS. Since they are behind an ELB, they can take advantage of some application health checks.
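One possible shape for the custom-metric part of such a DIY setup is sketched below in Python with the boto3 AWS SDK. The namespace, metric name and choice of dimension are hypothetical, and the sketch assumes AWS credentials and a region are already configured on the machine that runs it.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical names: pick your own namespace and metric name.
NAMESPACE = "Custom/xDM"
METRIC_NAME = "ActiveServerHealthy"


def build_metric_data(instance_id: str, healthy: bool) -> List[Dict[str, Any]]:
    """Build the MetricData payload: 1.0 when healthy, 0.0 when not."""
    return [
        {
            "MetricName": METRIC_NAME,
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Timestamp": datetime.now(timezone.utc),
            "Value": 1.0 if healthy else 0.0,
            "Unit": "Count",
        }
    ]


def publish(instance_id: str, healthy: bool) -> None:
    """Send the health data point to CloudWatch."""
    # boto3 is a third-party package; it is imported here so that the
    # payload builder above can be used and tested without it installed.
    import boto3

    client = boto3.client("cloudwatch")
    client.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=build_metric_data(instance_id, healthy),
    )
```

A homegrown script could call `publish` on a schedule with the result of its own health check, and a CloudWatch alarm on the metric could then notify an administrator or trigger a recovery action when the value drops to 0 or stops arriving.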
Other solutions that we haven't tried include Monit (mmonit.com/monit) and Nagios.