IT Monitoring / Alerting that works with Epicor

Far too many times in the last few weeks I’ve woken up to find my phone flooded with emails and text messages saying that servers / Epicor / etc. are down and have been for a few hours. Other times the report comes from someone working on the weekend: the Epicor app server went down the night before, and nobody noticed until they tried to work. Every time we investigate, it seems like it’s some new root cause (lately our new antivirus - Trend Micro Apex One - and our Unitrends backup appliances seem to be playing a role).

This has me thinking about a different overall approach - maybe implementing something like Nagios, Checkmk or Zabbix to monitor our IT infrastructure and alert us when something is down, a drive is filling up, etc. Since Epicor is the thing we need to keep running most of all, I thought I would start my research by asking here: does anyone use a tool like this, and have you had luck with it effectively monitoring and alerting on Epicor app servers specifically?

Thanks much for your time and for any input you can provide.

We use Site24x7 for monitoring. Haven’t tried it with Epicor directly, but you can monitor a REST API’s response time and status code. Combined with pinging the server and monitoring SQL metrics, you can probably achieve something useful.
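
For anyone who wants to try the response time and status code idea without a hosted service, here is a minimal Python sketch using only the standard library. The function names (`probe`, `classify`), the labels, and the 2-second "slow" threshold are my own assumptions for illustration, not anything Site24x7 or Epicor defines.

```python
import time
import urllib.error
import urllib.request


def classify(status, elapsed_s, slow_after_s=2.0):
    """Turn a status code and a response time into a simple health label.

    slow_after_s is an assumed threshold; tune it for your own environment.
    """
    if status is None or status >= 500:
        return "down"    # no answer at all, or a server-side failure
    if status >= 400:
        return "error"   # server answered, but rejected the request
    if elapsed_s > slow_after_s:
        return "slow"
    return "ok"


def probe(url, timeout_s=10.0):
    """Request the URL once and return (status_code_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code    # a 4xx/5xx response still carries a status code
    except (urllib.error.URLError, OSError):
        status = None        # DNS failure, refused connection, timeout, etc.
    return status, time.monotonic() - start


# Typical use (not executed here, since it needs a reachable endpoint):
#   status, elapsed = probe(your_endpoint_url)
#   print(classify(status, elapsed))
```

Keeping `classify` separate from `probe` makes it easy to test the thresholds without touching the network.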

My issue is that Epicor is generally solid at staying up, but it slows down, probably in places that are difficult to monitor directly, such as adding sales order lines. That’s why I’ve never tried setting up a monitor.

We use Zabbix to monitor a LOT of our stuff. The primary reason I prefer a Zabbix installation is that I can monitor ANYTHING. Because I can build my own scripts to monitor things like IIS worker processes, API results, and general system stats all in the same system, it’s my preferred way to go. It’s also free, which is a bonus, but you have to write this stuff yourself. It’s a project, no doubt, and I am not going to claim mine is anywhere near where I want it.

The other key part is alerting. You have to have a way to break through your phone’s Do Not Disturb setting. I use OpsGenie, which can do so in a variety of ways and can break through iPhone DND, which historically was not possible until Apple finally recognized the need. I would be happy to provide some guidance. Shoot me a DM and we can discuss further if you like.

We use a SQL query to monitor tasks: it looks for too many tasks in the queue and for tasks that have been active in the queue for longer than 20 minutes. We found that tasks usually fail before the app server does.
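
As a rough illustration of that rule, here is a Python sketch that applies the two checks to task start times you have already pulled from the database. The SQL itself is left out on purpose, since the table and column names depend on your install. The function name `queue_alerts` and the limit of 10 queued tasks are assumptions; only the 20-minute figure comes from the post above.

```python
from datetime import datetime, timedelta


def queue_alerts(started_times, now, max_tasks=10, max_age=timedelta(minutes=20)):
    """Return a list of alert messages for the current task queue.

    started_times: start timestamps of the tasks currently in the queue.
    max_tasks:     assumed limit for "too many in the queue".
    max_age:       how long a task may stay active before it is flagged.
    """
    alerts = []
    if len(started_times) > max_tasks:
        alerts.append(f"{len(started_times)} tasks in queue (limit {max_tasks})")
    stuck = [t for t in started_times if now - t > max_age]
    if stuck:
        alerts.append(f"{len(stuck)} task(s) active longer than {max_age}")
    return alerts


# Example with made-up timestamps: one task started 45 minutes ago.
now = datetime(2024, 1, 1, 12, 0)
print(queue_alerts([now - timedelta(minutes=45), now - timedelta(minutes=5)], now))
```

An empty list means nothing to report, so a scheduler can simply send whatever comes back.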

What if your app server and Task Agent crash but there are no tasks in the queue, because nothing is running to schedule them and so nothing gets added?

Fortunately, that scenario doesn’t happen here.
We do an IIS restart during off hours, followed by a Task Agent restart, on a schedule.

I’ve been looking at using REST to do this. Shocked! Not shocked.

*** v1 REST shown, do not use for production. Use v2 with an API-Key ***

https://server/instance/api/v1/Ice.BO.AdminSessionSvc/GetSessionList

This will show all the current sessions. If the app server is down, this won’t work. You can also track license usage here.

As for the TaskAgent, this call gets all Active tasks:

https://server/instance/api/v1/Ice.BO.SysTaskSvc/GetRows?whereClauseSysTask=TaskStatus%3D'Active'&whereClauseSysTaskParam=&pageSize=0&absolutePage=0
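
If you want to build that URL from a script rather than paste it, here is a small Python sketch. The helper name `sys_task_rows_url` is hypothetical, and any status value other than 'Active' is an assumption you would need to confirm against your own data. The base URL is passed in, so the same helper works once you move off v1.

```python
from urllib.parse import quote, urlencode


def sys_task_rows_url(base, status="Active"):
    """Build the SysTaskSvc GetRows URL for tasks with the given TaskStatus.

    base is everything up to the API version, e.g. "https://server/instance/api/v1".
    """
    query = urlencode(
        {
            "whereClauseSysTask": f"TaskStatus='{status}'",
            "whereClauseSysTaskParam": "",
            "pageSize": 0,
            "absolutePage": 0,
        },
        quote_via=quote,  # percent-encode "=" as %3D inside the where clause
        safe="'",         # leave the single quotes readable, as in the URL above
    )
    return f"{base}/Ice.BO.SysTaskSvc/GetRows?{query}"


print(sys_task_rows_url("https://server/instance/api/v1"))
```

With the default status this prints the same URL as shown above.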

You could run a similar call for all completed tasks, occasionally build a table of average run times, and then compare those with the active tasks to see if any are running longer than usual.
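
Here is a minimal Python sketch of that comparison, assuming you have already reduced the REST results to simple (task name, seconds) pairs. The function names and the "twice the average" factor are my own assumptions; pick whatever multiple suits your workload.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(completed):
    """Average run time per task name.

    completed: iterable of (task_name, duration_seconds) for finished tasks.
    """
    durations = defaultdict(list)
    for name, seconds in completed:
        durations[name].append(seconds)
    return {name: mean(values) for name, values in durations.items()}


def running_long(active, baseline, factor=2.0):
    """Return active tasks that have run longer than factor x their usual time.

    active: iterable of (task_name, elapsed_seconds) for tasks still running.
    Tasks with no history in the baseline are skipped rather than flagged.
    """
    flagged = []
    for name, elapsed in active:
        usual = baseline.get(name)
        if usual is not None and elapsed > usual * factor:
            flagged.append((name, elapsed, usual))
    return flagged


# Example with made-up numbers.
baseline = build_baseline([("MRP", 600), ("MRP", 1000), ("Report", 30)])
print(running_long([("MRP", 2000), ("Report", 40), ("New", 5)], baseline))
```

A plain average is easily skewed by one odd run, so a median or percentile may be a better baseline once you have enough history.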

Since this is all REST-based, it should be easy to hook into Power Automate or Logic Apps to power your notifications. Also, this would work for both cloud and on-prem.

I feel like the v1 API has some security hole in it. Don’t quote me on it, but I think you want to avoid exposing v1 to the internet if at all possible and only use v2. @josecgomez or @jgiese.wci would be my go-to for an answer on that. Anywho, I know we block all access to the v1 API at Cloudflare to prevent something.

I agree. I was just posting an example, but I also prefer v2 with the API-Key. I also prefer the custom methods over OData for some reason. Also, I would set up some PowerShell to roll the API-Keys periodically.

Correct, you do not want to use v1. There are user access risks once authenticated unless you have service and method security implemented to the nines. If you are relying on menu security, then v1 is a screen porch.
