Epicor ERP 10.1 Task Agent Configuration Implementation Options

I’m not going to wade into why background processing can stop executing. That is a huge topic, well beyond what could reasonably be covered in one post, and I’ll be discussing it throughout a number of sessions at Insights this year. I did want to provide some context regarding taskagent configurations in 10.1, though.

Epicor ERP 10.1+ allows for up to three taskagent configurations per transactional database, so that if taskagent1 stops for whatever reason, taskagent2 and taskagent3 can still process requests. If taskagent1 was processing a task when it stopped working, that task will be functionally requeued and then processed by taskagent2 or taskagent3 (terms and conditions apply on the requeued part, but the details are beyond the scope of this post).
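To make the requeue idea concrete, here is a toy Python model. It is only a sketch of the behavior described above, not how Epicor implements it: the `TaskAgent` class, the in-memory queue, and the task names are all hypothetical, and the "terms and conditions" on requeueing are ignored.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class TaskAgent:
    """Toy stand-in for one taskagent configuration (hypothetical model)."""
    name: str
    running: bool = True
    in_flight: Optional[str] = None           # task currently being processed
    completed: List[str] = field(default_factory=list)


def claim_next(agent: TaskAgent, queue: Deque[str]) -> None:
    """A running, idle agent takes the next queued task, if there is one."""
    if agent.running and agent.in_flight is None and queue:
        agent.in_flight = queue.popleft()


def finish(agent: TaskAgent) -> None:
    """A running agent completes whatever it is currently processing."""
    if agent.running and agent.in_flight is not None:
        agent.completed.append(agent.in_flight)
        agent.in_flight = None


def stop(agent: TaskAgent, queue: Deque[str]) -> None:
    """Simulate an agent stopping; its in-flight task goes back on the queue."""
    agent.running = False
    if agent.in_flight is not None:
        queue.appendleft(agent.in_flight)
        agent.in_flight = None


def run_demo() -> Dict[str, List[str]]:
    queue: Deque[str] = deque(["report-A", "report-B", "report-C"])
    agents = [TaskAgent("taskagent1"), TaskAgent("taskagent2"), TaskAgent("taskagent3")]
    for agent in agents:                      # each agent claims one task
        claim_next(agent, queue)
    stop(agents[0], queue)                    # taskagent1 stops mid-task
    # The surviving agents keep finishing and claiming until nothing is left.
    while queue or any(a.in_flight for a in agents):
        for agent in agents:
            finish(agent)
            claim_next(agent, queue)
    return {a.name: a.completed for a in agents}


if __name__ == "__main__":
    print(run_demo())
```

In this model the task that taskagent1 was holding is picked up by one of the other two agents, so all three tasks still complete.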

So, let’s say someone asked me how they could design a 10.1.400+ environment that would reduce the chances of their users ever experiencing a printing outage (regardless of why a taskagent might fail). We would discuss the following:

MONEY IS NO OBJECT BECAUSE DOWNTIME IS MORE EXPENSIVE IN THE LONG RUN DESIGN:

  1. Create three VMs; let’s call them vm-EpiTaskAgent1, vm-EpiTaskAgent2, vm-EpiTaskAgent3.
  2. Install three Epicor appserver processes, one per vm-EpiTaskAgentX server (along with everything that is required to make that happen), all pointed to the same database on DB server vm-EpiSQLServer1.
  3. Install three taskagent services, one per vm-EpiTaskAgentX server.
  4. Create three taskagent configurations, with the taskagent configuration on vm-EpiTaskAgent1 pointed to the new appserver process on vm-EpiTaskAgent1. Repeat for vm-EpiTaskAgent2 and vm-EpiTaskAgent3.

This eliminates as many single points of failure as is practically possible on the background processing side. What remains is the SSRS instance, since you would use the same one throughout, and the physical host. One could install a dedicated SSRS instance per vm-EpiTaskAgentX server, and just have the reportserver database on the same SQL Server instance on vm-EpiSQLServer1, to reduce the SSRS instance itself as a single point of failure, but that would require importing custom reports into three different instances, which would be a bit of work. As for the physical host, if you have more than one host in your VM environment, that risk is reduced as well.

MONEY IS AN OBJECT DESIGN BUT YOU AREN’T A PURIST WHEN IT COMES TO INSTALLING SOMETHING NON-SQL RELATED ON A SQL SERVER:
NOTE: This assumes at least two servers: one for Epicor ERP (vm-Epicor) and another for SQL (vm-EpiSQLServer1). It also assumes that vm-Epicor already has one appserver process that is used for interactive user sessions, and that there is a taskagent service/configuration on this server pointed to the vm-Epicor appserver process.

  1. Install an Epicor appserver process on vm-EpiSQLServer1 (along with everything that is required to make that happen) that points to the same database on DB server vm-EpiSQLServer1.
  2. Create a new taskagent configuration on vm-EpiSQLServer1, pointed to the new appserver process on vm-EpiSQLServer1.

If the taskagent on vm-EpiSQLServer1 fails for whatever reason, the taskagent on vm-Epicor can still continue processing tasks.

MONEY IS AN OBJECT DESIGN AND YOU ARE A PURIST WHEN IT COMES TO INSTALLING SOMETHING NON-SQL RELATED ON A SQL SERVER:

  1. One could install just the taskagent service on a workstation and create a second taskagent configuration on that workstation that points to the main appserver on vm-Epicor.

This is better than just one taskagent configuration, but there are still a number of single points of failure:

  1. The one appserver process (though if this one appserver process fails, users wouldn’t be able to log in, so printing not working likely isn’t the greatest concern at that point).
  2. The Windows server itself where the appserver process is.
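One way to compare the three layouts is to write down which hosts each taskagent configuration depends on and look for hosts that all of them share. The Python below is a rough sketch under stated assumptions: `TaskAgentConfig` and `common_hosts` are hypothetical names, `workstation-1` is a made-up hostname, and the shared database server and SSRS instance are deliberately left out of the model.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class TaskAgentConfig:
    """Where a taskagent service runs and which appserver host it points to."""
    agent_host: str
    appserver_host: str


def common_hosts(configs: List[TaskAgentConfig]) -> Set[str]:
    """Hosts that every configuration depends on (single points of failure)."""
    host_sets = [{c.agent_host, c.appserver_host} for c in configs]
    return set.intersection(*host_sets) if host_sets else set()


# Design 1: three dedicated VMs, each with its own appserver and taskagent.
money_no_object = [
    TaskAgentConfig("vm-EpiTaskAgent1", "vm-EpiTaskAgent1"),
    TaskAgentConfig("vm-EpiTaskAgent2", "vm-EpiTaskAgent2"),
    TaskAgentConfig("vm-EpiTaskAgent3", "vm-EpiTaskAgent3"),
]

# Design 3: a second taskagent on a workstation, pointed at the main appserver.
purist = [
    TaskAgentConfig("vm-Epicor", "vm-Epicor"),
    TaskAgentConfig("workstation-1", "vm-Epicor"),  # hypothetical hostname
]

if __name__ == "__main__":
    print(common_hosts(money_no_object))  # no shared host in this model
    print(common_hosts(purist))           # the shared appserver host
```

In the last design both configurations depend on vm-Epicor, which is the appserver process and Windows server called out in the list above.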

There is a topic within the TaskAgent Configuration help that discusses how to set up a notification when a taskagent stops processing or throws an error. I know that people have used that same trigger to automatically restart the taskagent service* so that it is addressed without manual intervention.

*In 10.1.600, we have a new command line interface for the taskagent where just a taskagent configuration can be restarted instead of the entire service.
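For the "restart the service from a trigger" idea, a minimal Python sketch might look like the following. It shells out to PowerShell’s `Restart-Service` cmdlet rather than using the 10.1.600 taskagent command line interface, whose syntax is not covered here. The service name is a placeholder assumption; check the actual name on your server before using anything like this.

```python
import subprocess
from typing import Callable, List

# Placeholder only: look up the real taskagent service name on your server.
SERVICE_NAME = "ExampleTaskAgentService"


def build_restart_command(service_name: str) -> List[str]:
    """Build a PowerShell invocation that restarts one Windows service."""
    if "'" in service_name:
        raise ValueError("service name must not contain a single quote")
    return [
        "powershell",
        "-NoProfile",
        "-Command",
        f"Restart-Service -Name '{service_name}'",
    ]


def restart_service(service_name: str, runner: Callable = subprocess.run):
    """Run the restart; with the default runner, a non-zero exit code raises."""
    return runner(build_restart_command(service_name), check=True)


if __name__ == "__main__":
    # Needs Windows and sufficient rights to control the service.
    restart_service(SERVICE_NAME)
```

The `runner` parameter is only there so the command can be inspected or dry-run without touching a real service.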

FINAL NOTE:
This discussion should not be construed as suggesting that the reason why a taskagent configuration stopped processing should be ignored. Each instance should be investigated, with the root cause isolated and a remediation step taken to prevent that condition from occurring in the future. The cause could be an actual bug in some process on the Epicor side that needs to be corrected, it could be XYZ custom report that has an issue and needs to be changed, it could be an issue that is addressed with an update to MS SSRS, etc. Having an infrastructure layout where lots of individual things can fail while users continue with their day-to-day tasks without interruption is a good thing from an operational standpoint, as it gives those of us who need to dig into the root cause more time to do so, without the urgency that a complete outage would cause.

For those who are attending Insights this year, I will be at the Support table in the Solutions Pavilion whenever I’m not in one of the sessions or labs I’m involved with, if someone wants to have a more involved conversation on this topic. Worst case, one should be able to find me here :slight_smile:

For those looking for me, this is what I may look like

Awesome insight, thank you @aidacra

Epicor ERP 10.1+ allows for up to three taskagent configurations per transactional database.

Are you saying there is a maximum limit of 3 Task Agent instances per DB, or just 3 different ways to configure it? Could someone have 10 Agents on 1 transactional DB? (For me, a configuration is what you have boldly highlighted in your post.)

Maximum limit of three task agent configurations per Epicor transactional database.

Nathan, thanks for posting this info. We have had some issues with the Task Agent; I will certainly read through this before calling support.

Bryan DeRuvo

As an aside, I’ve been asked when I’ve discussed this topic at EUGs “why three? why not 39 or 10 or X?”

I’ll tell two potentially fictional versions of this story, and I’ll let you, dear audience, decide which one you prefer.

THE LORE OF THREE TASK AGENTS VERSION 1:
When this functionality was being designed, we wanted to make sure that we chose a useful and technically practical limit. We knew we wanted more than one, which was the previous limit.

We then reviewed our largest customers’ infrastructure at 10.0. Our two largest customers in terms of total user count were both Epicor: one was our MT Cloud offering (which for this conversation we are calling Epicor because, in many ways, it functions like one very large internal customer instead of many smaller external ones) and the other was our own corporate usage of it. The Operations people had determined that three background processing appservers was the sweet spot, and that is what they were already using. It only made sense to allow for at least three, because that is essentially what our two largest customers were trying to accomplish.

We completed some internal testing to see what the upper practical limit was based on our initial engineering of this (basically, push it until something breaks). We noticed that as the number of taskagent configurations increased from some crazy, impractically large number to some much crazier, impractically larger number, locking/blocking could occur, with many taskagent configurations all vying to see which one could stake claim to a particular request first. This is because each taskagent configuration is independent from the others and has no awareness of them: each taskagent configuration just polls every X number of seconds, based on the processing delay set on the system agent.
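The "independent pollers racing to stake a claim" pattern can be sketched in a few lines of Python. This is a simplified illustration, not Epicor’s implementation: `RequestTable` is a hypothetical in-memory stand-in for the database, and a single lock plays the role of the locking that the agents contend on.

```python
import threading
from typing import Dict, List, Optional


class RequestTable:
    """Hypothetical shared store of requests (stands in for the database)."""

    def __init__(self, request_ids: List[int]) -> None:
        self._lock = threading.Lock()
        self._owner: Dict[int, Optional[str]] = {rid: None for rid in request_ids}

    def unclaimed(self) -> List[int]:
        with self._lock:
            return [rid for rid, owner in self._owner.items() if owner is None]

    def try_claim(self, request_id: int, agent_name: str) -> bool:
        """Atomically stake a claim; only the first claimant gets True."""
        with self._lock:
            if self._owner[request_id] is None:
                self._owner[request_id] = agent_name
                return True
            return False


def poll_once(table: RequestTable, agent_name: str, claimed: List[int]) -> None:
    """One polling pass: try to claim everything that looks unclaimed."""
    for rid in table.unclaimed():
        if table.try_claim(rid, agent_name):   # may lose the race to another agent
            claimed.append(rid)


def run_agents(table: RequestTable, agent_names: List[str], polls: int = 3) -> Dict[str, List[int]]:
    """Run each agent in its own thread; the agents know nothing of each other."""
    results: Dict[str, List[int]] = {name: [] for name in agent_names}

    def worker(name: str) -> None:
        for _ in range(polls):
            poll_once(table, name, results[name])

    threads = [threading.Thread(target=worker, args=(name,)) for name in agent_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    table = RequestTable(list(range(20)))
    print(run_agents(table, ["taskagent1", "taskagent2", "taskagent3"]))
```

Every request ends up with exactly one owner, but every extra poller adds more attempts against the same lock, which is the kind of contention described above.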

We could have spent a little time reengineering the process to allow for an even larger, crazier number of taskagent configurations, but the intent was simply to prevent an outage of background processing. It was not meant as a performance-improving measure, because having more taskagents will not improve background processing performance. After a full testing regimen at three taskagent configurations, we determined that for what we were trying to accomplish (preventing the taskagent from being a single point of failure), having more than that wouldn’t be beneficial towards that goal.

As we are now three releases into this change (10.1.400, 10.1.500, and very soon 10.1.600), this design choice has been validated by actual experience: three taskagents is not a limitation in practice, even for our largest customers. If three is enough for thousands of concurrent users, we feel it should meet the needs of those with fewer. As I mentioned in the original post, this change was introduced as a way of preventing the taskagent configuration itself from being a single point of failure if something unforeseen happens, not as a way of ignoring the problem that caused the outage itself.

THE LORE OF THREE TASK AGENTS VERSION 2:
Three is more than one and less than five, and due to an innate cognitive aesthetic preference for odd numbers, we settled on three instead of four.

Version 1 was tl;dr. Thus making Version 2 the preferred one. :wink:

Truth is a three edged sword, so I don’t believe either story. :wink:

Seriously, good information Nate. I passed it up to my systems guys.

Hi Nathan,

Could you provide us some information about a solid concurrency setup so that we do not max out on the concurrent tasks of an agent? Also, is there a way to dedicate a special Task Agent for just Multi-Company processes, IC processes, Global Scheduling, etc.?

Also if you have 3 Task Agents configured and one reaches its concurrent limit; will the other one pick up the other pending tasks?

Q1: Could you provide us some information about a solid concurrency setup so that we do not Max out on the Concurrent Tasks of an Agent?
A1: Multiple taskagent configurations won’t necessarily reduce how often you hit the max number of concurrent tasks on any one taskagent configuration. If that is a common experience in your environment, double (super precise, I know) the number of max concurrent tasks for all of the taskagent configurations.

Q2: Also is there a way to dedicate a special Task Agent for just Multi-Company processes, IC Processes, Global Scheduling etc…?
A2: Taskagents don’t do much beyond polling the appserver every X number of seconds for tasks that need to be processed; they don’t do the processing themselves. However, there is a way to dedicate one appserver process that handles background processing to only handle MC or Global Scheduling: the System Agent > Task Agent rules.

Q3: Also if you have 3 Task Agents configured and one reaches its concurrent limit; will the other one pick up the other pending tasks?
A3: The taskagents aren’t aware of each other beyond knowing whether another taskagent has staked claim to a particular request. If taskagent1 hit the max number of concurrent tasks, it wouldn’t stop staking claim to more tasks if the other taskagent configurations didn’t stake claim to them first. NOTE: When a taskagent has hit the max concurrent tasks, the task in the system agent will have a PENDING status in the Active tasks. Once a task is completed, the status of the PENDING task will change to ACTIVE.
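The PENDING/ACTIVE behavior in A3 can be modeled with a small Python class. This is a sketch of the described behavior only: `AgentSlots` is a hypothetical name, the `COMPLETE` status is an assumption added for the model, and the first-in-first-out promotion order is a guess, not something stated above.

```python
from collections import deque
from typing import Deque, Dict


class AgentSlots:
    """Toy model of one taskagent configuration's concurrent-task limit."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self.status: Dict[str, str] = {}       # task name -> status
        self._pending: Deque[str] = deque()    # claimed tasks waiting for a slot

    def _active_count(self) -> int:
        return sum(1 for s in self.status.values() if s == "ACTIVE")

    def claim(self, task: str) -> None:
        """The agent keeps claiming tasks even when it is at its limit."""
        if self._active_count() < self.max_concurrent:
            self.status[task] = "ACTIVE"
        else:
            self.status[task] = "PENDING"
            self._pending.append(task)

    def complete(self, task: str) -> None:
        """Finishing a task frees a slot and promotes one PENDING task."""
        if self.status.get(task) != "ACTIVE":
            raise ValueError(f"{task!r} is not ACTIVE")
        self.status[task] = "COMPLETE"          # assumed status name
        if self._pending:
            self.status[self._pending.popleft()] = "ACTIVE"


if __name__ == "__main__":
    agent = AgentSlots(max_concurrent=2)
    for name in ("task1", "task2", "task3"):
        agent.claim(name)
    print(agent.status)    # task3 waits as PENDING
    agent.complete("task1")
    print(agent.status)    # task3 has moved to ACTIVE
```

Raising `max_concurrent` in this model, as suggested in A1, simply means fewer claimed tasks sit in PENDING.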

My next post will be about task agent rules and how they fit in with multiple taskagent configurations.

We now have three AppServers and Task Agents but have a few questions about managing them in Production.

  1. With multiple task agents running on multiple servers, how do we know if an application pool or task agent is stuck/hung/not working, etc., besides monitoring the sysRptLst table and looking at the Host?

  2. Are there any logs, events, or DB entries we can look at?

  3. Are there any performance tools to determine the “best” performing server with application pool and task agent?

  4. From what I have read, we are limited to (3) task agents per DB. Are we also limited to (3) application pools, or can we set up a fourth application pool for user/client activity and have the other 3 dedicated to task agents?

  5. Are BPMs compiled to the App Server and, if so, how do you sync them between App Servers?

John

  1. In the Task Agent Service Configuration help, there is a topic on how to monitor for specific error numbers that the task agent event log may have to indicate that there is a problem.
  2. The Epicor ICE Task Agent log on every task agent server is the “log of record” for taskagent related activities.
  3. As in, which of the three is processing the most requests?
  4. No limit on application servers (application pools) per database.
  5. It’s all automatic as of 10.1.400+. BPMs automatically work across all appservers by default. There isn’t anything that one needs to do to keep them in sync (unless they reference external assemblies; then you’d need to sync the external assemblies or reference them in a way that wouldn’t require them to be on every server).

Thanks so much Nathan

1. The biggest issues we have are with errors that don’t create any Event Log entry. These are always caused by something outside of Epicor, like Windows permissions or mismatched DLLs. I saw an entry on here for an email alert if a task fails, but that still requires the agent to finish the process and not just hang.
3. Specifically, which one has the best performance, if we want to use Task Agent Rules to run specific reports on the best performing VM?
5. Would there be anything else that needs to be synced once you move to multiple app servers?

Thanks again,

Hi Nathan @aidacra

We had a couple questions regarding your recommended task agent configuration mentioned above, and hoping someone might be able to help. We are currently in the process of testing 10.2.300.3 for GoLive in February and we just installed the environment on all new hardware.
We set up 3 application servers with 1 task agent on each of these. One of the things we noticed initially is that the log files for processes would go to 3 different application servers (this has since been resolved). So it prompted us to ask these questions:
1). How is it determined, when a user logs in, which application server they are connecting to, provided that we have not changed any of the default settings for this in the sysconfig file?
2). Additionally, how is it determined which app server is the primary handling the load, and which are the backups? I’m assuming there can still only be 1 task server running at one time (like stated above for 10.1)?

We are just trying to get a handle on this before GoLive and better understand how things are being handled, especially for future troubleshooting.

Any additional information would be very helpful.

Thank you,
Matt