E10 Printing

Given that E10 printing still crashes like it has since Vantage, what are some of you folks doing to mitigate the effects of a TaskAgent restart?

I’ve approached support, and basically been told “Suck it up and deal”.

I don’t think I agree with this statement… what are you running into? Printing just seems to work… fine…

I would second that. We don’t have issues printing either. It seems like pushing back at support to get a true resolution to the issue would be a good idea.

I’ll second third this.

And for all my issues, “crashing” doesn’t happen to us.

One thing to check is when IT does maintenance on the SQL server (if it is separate from the App server). Restarting the SQL server requires restarting the task agents on the App server. If both are restarted, make sure the SQL server is up before the Task Agents try to start.
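A rough way to script that ordering is to wait until the SQL Server port answers before starting the Task Agent service. The sketch below is only an illustration: `wait_for_port` is a made-up helper, the host name in the usage note is a placeholder, and an open port does not guarantee the database is ready for logins, so treat it as a first gate rather than a full readiness check.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # Refused or timed out: the server is not up yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, a startup script could call `wait_for_port("your-sql-host", 1433)` (1433 being the default SQL Server port) and only start the Task Agent service when it returns True.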

Edit
One complaint I have that might look like a crash is when running the client via remote and trying to print locally: it takes several minutes for the list of printers to populate.

I would agree with everyone above about general task agent stability. You can have issues with RDDs and/or processes that will definitely make it look like you have task agent stability issues. Do you just have a single agent running or multiples? Is the agent using the same application pool as your users? Have you turned up logging enough to look for common events that are happening at the time it hangs? Also, run the PDT tool, if you have not already, do a check config, and make sure to resolve any issues it detects - just to ensure the basics are covered.

I vehemently disagree with that statement. I have had considerable issues in the past with the task agent. Currently the Task Agent is far more stable and resilient than ever before. Are there specific things causing issues for you, reports or tasks?

the paint is still drying on your membership and already stirring up the pot…nice Chuck Norris entrance :laughing:

I think it’s nothing like Vantage, but it still hangs up more than it should, though I think that may be environmental. Certainly not everyone experiences this. Ours hangs up once or twice a month but we can’t tell why, so we just ‘deal’.

Hahahaha

Yeah, I’ve been administering Epicor servers for a long long time, and I’m super tired of the same old song and dance out of the support team.

In the E10.2.300 Admin guide there is a whole section on troubleshooting print, and their only solution appears to be to restart the task agent.

The problem with that is when you restart the TaskAgent everything that was running at the time dies. If those things that were running were tasks like Multi Company Direct or any of the Posting processes, you end up with garbage in your database, because the process will have failed.

I’ve been told in 10.2.300 they’ve added a checkbox at the system level to “Re-run cancelled tasks on TaskAgent restart” but that doesn’t really help from a mitigation perspective.

We have anecdotal evidence from a number of our sister companies who also have print issues, with the whole print system just up and dying to this day, though I don’t know whether they’re all running 10.2.300.

Are you running an app server for just print separate from your application?

Have you looked at articles like this:

We went the route of installing the taskagent service on another server and creating a second taskagent configuration on that server that points to the main appserver. And thankfully we haven’t had any issues where we’ve had to restart the Task Agents. We’re on 10.1.600.

This belief is my fault. I misunderstood some internal engineering notes regarding requeue logic, and what is being discussed here regarding tasks being cancelled when a taskagent configuration is restarted is not any different in 10.2.300. I have since removed that statement from the related KB.

Charles, welcome to the community.
First, I won’t contradict the pain points of past behaviors in batch processing and reports. The item you mention has been a sore point of mine. In geeky terms: hydrating and re-hydrating the state of a process when something is half done. The ability to save off progress, note when something went boom, and then restart and pick up where it left off.
When the boom is a manual task agent recycle, it is even more painful. I’d like to see what is happening to cause that, and I think the community in general up here has fought through some annoyances and can help.

To give you a bit of a peek behind the curtain, there are some things going on in current and in active development. I’ll comment on the current delivery and hint at roadmap with all the appropriate safe harbor alerts and no guarantees on delivery dates, yada yada that would fill a marketing slide :wink:

First, in the current release, 10.2.300, we have put in place the beginnings of restart functionality. It’s too coarse but will grow with time. We are keeping a ‘breadcrumb trail’ when doing processing. If we detect an issue with the SSRS service going out to lunch, for example, we now note our status, restart the SSRS process and continue - silently. I will leave to the imagination the areas we invested in based upon our monitoring issues in SaaS. We host and admin the product ourselves, and I’ll guarantee SaaS Ops and the general customer base are of like mind about stability issues, which is why you should see more stability now and into the future.
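To make the 'breadcrumb trail' idea concrete, here is a generic sketch of the pattern. It is not how the product implements it, and `run_with_breadcrumbs`, the step names and the trail file are all made up for illustration. Completed steps are recorded on disk, a failing step is retried a few times, and a later run skips whatever the trail says is already done. It only makes sense when each step is safe to re-run, which is exactly the hard part for posting-style processes.

```python
import json
import os
import tempfile


def run_with_breadcrumbs(steps, trail_path, max_attempts=3):
    """Run (name, callable) steps in order, recording completed names.

    Steps already listed in the trail file are skipped, so calling this
    again after a failure resumes at the first unfinished step.
    """
    done = []
    if os.path.exists(trail_path):
        with open(trail_path) as f:
            done = json.load(f)
    for name, action in steps:
        if name in done:
            continue  # Finished in an earlier run.
        for attempt in range(1, max_attempts + 1):
            try:
                action()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # Give up; the trail still holds earlier progress.
        done.append(name)
        # Write to a temp file and swap it in, so a crash mid-write
        # cannot leave a half-written trail behind.
        tmp_path = trail_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, trail_path)
    return done
```

Calling it a second time with the same trail file picks up at the step that failed rather than starting over.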

A few other scenarios have also been tackled. I’d like to tackle more but those will need to be more invasive changes to sacred processes. Destabilizing MRP for example would probably not make many friends. So things are being reviewed but no promises on more granular efforts.

One other area I need to get a blog post together on is the annoying ‘printing / task agent has magically stopped’ situation that you hear about first from an angry user / customer. We are rolling out more health monitoring services and hope to light those up in interesting ways, but they are available to you now in 300.

First, the Task Agent now writes explicit error codes to the Windows Event Log. Most folks have monitoring of some type available to them as an admin, so you can set up a rule along the lines of: on a ‘102’ error code with this and that filter, send me an email or text. Now when the Task Agent dies, it should log all the details, try to recover ‘x’ times and, if still dead, write out the entry for you to be alerted on.
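For anyone wiring up that kind of alert, the filtering step can be sketched roughly as below. This is only an illustration: the `EventRecord` shape, the source name `ExampleTaskAgent` and the `select_alerts` function are made up here, and reading real entries out of the Windows Event Log is left to whatever monitoring tool you already use. The 102 is just the example code mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical, simplified view of one event log entry."""
    source: str
    event_id: int
    message: str


def select_alerts(events, source, alert_ids):
    """Return the events from `source` whose event_id is in `alert_ids`."""
    wanted = set(alert_ids)
    return [e for e in events if e.source == source and e.event_id in wanted]


# Example data; the source name and messages are placeholders.
sample = [
    EventRecord("ExampleTaskAgent", 102, "agent stopped after retries"),
    EventRecord("ExampleTaskAgent", 7, "routine start"),
    EventRecord("SomethingElse", 102, "unrelated"),
]
to_notify = select_alerts(sample, "ExampleTaskAgent", [102])
```

Each record in `to_notify` would then be handed to whatever sends your email or text.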

Second, monitoring 100+ servers or whatever we have now in SaaS has encouraged us to put easier health checking of the system in place. To that end, there is a new Ice.BO.HealthCheck service:

This gives you the ability to have a heart beat on different aspects of the app server, db, task agent and SSRS processes. I need to get some best practices together for publication so don’t run off and do anything crazy on your production boxes. We are kicking the tires on this in SaaS currently. The only reason I mention it is some people up here like to delta our code base and decompile things and send me questions (You know who you are out there).

That’s just current world. As we move forward into more of a cloud and hybrid world, there are interesting possibilities on handling the more complex tasks. What and when are we doing something? No clue to be 100% transparent. We are doing some interesting spikes to try out ideas, watch them blow up in research and take different paths, etc. Some successful approaches are being put on different delivery vehicles. I hope this is not a surprise as any decent company would be investing in how to improve things.

  • In summary, let’s figure out the current stability issues.
  • Current release has done some things differently, though more is wanted.
  • I owe the community a write up on some cool new toys around Task Agent and Reporting that sneaked out in 300. Sorry - not ready to broadcast externally until we see it in the wild in SaaS for a bit.
  • The future is definitely improving batch processing and reporting.

To be honest, a lot of that is driven by our own internal pain. Manpower annoyance, bad customer experience and the bottom-line cost of renting servers in Azure all align to drive improvements that benefit cloud and on-premises users. You only need to look at the overhaul of MSFT as they went from Windows trying to eat the world to Azure trying to host the world to see how MSFT learned, and learned to listen more.

Now… where do we start on what you are seeing?

The Task Agent is a much maligned piece of software and not all of it is unwarranted. The Task Agent, at its simplest, is just a headless client that executes background tasks. If the background task fails for any reason, it is the Task Agent that reports the issue and is often blamed as a result – shoot the messenger syndrome. The Task Agent itself can also become a victim of the underlying process failure, but all known instances of those failures have been addressed in the more recent versions of the Task Agent.

That said, we believe that there is more we can do in this area and Bart has outlined much of that.

One item for you all to look for is a correction that is in base 300, included with 200.15, and coming soon to 100 and 10.1.600. The update will provide more resilience to an AppServer or network “hiccup”.

Coming in on this one late …

A few months ago we were in the same boat as you, and printing was our biggest Epicor pain. It got to the point where I wrote a BPM which restarted the task agent for us on a combination of task queue build-up checks.
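For illustration only, the kind of 'queue build-up' check described here might look something like the sketch below. It is not the actual BPM: `queue_is_stuck`, its thresholds and the sample format are all assumptions. The idea is to call the queue stuck only when a backlog persists across several samples and nothing has completed in that time.

```python
def queue_is_stuck(samples, min_pending=5, window=3):
    """Decide whether the task queue looks stuck.

    samples: list of (pending_count, completed_total) tuples, oldest first.
    Returns True when each of the last `window` samples shows at least
    `min_pending` pending tasks and the completed total has not moved.
    """
    if len(samples) < window:
        return False  # Not enough history to judge.
    recent = samples[-window:]
    backlog = all(pending >= min_pending for pending, _ in recent)
    no_progress = recent[-1][1] == recent[0][1]
    return backlog and no_progress
```

Requiring both conditions avoids restarting the agent when the queue is long but tasks are still being worked through.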

What we did may be overkill, but we moved in one jump from 10.1.400 on a single physical server, to 10.2.200 on a new server with VMs, one VM each for the database and two independent App Servers. I can’t be sure which of those changes was most helpful, but we’ve barely had a task agent issue since.

We are on an old release of E10 and had occasional issues with the task agent. What worked for us was a scheduled Windows task that runs daily in the early hours of the morning to stop and start the Windows service; it takes less than 60 seconds. This works in our environment because there is a period through the night when nobody is in to print, and we planned things like MRP and PO suggestion generation so they do not run around that period. Others may not be so lucky to have this kind of window, but maybe it is worth going back to old-style IT ops: agree with your business on an appropriate time when services can’t run and printing won’t work, and schedule an automatic restart of the service.
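If the restart is scripted rather than left entirely to the scheduler, a small guard can refuse to run outside the agreed window. This is a hypothetical helper (`in_window` is not part of any Epicor or Windows tooling) that also handles a window that wraps past midnight.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` falls in [start, end), allowing midnight wrap."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 23:30 to 00:30.
    return now >= start or now < end
```

A restart script could check `in_window(datetime.now().time(), time(2, 0), time(3, 0))` first and exit without touching the service when it returns False.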

One thing that may or may not have some significance (I’m sure the folks who know the architecture way better than me will comment if it does indeed have an impact): standard advice is to stop any Epicor application pools not in use (test, pilot, train etc.) to assist with performance. If you don’t also disable the task agent for that instance, it is effectively constantly trying to connect to an Epicor application pool, creating a lot of unnecessary noise and errors in Windows. To me, from a basic Windows server functionality perspective, a lot of unnecessary noise and errors could not unreasonably lead to the service hanging or stopping.

I would completely delete the Task Agent from any server that you have it running on. Make sure that you delete the C:\ProgramData\Epicor Software Corporation\ICE TaskAgent Service directory as well.

Then reinstall it from the \SupplementalInstalls\Task Agent folder from the version that you are currently running.

Last, if you can install the Task Agent on three servers then do it, if not, then at least two. This allows one Task Agent to stop processing tasks and the second agent will continue.

@Bart_Elia THANK YOU! THANK YOU! THANK YOU! :star_struck:

Sorry for resurrecting an old thread, but I tried opening this up on our 10.2.200.19 install and get an error of “Sorry! Something went wrong. Please contact your system administrator.” - The problem is that I am the system administrator, and I have no idea how to fix it.

Is this a 10.2.300 thing only, or is there somewhere I can go to confirm that this is working?

Look in the windows event log. There is a post up here about ‘check the flight data recorder’ that can give advice.
There was a rethink on the return of exception details in 10.2.300. An ordinary user can’t do much with the stack trace and tech details, so why confuse them with it? Also, some scenarios could leak out details that some security folks objected to showing. Nothing fatal, but between both requests the change was made to keep it simple and log it.

Aha!

Unable to find service Ice:BO:HealthCheck

I’ll have to open a ticket with support about this I guess.

When I check the entire api/v1 listing, Ice.BO.HealthCheckSvc isn’t listed, so it must not be available in 10.2.200