Recently upgraded from 8.03 to 10.1.600, and our app server keeps freezing or running very slowly. We noticed that the IIS worker process (w3wp.exe) is consuming almost 100% of the CPU. The app server has 48 GB of RAM. It gets so bad that we have to restart the app server; we're fine for a couple of hours and then we're right back to almost all of the CPU being used. We also noticed this error in the Event Viewer. Does anyone have any suggestions as to the cause?
Smells like a runaway DB query trying to download the contents of the internet to your app server. What's RAM doing during this event, and what's the SQL server doing?
SQL Server is fine. RAM available to SQL is capped, the SQL box doesn't slow down, and nothing in Task Manager looks out of the ordinary. RAM on the app server during the high CPU usage is around 40-50%.
Is it growing? E.g., the app server has a query that keeps pulling data from the DB and you see a slowly growing w3wp process size?
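If you want to watch that without babysitting Task Manager, here's a rough Python sketch (assuming `psutil` is installed on the app server) that logs CPU and working-set size for every w3wp.exe once a minute, so a slow leak shows up in the log:

```python
# Rough monitoring sketch: logs CPU and working-set size for every
# w3wp.exe process once a minute. Assumes `pip install psutil` and
# that it runs on the app server itself.
import time
import psutil

def sample_w3wp():
    for proc in psutil.process_iter(["name", "pid"]):
        if (proc.info["name"] or "").lower() == "w3wp.exe":
            try:
                cpu = proc.cpu_percent(interval=1.0)        # % over a 1-second window
                mem = proc.memory_info().rss / (1024 ** 2)  # working set, MB
                print(f"pid={proc.info['pid']} cpu={cpu:5.1f}% mem={mem:8.1f} MB")
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass  # process recycled or insufficient rights; skip it

if __name__ == "__main__":
    while True:
        sample_w3wp()
        time.sleep(60)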
What scheduled tasks do you have in your System Agent? We had an old BAQ Export that was eating up resources until it was rebuilt for the E10 series.
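If you'd rather check that from the database side, something like this might work; heavy caveat that `Ice.SysTask` and its `TaskStatus`/`StartedOn` columns are my assumption from E10-era schemas, so verify the names against your own database first:

```python
# Heavily hedged sketch: list System Agent tasks still marked active.
# Ice.SysTask and the TaskStatus/StartedOn column names are an
# assumption; check them against your own schema before relying on this.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourSqlServer;DATABASE=YourEpicorDb;Trusted_Connection=yes;"
)

with pyodbc.connect(CONN_STR) as conn:
    cur = conn.cursor()
    cur.execute(
        "SELECT TOP 20 * FROM Ice.SysTask "
        "WHERE TaskStatus = 'Active' ORDER BY StartedOn"
    )
    cols = [c[0] for c in cur.description]  # print whatever columns exist
    for row in cur.fetchall():
        print(dict(zip(cols, row)))
```

A task that has been 'Active' for hours is a good candidate for the kind of hung process described later in this thread.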
Does this occur even when Epicor is left alone, with no one using it? What locks, if any, are happening on the database tables or indexes? Have you done any server-level logging?
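For a quick look at blocked or long-running requests, here's a minimal pyodbc sketch against the standard SQL Server DMVs (server and database names are placeholders); it surfaces what each session is waiting on and the statement it's running:

```python
# Quick-and-dirty lock/blocking check via the SQL Server DMVs.
# Assumes pyodbc and rights to query sys.dm_exec_requests;
# adjust the connection string for your environment.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourSqlServer;DATABASE=YourEpicorDb;Trusted_Connection=yes;"
)

QUERY = """
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.blocking_session_id,
       r.total_elapsed_time / 1000 AS elapsed_sec,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50            -- rough heuristic: skip system sessions
ORDER BY r.total_elapsed_time DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(row.session_id, row.status, row.wait_type,
              row.blocking_session_id, row.elapsed_sec)
        print("   ", (row.sql_text or "")[:120])
```

A nonzero `blocking_session_id`, or an enormous `elapsed_sec` on one request, points you at the transaction to investigate.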
I ran into this once, where an Epicor method expected certain results in the middle of a transaction. Because it didn't get the results it expected, it just halted the transaction. Since we had our timeouts set to two hours or longer, Epicor didn't recycle the memory, and usage continued to build until we recycled the application pool. What we saw on the user side was that it would either crash Epicor (a Windows-level crash) or the client would just halt and go into a Not Responding state.
How we eventually found out what was going on was by doing a server-level trace along with a database lock investigation. If you don't have the client hanging on anyone's machine, and you're not seeing Epicor crash, it will be more difficult to determine where this is occurring. I would probably start with each group of the company: allow only one group to use Epicor for a while and watch the CPU of the worker process; it should eventually drop back to nothing or near nothing. Then bring more groups on and keep watching. When you've pinpointed the group, have them perform only certain transactions until you're able to find out which process in particular is causing this.
Another way to do it would be to put each team on their own application pool and see which one builds up.
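If you go that route, one way to attribute CPU per pool is to read the `-ap` argument IIS passes to each w3wp.exe on its command line; a rough psutil sketch (assumes admin rights so the command lines are readable):

```python
# Sketch for the per-app-pool idea: map each w3wp.exe to its pool via
# the -ap argument on its command line, then report CPU per pool.
# Assumes psutil and enough rights to read other processes' cmdlines.
import psutil

def cpu_by_app_pool():
    usage = {}
    for proc in psutil.process_iter(["name"]):
        if (proc.info["name"] or "").lower() != "w3wp.exe":
            continue
        try:
            args = proc.cmdline()
            pool = args[args.index("-ap") + 1] if "-ap" in args else "unknown"
            usage[pool] = usage.get(pool, 0.0) + proc.cpu_percent(interval=1.0)
        except (psutil.NoSuchProcess, psutil.AccessDenied, ValueError, IndexError):
            continue
    return usage

for pool, cpu in sorted(cpu_by_app_pool().items(), key=lambda kv: -kv[1]):
    print(f"{pool:30s} {cpu:5.1f}%")
```

Run it a few times while the problem is building and the guilty pool should float to the top.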
We ‘might’ have found the issue: it turns out the PO Suggestions process was hung up. We are continuing to monitor to see if that fixed it.
Double-check your DB indexes. Do you have BPMs?
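For the index side, here's a quick fragmentation check (again a pyodbc sketch with placeholder connection details) that lists the most fragmented indexes of any real size:

```python
# Index-health sketch: shows fragmentation for the most fragmented
# indexes in the Epicor database. Assumes pyodbc; connection string
# values are placeholders.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourSqlServer;DATABASE=YourEpicorDb;Trusted_Connection=yes;"
)

QUERY = """
SELECT TOP 25
       OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.page_count > 1000          -- ignore tiny indexes
ORDER BY s.avg_fragmentation_in_percent DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(f"{row.table_name}.{row.index_name}: "
              f"{row.avg_fragmentation_in_percent:.1f}% over {row.page_count} pages")
```

Badly fragmented indexes on the big transactional tables can make an otherwise-normal query chew CPU after an upgrade, so it's worth ruling out.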