System Performance

Does anyone have any thoughts on this? Quite frankly, it’s making me sick!

We’ve always had performance issues, from E9 and through into E10: different problems at different times. Some we’ve caused ourselves and others are seemingly unexplainable. Those are the ones causing me the biggest headaches. On top of that, the user perception of the system is that it’s too slow as standard. For example, the performance guide says it should take 3 seconds to add a sales order line. It does, but every new user who has come from another system finds it mad that this is normal. They expect close to instant results and have had them in other systems (they say).

But yes, the main problem is the random slowdowns that have no explanation. Primarily these affect sales order entry and customer shipment entry. These will suddenly go from working at acceptable levels to vastly unacceptable levels, with pauses, “not responding” hangs, timeouts and errors.

Let’s take last night as an example: 6:30pm on a Friday. Everyone has clocked off, and not only that, but in the weeks before Christmas lots of people are using up their holiday time. There were 4 people on the system: 2 despatch operators in the UK and 2 people in Mexico. The UK team reported that customer shipment entry had gone slow for 30 minutes. Normally a 1 line order, which they would process in 30 seconds (including freighting in Manifest/QuickShip via FedEx and printing the despatch note), was now taking 7 minutes. SQL system monitor showed 2% CPU usage, 0.8 MB/sec disk usage and 150 batch ops per second. No blocks, locks or waits. No SSRS reports running. Nothing like COS/WIP running. What I do to try and fix it, which “seemingly” works about 60% of the time, is to run UPDATE STATISTICS on the ship, order and invoice tables.
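For anyone wanting to script the statistics refresh described above, here is a minimal sketch. The table names are assumptions (the post only says "ship, order and invoice tables"), and `WITH FULLSCAN` is just one possible option, not necessarily what is being run here. The code only builds the T-SQL text; executing it is left to whatever SQL client or agent job you already use.

```python
# Hypothetical table list: the post does not name the exact tables.
ASSUMED_TABLES = [
    "Erp.ShipHead", "Erp.ShipDtl",
    "Erp.OrderHed", "Erp.OrderDtl",
    "Erp.InvcHead", "Erp.InvcDtl",
]


def build_update_statistics(tables, fullscan=True):
    """Return one UPDATE STATISTICS statement per 'schema.table' name."""
    suffix = " WITH FULLSCAN" if fullscan else ""
    statements = []
    for name in tables:
        schema, table = name.split(".", 1)
        # Bracket-quote the identifiers so unusual names stay valid T-SQL.
        statements.append(f"UPDATE STATISTICS [{schema}].[{table}]{suffix};")
    return statements


if __name__ == "__main__":
    for stmt in build_update_statistics(ASSUMED_TABLES):
        print(stmt)
```

Keeping the table list in one place makes it easy to narrow the refresh down to a single table at a time, which may help show which table's statistics actually matter.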

Our database is around 650GB. Ship, Order and Invoice tables are pushing 5m rows, with TranGLC and parttran much bigger. In September we had an SQL consultant spend 2 weeks taking diagnostics of our server. Nothing particularly untoward was happening and our system spends most of its time ticking over. Peaks do not correlate with slowdowns. Some higher than preferred latency was noted on the Tempdb drive.

3 weeks ago we invested a lot of money in top end servers: lots of RAM and, most importantly, PCI based NVMe SSD storage. Perfectly configured. The IO stats are amazing. A database restore takes 7 minutes as opposed to 3 hours previously. SQL operations are lightning quick. Very low latency times reported. We are also using 10GbE between the App server and SQL server. The network performance tests have improved by over a second on average. Yet Epicor is showing no real improvement whatsoever. It is a little snappier in the UI and Fulfillment Workbench is a bit quicker, but overall no one noticed and the random slowdowns continue as often as ever.

We see similar types of issue with Sales Order entry. Purchase order entry has an intermittent issue where it locks loading a PO, never loads and goes on to bring the system down. Sending invoices using email routing will kill the system. Our UK batches, which are about 200 invoices, will kill every part of Epicor for around 45 minutes at a time. Our European invoice batches have the same effect but are around 30 invoices at a time, so they work through quicker but with the same devastation. We’ve lived with this state for about 3 to 4 years now, but we are getting much busier and we can’t struggle on any more.

I’m at a loss as to what to do next. We have some consultancy time booked with Epicor but I don’t know what we will discover. Our gut feeling is that there is some kind of code issue within Epicor itself that is struggling with our data set. It’s odd that rebuilding statistics seems to help and that we have to run it on a 30 minute/hourly basis.

Does anyone have any ideas? We are on 10.2.300.12 but have seen the same behaviour through Epicor versions, SQL versions and hardware changes.

First off, I’m no expert on performance issues. But I’ll throw out some things to check and/or get you thinking…

Is this just during the invoice entry time, or while posting? I’m wondering if it is more of an issue when lots of data resides in the client, compared to when server side processing is happening.

Are all users running the client locally? And over a VPN? Have you tried running the client as a Remote desktop App? Specifically from the App server (or a RD server in the same location, or on the same subnet, as the App server).

What does enabling tracing show while one of those 200 invoice groups is being created? Is a particular BO taking exceptionally long? Or are all BOs taking longer?

@richard_cj_gardner I assume you have a baseline of execution times from PDT when things are good to compare to when they are bad. This will also show all of the BOs and BPMs that are in play. Are any of them drastically worse during the slow periods?
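To make the baseline comparison concrete, here is a rough sketch of how two sets of timings could be compared once they have been exported as (method name, duration in ms) pairs. The export format, the method names and the 3x threshold are all assumptions for illustration; this is not PDT's own output format.

```python
from collections import defaultdict
from statistics import median


def median_by_method(samples):
    """Group (method_name, duration_ms) pairs and return the median per method."""
    grouped = defaultdict(list)
    for method, duration_ms in samples:
        grouped[method].append(duration_ms)
    return {method: median(durations) for method, durations in grouped.items()}


def find_regressions(baseline, slow, min_ratio=3.0):
    """Return (method, baseline_ms, slow_ms, ratio) rows, worst ratio first."""
    base = median_by_method(baseline)
    bad = median_by_method(slow)
    rows = []
    for method, slow_ms in bad.items():
        base_ms = base.get(method)
        if not base_ms:
            continue  # no baseline (or a zero baseline) to compare against
        ratio = slow_ms / base_ms
        if ratio >= min_ratio:
            rows.append((method, base_ms, slow_ms, ratio))
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Made-up numbers and hypothetical method names, purely for illustration.
    baseline = [("CustShip.Update", 400), ("CustShip.Update", 600), ("CustShip.GetByID", 200)]
    slow = [("CustShip.Update", 9000), ("CustShip.Update", 11000), ("CustShip.GetByID", 250)]
    for row in find_regressions(baseline, slow):
        print(row)
```

Medians are used rather than averages so that a single outlier call in the good trace does not hide a real regression.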

We struggled in 9 with performance and so far have been blessed without issues, so I am not running PDT all day every day.

The one big issue I did have was MS and .NET related, to do with having diagnostics written to the event log. I had an errant comma in a log.WriteEntry that was making sub event logs and made the registry massive. When I was trying to fix this issue I wound up making Epicor revert to using the Application log rather than the Epicor App Server log. When that happened everything I was logging was 5X slower.

Is your SSRS using the same SQL server and DB as the main app? We use both a separate server and a replicated DB to avoid any reporting issues.

Do you have SSRS and background tasks on their own app server?

Have you experimented with putting shipping or accounting on their own server or at least their own app server? This could spare the other users if invoicing is slow or at least point to SQL rather than appserver as the issue if they are all slow.

Have you checked the IIS Worker Process memory usage? I have read of issues where it balloons to a massive size and has horrible performance.
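If it helps, here is a small sketch for keeping an eye on the worker process size. It assumes you capture the output of `tasklist /FI "IMAGENAME eq w3wp.exe" /FO CSV /NH` on the app server and feed it in as text. The sample output and the 4096 MB limit below are made up, and the memory column format can vary with regional settings, so treat the parsing as an assumption to verify on your own server.

```python
import csv
import io


def parse_tasklist_csv(text):
    """Parse `tasklist /FO CSV /NH` output into (image, pid, mem_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # blank lines or "INFO: No tasks are running..." messages
        image, pid, mem = fields[0], fields[1], fields[4]
        # Mem Usage looks like "6,291,456 K"; keep only the digits.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if not digits or not pid.isdigit():
            continue
        rows.append((image, int(pid), int(digits)))
    return rows


def over_threshold(rows, limit_mb):
    """Return the processes whose memory usage exceeds limit_mb."""
    return [row for row in rows if row[2] / 1024 > limit_mb]


if __name__ == "__main__":
    # Made-up sample output for illustration only.
    sample = (
        '"w3wp.exe","4321","Services","0","6,291,456 K"\n'
        '"w3wp.exe","5120","Services","0","512,000 K"\n'
    )
    for image, pid, mem_kb in over_threshold(parse_tasklist_csv(sample), limit_mb=4096):
        print(f"{image} (PID {pid}) is using {mem_kb / 1024:.0f} MB")
```

Run on a schedule with the results written to a file, this would give a record of the worker process size at the times users report slowdowns, including out of hours.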

I assume you have the sp_lock3 script from Epicor to show locked tables? I ran it every five minutes when I was having issues.

What do you mean by kill the system? Does the performance degrade for specific business objects, or does it actually cause the system to hang or crash and force a system reboot?

Does this happen for all instances on your server or only the production instance?

Are you rebuilding SQL indexes, or just updating statistics?

Have you ruled out Customizations and BPMs as potential culprits? Inefficient BPMs especially are a very common cause of poor performance. There is a lot of info on this site about BPM “best practices”, as well as common traps to avoid - e.g. joining a tt temporary table to a database table.

Hi, it is weird that rebuilding statistics that often fixes problems, but it might also be part of the problem. How and what are you rebuilding, and why do you do it? Normally this is done outside of hours. Depending on how and what you rebuild, the statistics can end up less than optimal. You mentioned an upgrade; is this a carry-over from that? Again, as someone else has mentioned, check customizations etc. Also, you say everything is perfectly configured, but does that mean everything, e.g. BIOS settings etc.?

The issue appears to occur only when invoices are being emailed through APR. If we generate and post an invoice batch without using APR there are no problems at all. What we see with APR is a bunch of long running (5 minute) high cost queries being run against the MASTER database. This makes SQL hit the roof on CPU, disk access and memory usage. Because of the high usage you see the impact across the board, inside and outside of Epicor. As the invoices clear through, it all drops back to normal.

All users have local clients. Many people are working from home currently so are using VPN, but the issue predates COVID and equally affects users in the building and on the same LAN as the server. The warehouse users have been tested using RDP and reported no difference.

I should note that we in IT have never directly experienced these “random” non-invoicing related slowdowns. We can never be in the right place at the right time. I have been tempted to dismiss it as a lie, an exaggeration or a local network/PC issue, except that other users have independently reported the issues. So we don’t have great PDT analysis, but when we have seen some issues we see a large slowdown on the Update and Freight BO. The random slowdowns generally tend to affect only one department at a time. If warehouse is going slow, sales order entry is fine, etc.
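Since the slowdowns are never witnessed first hand, one option is to time a lightweight probe on a schedule and record only the slow runs. A minimal sketch follows; the `probe` callable is entirely hypothetical (it could be anything representative, such as a small read against the app server) and the threshold is an assumption to tune.

```python
import time
from datetime import datetime


def timed_probe(probe, threshold_s, log, clock=time.perf_counter):
    """Run probe() and append (timestamp, seconds) to log if it ran slow."""
    start = clock()
    probe()
    elapsed = clock() - start
    if elapsed > threshold_s:
        stamp = datetime.now().isoformat(timespec="seconds")
        log.append((stamp, round(elapsed, 3)))
    return elapsed


if __name__ == "__main__":
    slow_runs = []

    def example_probe():
        # Placeholder work; replace with a real, representative operation.
        time.sleep(0.01)

    timed_probe(example_probe, threshold_s=2.0, log=slow_runs)
    print(slow_runs)
```

The timestamps of the slow runs could then be lined up against the user reports and the SQL monitoring data to see whether the slowdowns are visible from the server side at all.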

SSRS is on a different server. In fact we have two SSRS servers - one for Epicor standard reporting and one for our own.

We haven’t tried separate app servers so will consider this.

The IIS process is a little high maybe, but doesn’t strike me as excessive. I’ll admit I’ve not looked at it out of hours, when the slowdowns are really hard to explain. Will try that.

Yes to sp_lock3. No locks

When it kills the system, numerous users across departments report that the system becomes very sluggish to respond, the screen goes white with a “not responding” error, and they often get “A system error has occurred” messages. You won’t be able to print anything during this time.

Not checked test system. Will do

Posting invoices and APR aren’t really connected, unless you have a BPM doing it.

Do you mean printing the invoices (selecting Group → Print Invoices, then selecting a style with APR, and enabling the routing), from inside Invoice Entry? Or do you have a BPM that prints the invoices as a result of the posting process - and this BPM uses an auto print that specifies the APForm style and parameters?

Take a look at the APForm being used by the Style with APR. Both the RDD and the RDL.

@TomAlexander indexes get rebuilt weekly. Statistics is what we do pretty much hourly

Not totally ruled out customisations, but we don’t have that many. On Customer Shipment the only one was written by Epicor, so I would hope it uses best practice.

@enbw agree it’s bad, but on the whole it seems to work, and often the system won’t recover until you do it. We have an hourly schedule to do it but often have to run it more often. At the beginning I had a theory that new invoices were the cause, but disproved it by holding invoicing for a few days.

No not upgrade related. Has persisted through different SQL versions, Epicor versions and hardware

Can’t guarantee everything is perfect, but we have pored over all of the Epicor and SQL best practices for power settings, RAID, Hyper-V settings, disk block sizes, SQL maintenance etc.

Hi, so the (major) issues only occur when generating the invoices? Or when printing the invoices? All other activities may be slow but don’t cause a system down. Again, please clarify what you mean by a system down. Also, what service are you using to send the emails?

Hi, you should not really update statistics that often. That will lead to issues.

@ckrusen sorry, yes, as you describe for printing and routing. The BPM for APR is on invoice post. If we are posting and the customer has certain fields set, then it emails the invoice via APR.

Will have a look at the RDD and RDL again.

Hi, Have been there, but taking over processes from previous versions is not always best practice.

Hi, so in summary, it is all around the invoice process? And sending/emailing invoices?

@enbw no, the invoice slowdowns are just one of our issues, and a separate and fairly compartmentalised issue on their own. Now that we’ve switched APR off, that’s gone away, but we want APR back.

Statistics helps with the random slowdowns, where customer services say that sales order entry freezes for 30 seconds after each line or after entering the customer code, or where the warehouse team say customer shipment entry has slowed from a few seconds per action to a few minutes per action.

I totally agree about updating statistics being bad, but it’s only a reaction to the initial problem. If I stopped doing this this week, the system would go slow around 8am tomorrow morning and stay slow for the rest of the week until I rebuilt statistics or indexes. I want to stop doing it, but it’s the drug that keeps my business operational.


Hi, who advised to do the statistics update?

Hi, going by the above, we are running the same sort of load on worse hardware but are “surviving”, with one app server and SQL on a separate VM. We do see issues when we run invoicing during heavy periods, but nothing show-stopping.