E10 Memory Issues

Currently we are getting a memory exception that kills all incoming requests, and users can't do anything in E10. Looking at the server specs, we have 63.9 GB of RAM, and SQL Server's max server memory is capped at 40 GB. Unfortunately the app server hosts SQL as well, until we set up our load-balanced instance. Even with 63 GB available and SQL capped at 40 GB, reviewing the processes shows SQL using 33 GB of memory and IIS using 18 GB. Between the two processes it's cutting it close: total memory usage is at 92%. Is there anything I could modify to help with this issue?

WebHost failed to process a request.
Sender Information: System.ServiceModel.ServiceHostingEnvironment+HostingManager/60375305
Exception: System.ServiceModel.ServiceActivationException: The service '/WPLiveDB/Ice/Lib/ClientCache.svc' cannot be activated due to an exception during compilation. The exception message is: Memory gates checking failed because the free memory (2836705280 bytes) is less than 5% of total memory. As a result, the service will not be available for incoming requests. To resolve this, either reduce the load on the machine or adjust the value of minFreeMemoryPercentageToActivateService on the serviceHostingEnvironment config element… ---> System.InsufficientMemoryException: Memory gates checking failed because the free memory (2836705280 bytes) is less than 5% of total memory. As a result, the service will not be available for incoming requests. To resolve this, either reduce the load on the machine or adjust the value of minFreeMemoryPercentageToActivateService on the serviceHostingEnvironment config element.

That error is from a WCF throttle meant to stop taking new server calls when your box is 'low on memory'. You can set the throttle to whatever percentage you want in the app server's web.config, with 0 meaning no throttle:

 <system.serviceModel> 
     <serviceHostingEnvironment minFreeMemoryPercentageToActivateService="0" />
 </system.serviceModel> 

The intent was valid: don't let a deluge of calls overwhelm the server to the point where it cannot respond and everything is stuck constantly paging to disk. But now that we normally have servers at or near triple-digit gigabytes of RAM, 5% is a huge chunk of memory :slight_smile:
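If you want to see the arithmetic, here is a rough C# sketch of the same check (my own illustration, not the actual WCF internals; it needs a reference to Microsoft.VisualBasic.dll for ComputerInfo):

using System;
using Microsoft.VisualBasic.Devices;

class MemoryGateSketch
{
    static void Main()
    {
        const double minFreePercentage = 5.0; // WCF's default gate

        var info = new ComputerInfo();
        double totalGb = info.TotalPhysicalMemory / (1024.0 * 1024 * 1024);
        double freeGb = info.AvailablePhysicalMemory / (1024.0 * 1024 * 1024);
        double freePercent = freeGb / totalGb * 100.0;

        Console.WriteLine("Free: {0:F1} GB of {1:F1} GB ({2:F1}%)",
            freeGb, totalGb, freePercent);

        // Below the threshold, WCF refuses new service activations and you
        // get the InsufficientMemoryException from the original post.
        if (freePercent < minFreePercentage)
            Console.WriteLine("Memory gate would trip: activations refused.");
    }
}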

Just set it to 1 or 0 and you will be happier.

4 Likes

What is the difference between 1 and 0? We tried 0 and the server still seems to hit high points of IIS memory usage.

This is a WCF throttle setting from MS that lets IIS admins control how their servers behave. A 0 means 'disable throttling'. Anything else is the percentage of free memory on the box to hold in reserve.
e.g. - you have 100 GB of RAM. A setting of 1 would mean the server stops accepting calls when its free RAM drops to 1% of 100 GB, i.e. 1 GB free.

The idea is for a server not to fall over into endless paging to disk when overwhelmed, as in a denial-of-service attack.

The default sets aside 5% of RAM, but that dates from before we were all running around with terabytes of RAM in our phones, let alone servers. I see 1% commonly.

Hi Tyler,

It's important to know what your server architecture is. From your question I understand you have a single-server configuration (App (Web) Server + SQL Server). Is that right?

Do users connect to the server via Citrix or RDP to run the client, or does each user have their own local Epicor client installed?

If you open Task Manager on the server, or Performance Monitor, what is consuming most of the CPU/memory?

There are four processes to look for if you think the issue is related to Epicor:
(For the first two, it depends on whether users run the client directly from the server.)

  • Epicor.exe (Epicor x86 client)
  • Epicor64.exe (Epicor x64 client)
  • w3wp.exe (IIS Worker Process)
  • sqlservr.exe (SQL Server - Database Engine)

Check each one's CPU and memory usage.
The sum of all of those should tell you whether your 63.9 GB of RAM is enough for your current load; the quick sketch below tallies the memory side.
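If you want something repeatable instead of eyeballing Task Manager, here is a small throwaway C# sketch (nothing Epicor-specific; the process names are just the four above, minus ".exe"):

using System;
using System.Diagnostics;
using System.Linq;

class ProcessMemoryCheck
{
    static void Main()
    {
        string[] names = { "Epicor", "Epicor64", "w3wp", "sqlservr" };

        foreach (var name in names)
        {
            // One name can map to several instances (w3wp often does).
            long bytes = Process.GetProcessesByName(name)
                                .Sum(p => p.WorkingSet64);
            Console.WriteLine("{0,-10} {1:F2} GB", name,
                bytes / (1024.0 * 1024 * 1024));
        }
    }
}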

Hope that helps.

Carlos Q.
PSE

Whenever we have memory-leak problems we open the client in memory test mode. Have you tried this yet? Just copy one of your Epicor client shortcuts and make the target look like this:
 C:\Epicor\ERP10.0Client\Client\Epicor.exe /config=Epicor10.sysconfig /memory

This will open the memory tester box. It usually helps me find leaks in screens and whatnot.

2 Likes

A note on memory consumption: I don't see which version you are running, but two memory leaks have been fixed on the server side in the past year or so. Those fixes could bring your w3wp down to more reasonable 'idle' levels. We are seeing our SaaS servers running at 5 to 8 GB with these in place, but of course your mileage will vary.

  • An EF bug we discovered and patched around (198692). The fix was delivered in 10.1.400.28, 10.1.500.14, and all of 10.1.600.
  • A 10.1.600 caching bug (201928), fixed in 10.1.600.14.

If you are not up to those patch levels, you should know the process by now…

3 Likes

@Bart_Elia, I'm running E10.1.600.11… would the 600.14 caching-bug fix help with my problem of the sqlservr.exe process eating up most of my server memory and not releasing it? Or do I need to look into something like SQL maintenance jobs to keep that at bay?

Just wondering what your thoughts were on that. It seems like after some time (days) the sqlservr.exe process grabs most of my server memory and will not let go until I reboot the server. I'm trying to figure out why, and how to prevent it, because that's when we start getting the "The requested service could not be activated" WCF throttle errors. I do have SQL max server memory set to a value that leaves some room for the OS and related processes, but the sqlservr.exe process seems to eventually take up more memory than that max value.

Thanks for any help and advice you can provide! :slight_smile:

Limit the amount of memory SQL can use.
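For example, via sp_configure (sketched in C# only to keep to one language here; most people would just run the two statements in SSMS, and the 40960 MB is the 40 GB cap you mentioned):

using System.Data.SqlClient;

class SetSqlMaxMemory
{
    static void Main()
    {
        // Placeholder connection string; run under a sysadmin login.
        const string connStr = "Server=.;Database=master;Integrated Security=true";
        const string sql = @"
            EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
            EXEC sp_configure 'max server memory (MB)', 40960; RECONFIGURE;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}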

2 Likes

To spell out the process that Bart alluded to:

  • The EF bug: one must upgrade to a point update that includes it in the 10.1.400/10.1.500 releases. For 10.0.700.x customers, we have a one-off available at 10.0.700.2 and 10.0.700.4 only. If someone asks for it at 10.0.700.1 or 10.0.700.3, well <insert_sad_trombone>. It is included in all versions of 10.1.600.x.
  • The 600 caching bug: contact Support for the workaround until 10.1.600.14 is released. It has a very distinctive behavior that is straightforward for Support to identify.
2 Likes

@tkoch nailed the response for sqlservr.exe :slight_smile:

SQL Server loves to cache everything it touches. If you don't put a limit on SQL, it will consume everything on the box. That's SQL Admin 201.

The issues I mentioned had to do with memory consumed in the app server (w3wp.exe). The app server was setting aside some memory in Epicor code and not handing it back. THAT we fixed.

2 Likes

Of course, the first thing I tried was setting SQL Max Server Memory and restarting the services to pick up that value. I THOUGHT I was observing the sqlservr.exe process taking more memory than that value, to the point where it was causing the "The requested service could not be activated" errors. But maybe it was the sqlservr.exe process along with the w3wp.exe process??? I don't have evidence of this, but I will gather it if/when this happens again.

I was just checking to find out more details on what the 600 caching bug fix actually fixed to see if it is related.

I guess my real question is: if I see that the sqlservr.exe process grabs all the Max Server Memory it is allowed, do I just let it hang on to it indefinitely? Will it release any of it and throttle back down to a lesser amount on its own, or will it just continue to hold onto it?

Thanks guys! :slight_smile:

Geek moment on the memory leaks; ignore this if you don't want to look at C# / dotNet innards…

These kinds of issues pop up in my schedule more often than I'd like, but they are 'entertaining'.
The first one (201928) is a simple cache issue. We have a reasonable caching infrastructure for 'table caches'. These take the entirety of a db table and place it in memory for slow-changing, constantly queried tables - Ice.UserFile, for example. The base classes on these table caches handle cross-app-server notifications too, so when you update a user on one app server, the other app servers get a ping to refresh the cache immediately. (No more 5-minute wait on UserFile changes - anyone miss that dialog?) Logically, these caches are a dictionary whose value is a db row (Ice.Table.SysUserCompRow) and whose key is some identifier - usually the pk(s) of the row.
Well, one of the caches used a key that was a class with the pks as properties. The object comparison was done incorrectly, so the key was never found in the collection; a new row was queried from the db and added to the dictionary every time. We ended up with a gazillion small records in the cache, all duplicated :confused: Fixed the comparison: now no cache misses, no growing memory.
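In miniature, the failure mode looked something like this (invented names, not our actual classes):

using System;
using System.Collections.Generic;

class UserCompKey
{
    public string UserID;
    public string Company;
    // Bug: no Equals/GetHashCode override, so the dictionary falls back to
    // reference equality and two keys with identical values never match.
}

class CacheMissDemo
{
    static void Main()
    {
        var cache = new Dictionary<UserCompKey, string>();

        for (int i = 0; i < 3; i++)
        {
            var key = new UserCompKey { UserID = "bob", Company = "EPIC06" };
            if (!cache.ContainsKey(key))            // always a miss
                cache[key] = "row re-queried from the db";
        }

        Console.WriteLine(cache.Count); // 3 duplicate entries, not 1
        // The fix: implement IEquatable<UserCompKey> and GetHashCode over
        // the pk values so identical keys actually hit the cache.
    }
}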

The other issue (198692) was a beauty that took going to Diego and Arthur on the EF team to track down.
It's not too big a secret that we worked with Microsoft on creating the 'autocompiler' for Entity Framework back in the EF1 days. Compiled queries in LINQ make db queries super fast (something like 1000x faster query perf was measured back in the POC days of ICE 3), but they are a pain for the normal dev to write. The ideal is for the framework to intercept the query, 'auto compile' it, and cache the result.
e.g. - think 'var query = from row in Ice.SysUserFile where row.UserID == @Foo' in LINQ. The @Foo changes from query to query, but the rest is the same every time, so cache that structure of the query.

The next time the app server sees that structure fly by, it grabs the cached version, plugs in the parameter values, and voila: 1000x faster. We worked with the EF team to wrap the EF classes, intercept/inject, and build upon the EF internals accordingly. Life is good.
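Conceptually, the autocompile cache behaves like this sketch (just the shape of the idea, nothing like EF's real internals):

using System;
using System.Collections.Concurrent;

class QueryPlanCacheSketch
{
    // Key: the structural signature of a query; value: the compiled plan.
    static readonly ConcurrentDictionary<string, Func<string, string>> Plans =
        new ConcurrentDictionary<string, Func<string, string>>();

    static Func<string, string> GetPlan(string shape)
    {
        return Plans.GetOrAdd(shape, s =>
        {
            Console.WriteLine("compiling plan for: " + s); // happens once
            return userId => "plan(" + s + ") with @Foo = " + userId;
        });
    }

    static void Main()
    {
        // Same shape, different parameter values: one compile, many reuses.
        Console.WriteLine(GetPlan("SysUserFile.ByUserID")("foo"));
        Console.WriteLine(GetPlan("SysUserFile.ByUserID")("bar"));
    }
}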

Then MS goes off and builds it into the next EF and ‘improves it’ - makes it more general purpose for everyone - win!

In doing this, they started putting limits on the compiled-query cache collection in the EF framework. Before, it was a manually managed dictionary. Then they started trying to auto-manage the collection and limited it to… 800 queries. (That's about how many queries it takes to stand up the Menu and Sales Order.)

So we were getting their cleanup mechanism running around killing queries off as fast as we added them. Their cleanup process would kick in and start tracking everything; every minute a sweep would go through every query to verify it had been used in the last minute; we would re-add entries, and track that on our side as well for result caches (another topic for another day)… EF was just not tuned to run at our volume of queries in a single process, and the tuning dials were private constants in EF that we had no access to change. A little surgery and reflection later, values injected to sane levels, and voila: no more runaway cache misses, sweeps, and duplicate cache entries. That was a crazy bug to find, and the EF team was great to deal with (again) in tracking it down.
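For the curious, the reflection surgery looks roughly like this (the type and field names are hypothetical stand-ins, not the real EF internals, which vary by version; the trick only works on static fields, since true C# consts are inlined at compile time):

using System;
using System.Reflection;

static class PrivateFieldSurgery
{
    public static void SetPrivateStatic(Type holder, string fieldName, object newValue)
    {
        FieldInfo field = holder.GetField(fieldName,
            BindingFlags.NonPublic | BindingFlags.Static);
        if (field == null)
            throw new MissingFieldException(holder.FullName, fieldName);
        field.SetValue(null, newValue); // null instance: it's a static field
    }
}

class CacheLimits
{
    // Stand-in for the kind of private knob EF kept internal.
    static int maxCachedQueries = 800;
    public static int Current { get { return maxCachedQueries; } }
}

class Demo
{
    static void Main()
    {
        PrivateFieldSurgery.SetPrivateStatic(typeof(CacheLimits),
            "maxCachedQueries", 100000);
        Console.WriteLine(CacheLimits.Current); // prints 100000
    }
}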

So if you ever think we don’t appreciate your upgrade pains…

5 Likes

Loving the knowledge here! Keep up the great work guys, and thanks very much!!

I am still having issues here…