We completed an on-prem upgrade from 10.2.600.36 to 2022.2.11 at the end of August, and ever since then we have been trying to chase down performance issues with MRP. We went from being able to run Regen on 4 different companies each weekday to being forced to run Net Change. On the weekends, when we run 5 Regens, they were not completing in over 24 hours, and we were having to kill MRP and run Net Change (one company has not had a Regen since we upgraded).
I was finally able to get a test to run in 7.5 hours (all previous tests ranged from 14 to 48 hours) by truncating systaskparam (over 20 million records), deleting all records from systask, changing the customization disabled setting in the Host config to true, and disabling server logging. (As a note, just clearing systaskparam and systask and disabling the logging got us down to 14 hours; disabling customizations dropped it to 7.5.)
This points us to something custom, so I have been testing with customizations turned back on, but with each data directive and method directive disabled manually.
Our testing is done by running MRP Regen for 10 minutes at level 0 and counting the number of records in the log file. With customizations disabled via the Host file, we get around 30k-40k rows in 10 minutes. Disabling only the BPMs, with the flag on each directive, has ranged from 16k-20k records in 10 minutes.
This result tells me there must be something else that the host file disables that is getting triggered by MRP, but I am at a loss as to what it might be.
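The 10-minute test described above can be scripted so the runs are compared the same way each time. The following is only a minimal sketch: it assumes the MRP log is a plain text file with one record per line, and the function names are made up for illustration.

```python
def count_log_rows(log_path):
    """Count non-empty lines in a log file (assumes one record per line)."""
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def rows_per_minute(row_count, minutes=10.0):
    """Convert a row count from a timed test into rows per minute."""
    return row_count / minutes


def relative_throughput(test_rows, baseline_rows):
    """Fraction of the baseline throughput that a test run achieved."""
    return test_rows / baseline_rows


if __name__ == "__main__":
    # Figures quoted in the thread: roughly 30k-40k rows with customizations
    # disabled in the host file, 16k-20k with only the BPMs disabled.
    baseline = 36_000  # hypothetical count from a host-file-disabled run
    test = 18_000      # hypothetical count from a BPMs-only-disabled run
    print(f"{rows_per_minute(test):.0f} rows/min, "
          f"{relative_throughput(test, baseline):.0%} of baseline")
```

Comparing against a fixed baseline makes it easier to see whether a change moved the number or whether the difference is within the run-to-run noise.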
Also as a note, I have looked at the query store, and there do not seem to be any missing indexes or bad execution plans that I can see.
Has anyone else experienced something like this when moving from 10 to 11?
Any ideas on what I should be looking for that the host file disables besides method or data directives?
How much processing power do you have available? I was having issues with MRP stalling and found that I could power through it with 50 processes and 99 schedulers.
I assume you have read the logs ad nauseam, but have you disabled process MRP on anything not active in PartDtl?
Do you have write entries at the start and end of every BPM that could be involved? That can give you a simple stopwatch on them.
You can also see the duration of each bpm in the serverlog.
You can also use PDT to give you an idea of long-running BPMs in the MRP time frame.
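As a rough illustration of the stopwatch idea, the start and end entries can be paired up and totaled per BPM. This sketch assumes a made-up entry format of `timestamp|BPM_START or BPM_END|bpm name`; the real format depends entirely on what your own write entries produce, and the BPM names below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed timestamp layout


def bpm_durations(lines):
    """Return {bpm_name: (total_seconds, call_count)} from START/END entries."""
    open_starts = {}                         # bpm name -> time of last START
    totals = defaultdict(lambda: [0.0, 0])   # bpm name -> [seconds, calls]
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # not one of our stopwatch entries
        stamp, marker, name = parts
        try:
            when = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # unparseable timestamp, skip the entry
        if marker == "BPM_START":
            open_starts[name] = when
        elif marker == "BPM_END" and name in open_starts:
            elapsed = (when - open_starts.pop(name)).total_seconds()
            totals[name][0] += elapsed
            totals[name][1] += 1
    return {name: (total, count) for name, (total, count) in totals.items()}


if __name__ == "__main__":
    sample = [
        "2023-10-02 01:15:00.000|BPM_START|PartDtl.Update.Pre.Example",
        "2023-10-02 01:15:00.250|BPM_END|PartDtl.Update.Pre.Example",
    ]
    for name, (seconds, calls) in bpm_durations(sample).items():
        print(f"{name}: {seconds:.3f}s over {calls} call(s)")
```

This pairing assumes calls to the same BPM do not overlap. With several MRP processes running in parallel they will, so the entries would also need a process or thread identifier to pair correctly.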
Do you ever recycle your application pool? We recently upgraded from 10.2.700 to 2023.1.10 and noticed our MRP process growing significantly night to night. Luckily I have a nice bar graph of it, so the growth is very easy to see. Anyway, I noticed after installing Windows updates over a weekend that required a reboot that the MRP process reverted back to its normal length. After investigating, I noticed that just recycling the app pool would reset the length of time back to normal. Two weeks ago I added a timed recycle of the app pool to occur an hour before MRP was set to run. Since then, MRP has consistently run at the better performance.
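One way to set up a timed recycle like this is a scheduled task that calls IIS's `appcmd.exe`. The sketch below only works out the recycle time and builds the command; the pool name, the MRP start time, and the install path of `appcmd.exe` are assumptions to replace with your own values.

```python
from datetime import datetime, timedelta

# Default IIS location on most servers; adjust if yours differs.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def recycle_time(mrp_start, lead_minutes=60):
    """Return the HH:MM at which to recycle, lead_minutes before MRP starts."""
    # Pin to a fixed date so subtracting across midnight is well defined.
    start = datetime.strptime(mrp_start, "%H:%M").replace(year=2000, month=1, day=2)
    return (start - timedelta(minutes=lead_minutes)).strftime("%H:%M")


def recycle_command(pool_name):
    """Build the appcmd argument list that recycles one application pool."""
    return [APPCMD, "recycle", "apppool", f"/apppool.name:{pool_name}"]


if __name__ == "__main__":
    # "EpicorAppPool" and the 01:00 start are hypothetical placeholders.
    print("Recycle at", recycle_time("01:00"))
    print("Command:", " ".join(recycle_command("EpicorAppPool")))
```

On the server itself, the command list could be passed to `subprocess.run(..., check=True)` from an elevated scheduled task. IIS can also recycle a pool at specific times through the pool's own recycling settings, which avoids a script altogether.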
We have enough processing power to run at around 7 processors and 7 schedulers. Anything more than that and we start to see the CPU near or hit 100% with MRP running.
We have not done this, because it would mean adding a BPM on PartDtl to trigger when Process MRP should get checked and unchecked.
Currently we have been seeing issues even with BPMs disabled via the Host config, though as I mentioned, I was also having issues when manually setting each BPM to enabled=false.
We do recycle the app pools somewhat regularly. I am going to do a test with this in our test environment to see if I can track improvements doing this.
I would make absolutely sure you don’t have a BPM problem. This technique is good for ensuring a run with all of the extra bells and whistles turned off: Bypass BPM for troubleshooting?
I have already been testing this way, with them all disabled in the Host.config file. There was one instance in which this ran perfectly and MRP with MLP finished in 7.5 hours (in E10 we only got it to finish in 13.5 hours).
However, my last tests earlier this week with them disabled this way yielded only around 3k rows in the log file in 10 minutes (the 7.5-hour run had around 39k rows in 10 minutes).
We have been trying to get it back to 7.5 hours so we can test disabling BPMs in batches to try to determine the source, but it seems like BPMs are not the only thing impacting the performance.
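Once a repeatable baseline exists, the batch testing can be organized as a bisection: disable half of the suspects, run the 10-minute test, and keep whichever half explains the slowdown. The sketch below assumes a single offending BPM and a reasonably stable measurement, neither of which is guaranteed here. `rows_with_disabled` is a hypothetical callback standing in for "disable these BPMs, run the test, return the row count".

```python
def find_slow_bpm(bpm_names, rows_with_disabled, threshold):
    """Narrow a list of BPMs to the one whose disabling restores throughput.

    rows_with_disabled(batch) must return the 10-minute row count with only
    the BPMs in `batch` disabled. Returns None if no single BPM qualifies.
    """
    candidates = list(bpm_names)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if rows_with_disabled(half) >= threshold:
            candidates = half                    # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # otherwise try the second half
    # Confirm the last candidate; bisection alone cannot prove a culprit exists.
    if candidates and rows_with_disabled(candidates) >= threshold:
        return candidates[0]
    return None


if __name__ == "__main__":
    # Simulated measurement: disabling "BPM-C" restores the fast rate.
    def fake_measure(disabled):
        return 39_000 if "BPM-C" in disabled else 18_000

    names = ["BPM-A", "BPM-B", "BPM-C", "BPM-D", "BPM-E"]
    print(find_slow_bpm(names, fake_measure, threshold=30_000))
```

If several BPMs each contribute part of the slowdown, or something other than BPMs is involved as suspected above, no batch will cross the threshold and the function returns None, which is itself a useful result.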