2023.1.X Update Post Mortem and PSA Regarding Performance (and how to fix it)

We’re currently running the debugger on TaskAgent to see if we can catch the problem in the act. With some additional logging, @josecgomez has narrowed the issue down to the client side. We’re hoping it’s not a race condition that the debugger will disturb enough to keep the issue from showing up, but that would be a clue too, I suppose.

Additional logging showed the IIS server responding to the client with a 200 response, but the client never showed a response and hung. Other loops in the Task Agent continued to run, so the service didn’t lock up; something either dropped an exception or went into an infinite loop.

I wasn’t hiding, I was asleep. :innocent:

We’ve been testing in a few SaaS customers, and I haven’t heard of any issues so far.

That being said it sounds like there is an issue. I haven’t been able to reproduce anything locally that I could debug. It’s always difficult when you can’t see the problem yourself and have to rely on the kindness (and logs) of strangers.

The bit of information from @jgiese.wci about Fiddler has me thinking of issues I’ve seen in the past.

I’ve been working with @josecgomez and things seem to be pointing back to something very low level.

I’ve got a couple ideas I can try, but without being able to reproduce on my machine, it is a bit slow going.

Rest assured; we are working on this. :construction_worker_man:

Currently, we only support GZip, not deflate. No reason we couldn’t add it, just figured we should always use GZip.

Not sure going down this path will help anything. At this point it is low on my list.

Was really just a debugging idea, to see if the same lockup would happen with deflate…

But as long as you are compiling a list, why not also add Brotli? :smiley:
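
For anyone who wants to try the deflate comparison outside of Epicor, here is a minimal Python sketch of a client-side decoder that picks the decompressor from the `Content-Encoding` value. The `decode_body` helper is hypothetical and is not Epicor's code; it only uses the standard `gzip` and `zlib` modules, and Brotli is left out because it is not in the Python standard library.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Hypothetical helper: decode an HTTP body by its Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body  # not compressed
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # HTTP "deflate" is normally a zlib-wrapped deflate stream
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream with no zlib header
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


if __name__ == "__main__":
    payload = b'{"ok": true}' * 100
    for name, packed in (("gzip", gzip.compress(payload)),
                         ("deflate", zlib.compress(payload))):
        print(name, len(packed), decode_body(packed, name) == payload)
```

Running the same payload through both encodings like this is one way to check whether a lockup follows the codec or just the smaller, faster response.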

You know, right, that smart client is not supposed to be used in the future?

We are looking at Brotli! This is not bundled in IIS so extra downloads, etc. would be needed.

Brotli is supported in browsers now, too… :wink:

Also, define “future”… I guarantee you, 10 years from now, I will still have customers on 10.2.500… Hell we still have 5-6 customers on 905 with a Progress backend… lol

I doubt Epicor will port it backward to WCF :wink:

As for the browser, IIS already supports it naturally, and the client side will be handled by browsers, just as they now handle JSON compression.

TaskAgent is not the smart client but exhibits the same issue, as does DMT.

We’re starting to suspect my clients lock up because I’m on-prem, and Jose’s do not because he is in the cloud, which adds just enough latency.

My DMT and TaskAgent also lock up with compression; where I differ from @josecgomez is that my clients lock up too.

That theory also helps explain why once we introduce logging and/or Fiddler we can’t get it to happen.

Are they configured to use SSL connections or are they still using NET.tcp? Just curious. :thinking:

There was a pretty good network performance improvement in 11.2.200.18. If you aren’t on that release or higher, you will probably want to move. It cuts down the number of round trips needed to make a call.

Is anybody here having this issue running an earlier version than 11.2.200.18?

Task Agent uses the same networking code as the WinForms client. NET.TCP is no longer an option.

All SSL

You are right. I don’t know the future plans for DMT, but TA definitely does not need any UI.

The latency suspicion is understandable, but AFAIK TA does not get huge responses from the server, so compression vs. no compression should not affect it much from a latency standpoint…
If it were async deadlock, then CPU would be low…
But I am sure Jeff will figure it out :smiling_imp:

I strongly suspect it’s due to a specific .NET version incompatibility (or glitch) with .NET Core… The HttpClient implementation in Core is completely different, and I know of other cases where it caused issues. 2023 is probably not having the issue because .NET 4.8 should be pretty much at parity with .NET 6. But 4.7.2 might not be… Not sure when the version on the client side was changed…

I also have it set up in an on-premises 2023.1 migration environment, and no issue has been reported as of yet…

Theoretically, the server is completely independent of the client. I can have a server on Linux and PHP and still be able to create a .NET client application.
Of course, our custom serialization is involved here, but then I think it would fail on deserialization, not with compression.

4.7.2 was many years ago. Since we moved from WCF, the client has already been on 4.8 for a long time.

Here’s one example… I agree, it’s old, but sometimes a specific combination can cause unexpected issues…

I thought so too, Hugo, but after some research:

Compression is done in IIS, outside .NET, and it’s independent of the .NET version.

The compression happens in the IIS engine, after the .NET app server returns the payload.

Compression being an issue is not about compression itself; it seems it’s just highlighting a race condition or timing issue because it made things much faster.

Hence why the problem disappears when we introduce latency via Fiddler or the debugger.
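
To show how a timing bug can vanish once latency is added, here is a purely hypothetical Python sketch. It is not the actual Epicor or .NET code path, and `BuggyNotifier` and `run_request` are made-up names. The toy bug is a lost notification: if the response arrives before the client has started waiting, the signal is dropped and the client hangs until it times out.

```python
import threading
import time


class BuggyNotifier:
    """Hypothetical notifier that drops a signal if nobody is waiting yet."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiter = None

    def notify(self):
        with self._lock:
            waiter = self._waiter
        if waiter is not None:
            waiter.set()
        # Bug: with no waiter registered, the notification is silently lost

    def wait(self, timeout):
        event = threading.Event()
        with self._lock:
            self._waiter = event
        return event.wait(timeout)  # True if signalled, False on timeout


def run_request(latency_s, setup_delay_s=0.05, timeout_s=0.5):
    """Simulate one call; the 'server' answers after latency_s seconds."""
    notifier = BuggyNotifier()

    def server():
        time.sleep(latency_s)
        notifier.notify()

    thread = threading.Thread(target=server)
    thread.start()
    time.sleep(setup_delay_s)  # client-side work done before it starts waiting
    completed = notifier.wait(timeout_s)
    thread.join()
    return "completed" if completed else "hung"


if __name__ == "__main__":
    print("fast response:", run_request(latency_s=0.0))
    print("slow response:", run_request(latency_s=0.2))
```

In this toy model the fast response is the one that hangs, while the slower one completes, which is the same shape as a problem that goes away under Fiddler or a debugger.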

Good news, it looks like we’ve found a solution. Still testing, but everything looks good. I was able to run a test application that pummeled the application server all weekend and had no issues. Usually, the test app would break in under 10 minutes.

This looks like an issue with the automatic decompression that .NET FW 4.8 is doing on the client. I turned off the automatic decompression and added code to do the decompression ourselves. The current guess is that it is a threading issue, which moving the decompression to a later point solves.
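
The actual fix lives in the .NET client code, which isn't shown in this thread. As a rough illustration of the idea only (take the compressed bytes from the transport untouched, then decompress as a separate, later step), here is a small Python sketch. `RawResponse`, `receive`, and `read_payload` are made-up names, and the transport is simulated so the example runs without a server. In .NET the switch for this is presumably something like `HttpClientHandler.AutomaticDecompression`, but the thread doesn't say which API was changed.

```python
import gzip
from dataclasses import dataclass


@dataclass
class RawResponse:
    """Hypothetical response as received with automatic decompression off."""
    status: int
    headers: dict
    body: bytes  # still compressed, exactly as it came off the wire


def receive(payload: bytes) -> RawResponse:
    """Simulated transport: hands back the gzip-compressed bytes untouched."""
    return RawResponse(200, {"Content-Encoding": "gzip"}, gzip.compress(payload))


def read_payload(response: RawResponse) -> bytes:
    """Decompress explicitly, only after the transport has fully finished."""
    encoding = response.headers.get("Content-Encoding", "").lower()
    if encoding == "gzip":
        return gzip.decompress(response.body)
    return response.body  # not compressed; pass through unchanged


if __name__ == "__main__":
    response = receive(b'{"result": "ok"}' * 50)
    print(len(response.body), "compressed bytes received")
    print(len(read_payload(response)), "bytes after manual decompression")
```

The point of the split is that the network read and the decompression no longer overlap, so any threading problem inside the automatic path is sidestepped rather than fixed.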

I will be adding this code to 2024.1 (11.2.500) as it is our current development branch. We are readying 2023.2 (11.2.400) for release soon, so I will probably add the code back there as well.

Solid work @JeffLeBert - thank you!
