We’re currently running a debugger on the TaskAgent to see if we can catch the problem in the act. With some additional logging, @josecgomez has narrowed the issue down to the client side. We’re hoping it’s not a race condition that attaching a debugger will perturb enough to hide the issue, but even that would be a clue, I suppose.
Additional logging showed the IIS server responding to the client with a 200, but the client never saw the response and hung. Other loops in the task agent continued to run, so the service didn’t lock up entirely; something dropped into an exception or an infinite loop.
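To illustrate the spirit of that client-side logging (this is just a sketch for the thread, not the actual TaskAgent code; the wrapper name, timeout, and URL are placeholders), the idea is to put a hard upper bound and a timing log around each call, so a request the server has already answered but the client never completes shows up as a cancellation in the log instead of a silent hang:

```csharp
// Illustrative only -- not the actual TaskAgent code. All names here are made up.
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class HangDiagnostics
{
    static readonly HttpClient Client = new HttpClient();

    public static async Task<string> CallWithLoggingAsync(string url)
    {
        var sw = Stopwatch.StartNew();
        // Give the call a hard upper bound so a "server said 200 but the client
        // never returned" situation surfaces as a cancellation in the log.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120)))
        {
            try
            {
                using (var response = await Client.GetAsync(url, cts.Token).ConfigureAwait(false))
                {
                    var body = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
                    Console.WriteLine($"{url} -> {(int)response.StatusCode} in {sw.ElapsedMilliseconds} ms");
                    return body;
                }
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine($"{url} -> no response after {sw.ElapsedMilliseconds} ms (hung?)");
                throw;
            }
        }
    }
}
```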
We’ve been testing in a few SaaS customers, and I haven’t heard of any issues so far.
That being said, it sounds like there is an issue. I haven’t been able to reproduce anything locally that I could debug. It’s always difficult when you can’t see the problem yourself and have to rely on the kindness (and logs) of strangers.
The bit of information from @jgiese.wci about Fiddler has me thinking of issues I’ve seen in the past.
I’ve been working with @josecgomez and things seem to be pointing back to something very low level.
I’ve got a couple ideas I can try, but without being able to reproduce on my machine, it is a bit slow going.
Also, define “future”… I guarantee you, 10 years from now, I will still have customers on 10.2.500… Hell we still have 5-6 customers on 905 with a Progress backend… lol
There was a pretty good network performance improvement in 11.2.200.18. If you aren’t on that release or higher, you will probably want to move. It cuts down the number of round trips needed to make a call.
Is anybody here who is having this issue running a version earlier than 11.2.200.18?
You are right. I don’t know the future plans for DMT, but the TA definitely does not need any UI.
The latency suspicion is understandable, but AFAIK the TA does not get huge responses from the server, so compression vs. no compression shouldn’t make much difference in its case from a latency standpoint…
If it were an async deadlock, the CPU would be low…
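For anyone not familiar with the “async deadlock” scenario being referenced: the textbook case is sync-over-async blocking under a single-threaded SynchronizationContext, where both sides wait on each other forever and the CPU sits idle. A purely illustrative sketch, not TaskAgent code:

```csharp
// Illustrative only: the classic sync-over-async deadlock. Under a single-threaded
// SynchronizationContext (WinForms/WPF/classic ASP.NET), blocking with .Result pins
// the only thread the awaited continuation needs, so both sides wait forever while
// CPU stays near zero. (A plain console app has no such context, so this exact
// snippet would not deadlock there.)
using System.Net.Http;
using System.Threading.Tasks;

class DeadlockExample
{
    static readonly HttpClient Client = new HttpClient();

    static async Task<string> GetDataAsync()
    {
        // Without ConfigureAwait(false), the continuation wants to resume on the
        // captured SynchronizationContext -- the same thread that is blocked below.
        return await Client.GetStringAsync("https://example.com/api");
    }

    public static string GetDataBlocking()
    {
        // Blocking the context thread here is what completes the deadlock.
        return GetDataAsync().Result;
    }
}
```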
But I am sure Jeff will figure it out
I highly suspect it’s due to a specific .NET version incompatibility (or glitch) with .NET Core… The HttpClient implementation in Core is completely different, and I know of other cases where it caused issues. Probably 2023 is not hitting the issue because .NET 4.8 should be pretty much at parity with .NET 6, but 4.7.2 might not be… Not sure when the version on the client side was changed…
I also have it set up in a 2023.1 migration environment on premises; no issue has been reported as of yet…
Theoretically, the server is completely independent from the client. I could have the server on Linux running PHP and still be able to create a .NET client application.
Of course, our custom serialization is involved here, but I think that would fail on deserialization, not on compression.
4.7.2 was many years ago. Since we moved away from WCF, the client has already been on 4.8 for a long time.
Compression is done in IIS, outside of .NET, and it’s independent of the .NET version.
The compression happens in the IIS engine after the .NET app server returns the payload.
The compression being an issue isn’t about the compression itself; it seems to just be highlighting a race condition or timing issue because it made things much faster.
Hence why the problem disappears when we introduce latency via Fiddler or the debugger.
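If anyone wants to sanity-check the compression theory in their own environment, one rough way to do it purely from the client is to hit the same endpoint once advertising gzip support and once not, and see whether the behavior changes. This is only an illustrative sketch; the URL and class name are placeholders:

```csharp
// Illustrative only -- a quick client-side test of the "compression just changes the
// timing" theory: call the same endpoint with and without asking IIS for gzip.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class CompressionToggleTest
{
    public static async Task RunAsync(string url)
    {
        // Request 1: opt in to gzip; the handler sends Accept-Encoding and
        // auto-decompresses whatever IIS returns.
        var gzipHandler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        using (var gzipClient = new HttpClient(gzipHandler))
        {
            var body = await gzipClient.GetStringAsync(url);
            Console.WriteLine($"gzip request: {body.Length} chars");
        }

        // Request 2: no Accept-Encoding header is sent, so IIS returns the payload
        // uncompressed and the response timing changes accordingly.
        using (var plainClient = new HttpClient(new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.None
        }))
        {
            var body = await plainClient.GetStringAsync(url);
            Console.WriteLine($"plain request: {body.Length} chars");
        }
    }
}
```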
Good news, it looks like we’ve found a solution. Still testing, but everything looks good. I was able to run a test application that pummeled the application server all weekend and had no issues. Usually, the test app would break in under 10 minutes.
This looks like an issue with the automatic decompression that .NET Framework 4.8 does on the client. I turned off the automatic decompression and added code to do the decompression ourselves. The current guess is that it is a threading issue, which moving the decompression to a later point avoids.
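For anyone curious, the general shape of that workaround looks something like the sketch below. To be clear, this is my own illustration of the approach, not the shipped code; the names and structure are placeholders. The idea: turn off the handler’s automatic decompression, keep advertising gzip to the server, and inflate the payload ourselves at a later, controlled point.

```csharp
// A sketch of the general approach described above, not the actual product code.
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class ManualDecompression
{
    static readonly HttpClient Client = new HttpClient(new HttpClientHandler
    {
        // Opt out of the framework's built-in automatic decompression.
        AutomaticDecompression = DecompressionMethods.None
    });

    public static async Task<string> GetAsync(string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            // Still tell the server we accept gzip, since we inflate it ourselves.
            request.Headers.AcceptEncoding.ParseAdd("gzip");

            using (var response = await Client.SendAsync(request).ConfigureAwait(false))
            {
                var raw = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);

                // Decompress at a later, controlled point instead of inside the handler.
                var isGzip = response.Content.Headers.ContentEncoding
                    .Any(e => string.Equals(e, "gzip", StringComparison.OrdinalIgnoreCase));
                Stream stream = isGzip ? new GZipStream(raw, CompressionMode.Decompress) : raw;

                using (var reader = new StreamReader(stream))
                {
                    return await reader.ReadToEndAsync().ConfigureAwait(false);
                }
            }
        }
    }
}
```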
I will be adding this code to 2024.1 (11.2.500), as it is our current development branch. We are readying 2023.2 (11.2.400) for release soon, so I will probably backport the code there as well.