Strange: I did some hacking and profiling to see how fast I could get DataSnap, and I see that my "Plain Indy" is also much slower in XE3 than in D2010! In D2010 I get about 11,500 requests per second, the same in XE3 only 7,700...

- February 04, 2013

Strange: I did some hacking and profiling to see how fast I could get DataSnap, and I see that my "Plain Indy" is also much slower in XE3 than in D2010! In D2010 I get about 11,500 requests per second, the same in XE3 only 7,700...
After some hacking (mainly disabling the sessions) I got DataSnap to 4700 req/s (was 3200 req/s), with stable performance as well (no steep decline). I could not get more without a lot of rework...

Looking at the DS source code, my conclusion is: it is not optimized for "high performance" (stupid advertisement of EMBT!). I mean: all kinds of helper objects are created and destroyed on the fly, there are many UTF-8 decoding conversions (implicit due to RTTI?), the RTTI context is not cached, there is no connection pool (new connections are created and closed for each request), etc.
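To illustrate what a connection pool would buy here, the following is a minimal sketch in Python (chosen only for brevity, it is not DataSnap or Delphi code). The `ConnectionPool` class, its `factory` argument and the `created` counter are hypothetical names made up for this example; the point is only that idle connections are reused instead of being created and closed for each request.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hands out reusable connections instead of building one per request."""

    def __init__(self, factory, size):
        self._factory = factory                    # hypothetical: builds one connection
        self._idle = queue.LifoQueue(maxsize=size)
        self.created = 0                           # demo counter only, not synchronized

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()         # reuse an idle connection
        except queue.Empty:
            conn = self._factory()                 # none idle: build a new one
            self.created += 1
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)        # hand it back for the next request
            except queue.Full:
                pass                               # pool is full: a real pool would close it here


# 1000 sequential "requests" share a single connection object.
pool = ConnectionPool(factory=object, size=4)
for _ in range(1000):
    with pool.connection() as conn:
        pass                                       # a request handler would use conn here
```

After the loop, `pool.created` is 1: only the first request paid the setup cost.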

Anyway: does anyone know why Indy in XE3 is slower?

Comments

Roberio Praciano, February 4, 2013 at 6:02 PM
But I have a question: is it stable in D2010, or does it crash?
André Mussche, February 4, 2013 at 10:37 PM
Roberio Praciano: Do you mean whether DataSnap is stable in D2010? I don't know, I normally use RO (RemObjects).
André Mussche, February 4, 2013 at 10:45 PM
Michael Thuma: Hmm, maybe; mORMot didn't even work in XE3 due to some alignment issue(?)...
I just need some time to dig into it :)
André Mussche, February 5, 2013 at 12:37 AM
It's getting stranger: when testing with the RTC stress tool (http://www.realthinclient.com/sdkarchive/indexb4ac.html?cmd=viewtopic&topic_id=80&section_id=3&sid=) the results are the opposite: in D2010 I only get about 2200 req/s, versus 7800 req/s for the XE3 version. And now "my" optimized version of DataSnap can do 7000 req/s! (default: 2400 req/s).
The results seem to be very closely tied to how the client makes and handles the requests... (with JMeter on WinXP and debugging in XE3 I get read timeouts with keep-alive)
André Mussche, February 5, 2013 at 1:22 AM
Michael Thuma: the "read timeout" also occurred with single-threaded JMeter. Somehow JMeter does not detect when the response has been received (CRLF, ReadLn?) and waits until the "response timeout" (e.g. 500 ms). But on Win7 with keep-alive it works without problems. Without keep-alive the connection is closed each time, which marks the end of the response. Anyway, networking is not very straightforward... (different kinds of behaviour etc.)
Marco Cantù, February 5, 2013 at 1:29 AM
I'm interested in any findings. We are going to put a real effort into optimizing DataSnap in the coming months, adjusting Indy settings (we don't have optimal defaults), and cleaning up some code. I agree we have many nice features, but "high performance" is not there. Yet.
André Mussche, February 5, 2013 at 1:46 AM
Marco Cantù: Good to hear! My list of changes so far:
- disabling session management: DS already handles "no sessions" very well, I only commented out the following lines:
- TDSHTTPServer.DoDSRESTCommand -> LoadSessionUpdate
- modified "GetInvocationMetadata" function (default CreateIfNil = false)

Small hacks for more "caching" and less create/free:
- made "RespHandler" not local (within function) but more global (within TDSHTTPServer)
- keeping IIPPeerProcs in global var
- keeping DBXConnection in threadvar

I don't know if this stuff has side effects but I was only interested in simple hacks for quick results :)
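The "keeping DBXConnection in threadvar" hack boils down to a per-thread cache. A minimal sketch of that idea, in Python for brevity rather than Delphi: `open_connection` is a hypothetical stand-in for the expensive setup, and `threading.local` plays the role of a Delphi `threadvar`. It shares the caveat above: a cached connection is never refreshed or closed in this sketch.

```python
import threading

_local = threading.local()            # per-thread storage, like a Delphi threadvar


def open_connection():
    """Hypothetical stand-in for an expensive connection setup."""
    return object()


def get_connection():
    """Return this thread's cached connection, creating it on first use only."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = open_connection()      # paid once per thread, not once per request
        _local.conn = conn
    return conn
```

Each worker thread ends up with its own connection, so no locking is needed around it, and repeated requests on the same thread skip the create/free cycle.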

Maybe it is a good idea not to use "IdCustomHTTPServer" but a lighter version of it? Right now it does all kinds of parsing (headers etc.), which is nice and easy for programming but bad for performance. Even in .NET (servers) and JavaScript (games) the advice is: don't use string.split() etc., just scan the string with pos() etc., so you avoid string/memory allocations!
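A rough sketch of the "scan instead of split" idea, again in Python only for brevity (in Delphi, Pos/PosEx would take the place of `str.find`). The function name `header_value` is made up for this example, and it assumes the header name is given in the exact case used in the request, which real HTTP parsing cannot assume. It looks up a single header by index arithmetic and only allocates the one small result string, instead of splitting the whole request into lines first.

```python
def header_value(raw, name):
    """Return one header's value by scanning `raw`, without splitting it."""
    head_end = raw.find("\r\n\r\n")          # headers stop at the first blank line
    if head_end < 0:
        head_end = len(raw)
    needle = "\r\n" + name + ":"             # a header always follows a line break
    start = raw.find(needle, 0, head_end)    # never search inside the body
    if start < 0:
        return None
    start += len(needle)
    end = raw.find("\r\n", start)            # value runs to the end of its line
    if end < 0:
        end = len(raw)
    return raw[start:end].strip()            # the only new string created


request = (
    "GET /datasnap/rest/test HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
keep_alive = header_value(request, "Connection") == "keep-alive"
```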

By the way: will there be a concurrent version of FastMM for multithreaded high-performance services? I ask because I got the biggest speed boost when using my ScaleMM2 or Google TCMalloc.
Marco Cantù, February 5, 2013 at 1:50 AM
We want to optimize but not in exchange for features. Making features (like session management) customizable is certainly an option.
I'll pass some of these suggestions on. For FastMM, have you turned on multi-threading support? This is not active by default... not that we shouldn't consider alternatives, of course.
Eric Grange, February 5, 2013 at 1:57 AM
IME with http.sys servers, JMeter is actually the bottleneck if you don't have many physical clients: it eats loads of CPU cycles and memory, and can seriously hamstring a server running on the same machine. So there is a risk that the test becomes more one of whether the server can hold its own on a busy machine.

ApacheBench (ab) can go higher and is more CPU efficient, but limited to a single thread, so requires multiple clients and manually collating data if you want to really stress a good server, and if you run it on the same machine as the server, it can still end up hampering the server performance.

There is a multi-threaded variant of ab that is named weighttp (http://redmine.lighttpd.net/projects/weighttp/wiki) and that can really pump up the requests (though like ab, it's a command line tool and not as nice as JMeter).

IME the req/s that can be observed with weighttp vs http.sys server (for 100% CPU use by the server) is quite high. Other servers can already be made to reach 100% CPU with ab, and only the slowest ones would reach 100% CPU with JMeter.
André Mussche, February 5, 2013 at 2:12 AM
Marco Cantù: What do you mean by FastMM and multi-threading support? AssumeMultiThreaded, NeverSleepOnThreadContention and UseSwitchToThread? These settings offer only a slight improvement for heavily MM-bound operations (like utf8tostring conversions, string copy(), object create/free etc.) because FastMM still uses a global lock per block size. And because most allocations are small (small strings, small objects etc.), threads very soon block each other.
Please, Delphi really needs a concurrent MM, otherwise Delphi/DataSnap will never be high performance with many threads (or you must use many low-level optimizations, like avoiding object creation, string manipulation etc., as mORMot does).
Don't get me wrong: FastMM is the best for single-threaded (or "occasionally" multithreaded) use, but not for heavily multithreaded and MM-bound work.
Marco Cantù, February 5, 2013 at 2:31 AM
I was asking because I have limited direct experience with that. If there is a significant speed gain in changing the memory manager, this is something we can certainly consider. I saw the ScaleMM page (I wasn't aware of it) and it sounds interesting. When you mention that you get improvements using it for DataSnap, do you have any actual numbers? (If you prefer to move this conversation over to email, that's OK for me.)
André Mussche, February 5, 2013 at 2:52 AM
Marco Cantù: A quick snapshot of some numbers from my test (http://andremussche.blogspot.nl/2013/01/datasnap-ro-rtc-mormot-wcf-node-speed.html):
- DataSnap with the normal/internal FastMM = 1377 req/s
- DataSnap with SMM2 = 3500 req/s (but due to some problems with sessions it drops very steeply to 1700 after a couple of seconds(!))
- DataSnap with SMM2 and without sessions = 4600 req/s

But these numbers are from JMeter. I did a quick test with the RTC stress tool and then I got:
- 1200 with the internal FastMM (note: CPU usage does not get higher than 30% on my quad core, which means it is bound on the FastMM lock; it is not purely MM bound, because in other MM tests I only got 24-25%, which is just one core)
- 2400 with SMM2 (note: this is the average after about 10 s; initially it does more req/s but then drops steeply)
- 7000 with SMM2 + no sessions (note: no performance drop, it stays constant)

As you can see, the numbers also vary a lot with the tools that are used (I only did a quick localhost test, so the numbers are only an indication).
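To put the figures above side by side, here is a small Python snippet that computes the speedup of each configuration relative to the internal FastMM baseline. The req/s values are copied from the two lists; the dictionary labels and the `speedups` helper are just names chosen for this example.

```python
# req/s figures as reported above, per load-testing tool
jmeter = {"FastMM": 1377, "SMM2": 3500, "SMM2, no sessions": 4600}
rtc = {"FastMM": 1200, "SMM2": 2400, "SMM2, no sessions": 7000}


def speedups(results, baseline="FastMM"):
    """Return each configuration's throughput relative to the baseline."""
    base = results[baseline]
    return {name: round(rps / base, 2) for name, rps in results.items()}


for tool, results in (("JMeter", jmeter), ("RTC stress tool", rtc)):
    print(tool, speedups(results))
```

So SMM2 without sessions is roughly 3.3x the baseline under JMeter and roughly 5.8x under the RTC stress tool, which again shows how much the conclusion depends on the client tool.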
André Mussche, February 5, 2013 at 4:22 AM
Danijel Tkalcec: Thanks! Can the tool also store its settings? Or even load/save to different files (for different setups/configurations)?