How is Delphi's codegen doing these days?

Having spent 5 years writing C++ and then coming back to Delphi last year, I was very curious about this. So I decided to write a small test.

I decided to port my Tiger hash[1] implementation, which I'd ported to C++ based on the original public domain implementation. There were two reasons for this. First, the hash function is non-trivial and I felt it is representative of integer-heavy calculations, while also offering a lot of opportunities for compiler optimizations. Second, the hash function was designed to be fast on 64-bit CPUs, and I was curious to see how well the x64 Delphi compiler utilized the additional registers and such available on that platform.

While porting it to Delphi I also felt like seeing how it stacks up against its main rivals. It took some time, but I've now finally ported the program to C# and Java as well. For each platform I've tried to write code that is as efficient as possible while keeping the port as close as possible to the original C++ version. I must admit I'm rusty in C# and Java, and there might be some optimizations I've missed.

Benchmarking
The programs first run two self-checks to make sure the implementation is working. This also allows the jitter to do its work in the C# and Java versions; I'm interested in throughput, so I did not want to include the jitting step in the timing. Then a 1 GB array is prepared with fixed, pseudo-random content. This array is then hashed, and the resulting hash is verified. The time taken is used to calculate the throughput in MB/s.
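
The procedure above could be sketched roughly as follows in Java. This is only an illustration under stated assumptions, not the code from the downloads: the class and method names are made up, the hash is a simple FNV-1a stand-in rather than Tiger, the buffer is 64 MB instead of 1 GB to keep the sketch quick, and 1 MB is taken to be 1024 * 1024 bytes.

```java
import java.util.Random;

// Sketch of the benchmark procedure. All names here are hypothetical,
// and the hash is a stand-in (64-bit FNV-1a), NOT Tiger.
class ThroughputSketch {

    // Placeholder hash so the harness has some integer-heavy work to time.
    static long placeholderHash(byte[] data) {
        long h = 0xcbf29ce484222325L;      // FNV-1a 64-bit offset basis
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;           // FNV-1a 64-bit prime
        }
        return h;
    }

    // A fixed seed gives the same pseudo-random content on every run.
    static byte[] prepareBuffer(int size, long seed) {
        byte[] buf = new byte[size];
        new Random(seed).nextBytes(buf);
        return buf;
    }

    // Throughput in MB/s, assuming 1 MB = 1024 * 1024 bytes.
    static double throughputMBps(long bytes, long elapsedNanos) {
        double megabytes = bytes / (1024.0 * 1024.0);
        double seconds = elapsedNanos / 1e9;
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        int size = 64 * 1024 * 1024;       // assumption: smaller than the 1 GB used in the post
        byte[] data = prepareBuffer(size, 12345L);

        // Untimed runs first: they give a reference result and let the JIT warm up.
        long expected = placeholderHash(data);
        placeholderHash(data);

        // Timed run.
        long start = System.nanoTime();
        long result = placeholderHash(data);
        long elapsed = System.nanoTime() - start;

        // Verify the result before reporting anything.
        if (result != expected) {
            throw new IllegalStateException("hash mismatch");
        }
        System.out.printf("%.0f MB/s%n", throughputMBps(size, elapsed));
    }
}
```

Keeping the warm-up runs outside the timed region is what keeps the jitting step out of the reported number.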

Results
I ran the tests on my i5-3360M @ 2.8GHz, with multiple runs per version. The results were very consistent between runs. The results are:
C++:    491 MB/s
Java:   341 MB/s
Delphi: 268 MB/s (updated)
C#:     141 MB/s

Conclusion
That C++ took the lead is no big surprise, but I must admit I was slightly surprised at just how much faster it is. That Java performed so well was also slightly surprising. I've heard for a while that Java has improved in the performance department, and this shows it. I must admit I expected it to be closer to C#.

The Delphi performance was quite a shocker too. I suspected the performance would not be good, but this is quite bad indeed. Edit: After forcing a lot more inlining, performance increased by 30%. It's closer to Java, but still a fair way behind.

After looking at the disassembly, I see there's a lot of low-hanging optimization fruit still to be claimed. For example, the x64 compiler only uses a couple of the extra registers available on x64, causing it to very easily fall back to full load/store cycles for each statement. I suspect this is one of the main causes of the lacklustre speed.

The C# results were not that surprising to me. Unless I've missed it, C# lacks something similar to Java's ByteBuffer, which allowed me to access an array both as bytes and as 64-bit ints directly. In C# I had to resort to copying the data instead.
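
For readers unfamiliar with it, here is a minimal sketch of the kind of ByteBuffer access meant above. It is not taken from the Java download: the class and method names are hypothetical, and the little-endian byte order is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

// Hypothetical helper showing one byte array read both as bytes and as 64-bit words.
class LongViewSketch {

    // Wraps the array without copying; the returned view shares its storage.
    // Assumption: little-endian word order.
    static LongBuffer asLittleEndianLongs(byte[] bytes) {
        return ByteBuffer.wrap(bytes)
                         .order(ByteOrder.LITTLE_ENDIAN)
                         .asLongBuffer();
    }

    public static void main(String[] args) {
        byte[] block = new byte[64];       // one 64-byte block, i.e. eight 64-bit words
        block[0] = 1;                      // lowest byte of word 0
        block[9] = 1;                      // second-lowest byte of word 1

        LongBuffer words = asLittleEndianLongs(block);
        System.out.println(words.capacity());  // number of 64-bit words in the view
        System.out.println(words.get(0));
        System.out.println(words.get(1));

        // Writes to the byte array are visible through the view, since nothing was copied.
        block[0] = 2;
        System.out.println(words.get(0));
    }
}
```

The point of the view is that the hash loop can read whole 64-bit words from the input while the caller still fills the same array byte by byte.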

Anyway, this was just a "for fun" project, but I thought the results might be interesting, so I decided to share.

Downloads
You can download the various versions using the Dropbox links below. Each archive includes full source code and an executable. The C++ and C# versions include VS 2012 project files; I used the free Visual Studio 2012 Express edition[2] for both. For Java I used NetBeans, and the Delphi version was made using XE3.

https://www.dropbox.com/s/3q94qms4sxgjd0w/TigerHashDelphi.zip
https://www.dropbox.com/s/hv55vl000fn5y6z/TigerHashCPP.zip
https://www.dropbox.com/s/zxmehwbdtsgto2n/TigerHashCS.zip
https://www.dropbox.com/s/9p5gn59rfmcfht8/TigerHashJava.zip

[1]: http://www.cs.technion.ac.il/~biham/Reports/Tiger/
[2]: http://www.microsoft.com/en-us/download/details.aspx?id=34673
