A Look at Improved Inlining in Delphi XE6

- May 07, 2014

http://www.delphitools.info/2014/05/07/a-look-at-improved-inlining-in-delphi-xe6

Comments

Bill Meyer, May 7, 2014 at 4:36 AM
Ah, but incompatible with that old chip, I'm afraid! ;)

Asbjørn Heid, May 7, 2014 at 4:40 AM
Thanks for the analysis. One step forwards and one step backwards. Overall still pretty poor :(

With records it seems to do the juggling on the result variable, but not on the parameters, so that's an improvement. But the extra float juggling is still there of course. Here's the output from a simple 2D vector record with an inlined Add operator:

XE3.dpr.40: vr := v1 + v2;
0041C427 8B05D83E4200     mov eax,[$00423ed8]
0041C42D 8945E8           mov [ebp-$18],eax
0041C430 8B05DC3E4200     mov eax,[$00423edc]
0041C436 8945EC           mov [ebp-$14],eax
0041C439 8B05E03E4200     mov eax,[$00423ee0]
0041C43F 8945E0           mov [ebp-$20],eax
0041C442 8B05E43E4200     mov eax,[$00423ee4]
0041C448 8945E4           mov [ebp-$1c],eax
0041C44B D945E8           fld dword ptr [ebp-$18]
0041C44E D845E0           fadd dword ptr [ebp-$20]
0041C451 D95DD8           fstp dword ptr [ebp-$28]
0041C454 9B               wait
0041C455 D945EC           fld dword ptr [ebp-$14]
0041C458 D845E4           fadd dword ptr [ebp-$1c]
0041C45B D95DDC           fstp dword ptr [ebp-$24]
0041C45E 9B               wait
0041C45F 8B45D8           mov eax,[ebp-$28]
0041C462 8905E83E4200     mov [$00423ee8],eax
0041C468 8B45DC           mov eax,[ebp-$24]
0041C46B 8905EC3E4200     mov [$00423eec],eax

XE6.dpr.40: vr := v1 + v2;
0041C4AF D905BC3E4200     fld dword ptr [$00423ebc]
0041C4B5 D805C43E4200     fadd dword ptr [$00423ec4]
0041C4BB D95DDC           fstp dword ptr [ebp-$24]
0041C4BE 9B               wait
0041C4BF D945DC           fld dword ptr [ebp-$24]
0041C4C2 D95DE8           fstp dword ptr [ebp-$18]
0041C4C5 9B               wait
0041C4C6 D905C03E4200     fld dword ptr [$00423ec0]
0041C4CC D805C83E4200     fadd dword ptr [$00423ec8]
0041C4D2 D95DDC           fstp dword ptr [ebp-$24]
0041C4D5 9B               wait
0041C4D6 D945DC           fld dword ptr [ebp-$24]
0041C4D9 D95DEC           fstp dword ptr [ebp-$14]
0041C4DC 9B               wait
0041C4DD 8B45E8           mov eax,[ebp-$18]
0041C4E0 8905CC3E4200     mov [$00423ecc],eax
0041C4E6 8B45EC           mov eax,[ebp-$14]
0041C4E9 8905D03E4200     mov [$00423ed0],eax
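
The record source was not posted alongside the disassembly. As a rough illustration only, here is a minimal sketch of what such a type might look like. The names `TVec2`, `X`, `Y` and the program name are hypothetical; the `Single` fields are an assumption based on the `dword ptr` operands, and the global variables are an assumption based on the absolute addresses in the listings.

```delphi
program Vec2InlineSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical 2D vector record; field and type names are assumptions.
  TVec2 = record
    X, Y: Single;
    // Operator overload marked inline, as described in the comment above.
    class operator Add(const A, B: TVec2): TVec2; inline;
  end;

class operator TVec2.Add(const A, B: TVec2): TVec2;
begin
  // Component-wise addition of the two vectors.
  Result.X := A.X + B.X;
  Result.Y := A.Y + B.Y;
end;

var
  // Globals, assumed from the absolute addresses in the disassembly.
  v1, v2, vr: TVec2;

begin
  v1.X := 1; v1.Y := 2;
  v2.X := 3; v2.Y := 4;
  vr := v1 + v2;  // the statement whose generated code is being compared
  Writeln(vr.X:0:2, ' ', vr.Y:0:2);
end.
```

Building a program like this as a Win32 target and inspecting the `vr := v1 + v2;` line in the CPU view is one way to compare the generated code between compiler versions; the exact output will depend on the compiler version and settings.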

Leif Uneus, May 7, 2014 at 4:41 AM
I just ran a comparison between XE5 and XE6 with the SciMark2 test here:
https://code.google.com/p/scimark-delphi/

XE6 Win32 Results:

Mininum running time = 2,00 seconds
Composite Score MFlops: 632,06
FFT Mflops: 297,35 (N=1024)
SOR Mflops: 895,01 (100 x 100)
MonteCarlo: Mflops: 184,05
Sparse matmult Mflops: 360,58 (N=1000, nz=5000)
LU Mflops: 1423,33 (M=100, N=100)

XE5 Win32 Results:

Mininum running time = 2,00 seconds
Composite Score MFlops: 859,98
FFT Mflops: 390,91 (N=1024)
SOR Mflops: 1193,53 (100 x 100)
MonteCarlo: Mflops: 198,91
Sparse matmult Mflops: 538,50 (N=1000, nz=5000)
LU Mflops: 1978,03 (M=100, N=100)

Eric Grange, May 7, 2014 at 5:57 AM
Bill Meyer: Well, not that incompatible actually, XE6 still generates x87 FPU opcodes... at least they're 32-bit x87 opcodes, but hey, SSE was only introduced 13 years ago!

Eric Grange, May 7, 2014 at 6:10 AM
Leif Uneus: I can confirm that between XE and XE6, with 480 MFlops for XE vs 350 for XE6 (on my old AMD CPU), and a similar ratio on a more recent Xeon E5.

Lars Fosdal, May 7, 2014 at 6:44 AM
A decline in performance?

Eric Grange, May 7, 2014 at 6:45 AM
OK, I know why XE6 is slower: it is now stack juggling even for simple expressions...

Lars Fosdal, May 7, 2014 at 6:45 AM
Woah...

Bill Meyer, May 7, 2014 at 7:02 AM
Lars Fosdal: Oh, come on, speed is overrated. ;)

Lars Fosdal, May 7, 2014 at 7:04 AM
Bill Meyer: I prefer my mistakes to be fast ;)

Eric Grange, May 7, 2014 at 7:14 AM
Slowdown ranges from 7% (integer-heavy Monte Carlo) to 50% (sparse matrix multiplication), with 30% on average.

I mean, the CPU speed will have caught up within a year or five, who cares?

Dalija Prasnikar, May 7, 2014 at 7:22 AM
Eric Grange: So you need new hardware just to make recompiled apps run as fast as they did before. Talk about progress...

Asbjørn Heid, May 7, 2014 at 7:28 AM
Well, at least they did some work on optimization... so there's a faint hope they'll fix this and the result juggling and such.

Though they clearly need to run the output past a few more eyeballs, as well as run more regression tests before release.

Eric Grange, May 7, 2014 at 7:28 AM
Dalija Prasnikar: That's consumerism for you ;-)

Bill Meyer, May 7, 2014 at 7:53 AM
Dalija Prasnikar: Except that CPU speed is not increasing, and multi-core only helps when you can improve performance through threading.

Eric Grange, May 7, 2014 at 7:57 AM
Asbjørn Heid: It's a bit surprising they didn't use all the open-source benchmarks and/or call for sample code before doing it (à la FastCode B&V).
I hope it's not the old Borland Ivory Tower culture striking back.

Bill Meyer, May 7, 2014 at 7:59 AM
Eric Grange: Is it? Seems like NIH...

Asbjørn Heid, May 7, 2014 at 8:54 AM
Eric Grange: Given the job postings for compiler jobs in Romania, I hope it's just inexperience...

Mohammed Nasman, May 8, 2014 at 3:01 AM
Marco Cantù: Attention please.

Bill Meyer, May 8, 2014 at 3:52 AM
Lars Fosdal: Rapidly in error? ;)

Haofu Huang, May 9, 2014 at 8:30 AM
http://www.delphitools.info/2014/05/08/delphi-xe6-32bits-and-scimark/