So it's been a while, and I've forgotten... is there some function / trick to get an inline-able version of UInt32x32To64? That is, multiply two UInt32s and return the result as a UInt64.

- May 18, 2014

Target platform is of course 32-bit.

Am I doomed to suffer the call overhead?

Comments

Asbjørn Heid, May 18, 2014 at 4:53 AM
On a side note, the 64-bit compiler sure is funny:
Project1.dpr.61: result := UInt64(a) * UInt64(b);
00000000004256D0 8BC1             mov eax,ecx
00000000004256D2 8BD2             mov edx,edx <--- ???
00000000004256D4 480FAFC2         imul rax,rdx

Oh well, not the worst offense :)
Kenneth Cochran, May 18, 2014 at 6:33 AM
You mean:
{$INLINE ON}
function UInt32x32To64(a, b: UInt32): UInt64; inline;
begin
  Result := UInt64(a) * b;
end;

isn't enough?
Asbjørn Heid, May 18, 2014 at 6:38 AM
Right, it isn't enough: on the x86 platform it does a full 64x64 -> 64-bit multiplication and truncates the result. This is highly inefficient compared to using the single-instruction 32x32 -> 64-bit multiplication.
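To make the cost difference concrete, here is a minimal sketch of the arithmetic behind a general 64x64 -> 64-bit multiply when it is split into 32-bit halves. The function name Mul64Truncated is hypothetical and not from the thread, and the sketch shows only the math, not the code the Delphi compiler actually emits. It assumes overflow and range checking are switched off so the intermediate sums wrap.

```pascal
{$Q-} // assumption: overflow checks off, so UInt64 arithmetic wraps modulo 2^64
{$R-} // assumption: range checks off as well

// Hypothetical helper (not from the thread): the low 64 bits of a
// 64x64 product, built from the 32-bit halves of both operands.
function Mul64Truncated(const a, b: UInt64): UInt64;
var
  aLo, aHi, bLo, bHi, cross: UInt64;
begin
  aLo := a and $FFFFFFFF;
  aHi := a shr 32;
  bLo := b and $FFFFFFFF;
  bHi := b shr 32;
  // Only the low 32 bits of the cross terms survive the shift by 32;
  // aHi * bHi lies entirely above bit 63 and is dropped.
  cross := (aLo * bHi + aHi * bLo) and $FFFFFFFF;
  Result := aLo * bLo + (cross shl 32);
end;
```

When both operands are zero-extended UInt32 values, aHi and bHi are 0, so cross is 0 and only aLo * bLo remains. That one remaining product is what a single 32x32 -> 64 mul delivers, so the other two multiplications and the additions are wasted work in this case.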
Asbjørn Heid, May 18, 2014 at 6:40 AM
For reference, here's what the assembler version looks like:
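function mul32to64(const a, b: UInt32): UInt64;
asm
  // 32-bit register calling convention: a arrives in EAX, b in EDX
  mul edx  // EDX:EAX := EAX * EDX, which is also where a UInt64 result is returned
end;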

Now the issue is that you cannot inline functions with asm...
Alexander Benikowski, May 18, 2014 at 10:53 AM
Asbjørn Heid, can you show the 64-bit asm with and without optimizations? You highlighted the orphaned mov and I'd just like to see the difference.
Asbjørn Heid, May 18, 2014 at 11:15 AM
Alexander B. I'm not at my computer right now, but that listing is from the pure Pascal implementation of the function I'm after.

I think it's a pure-register version of the usual unnecessary stack juggling.
Alexander Benikowski, May 19, 2014 at 12:17 AM
Asbjørn Heid, but that's how compilers sometimes/often work: they output stack-juggling code first and then optimize it to register-only code (where applicable).
LLVM suggests these steps, since it contains internal optimization passes that can then be applied to such output.
Asbjørn Heid, May 19, 2014 at 12:27 AM
Alexander B. But a decent compiler has passes to remove unnecessary instructions like this or the stack juggling. Delphi's compiler does it sometimes, but it lacks a pass at the end, after inlining. If they are planning on keeping their x86/x64 compilers I hope they improve on this. It should be relatively low-hanging fruit.
Alexander Benikowski, May 19, 2014 at 12:38 AM
Asbjørn Heid, of course it does. Never said it doesn't ;)
I just wanted to point out that those optimizations sometimes generate useless steps, and you need an extra pass to remove them completely.
Asbjørn Heid, May 19, 2014 at 12:46 AM
Alexander B. Ah yes, then we're on the same page. Yes, it's easy to produce such output, but it hurts quite badly. In my experience the Delphi compiler could gain significant speed just by improving on this.