So it's been a while, and I've forgotten... is there some function or trick for getting an inline-able version of UInt32x32To64? That is, multiply two UInt32s and return the result as a UInt64.
The target platform is of course 32-bit.
Am I doomed to suffer the call overhead?
On a side note, the 64-bit compiler sure is funny:
Project1.dpr.61: result := UInt64(a) * UInt64(b);
00000000004256D0 8BC1 mov eax,ecx
00000000004256D2 8BD2 mov edx,edx <--- ???
00000000004256D4 480FAFC2 imul rax,rdx
Oh well, not the worst offense :)
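For context, here is a minimal sketch of the kind of pure Pascal function that would contain a line like `result := UInt64(a) * UInt64(b)`. The function name `Mul32To64Pure` and the test values are assumptions made for illustration; they are not taken from the original project.

```pascal
program Mul32PureDemo;

{$APPTYPE CONSOLE}

// Hypothetical name. Pure Pascal, so the compiler is allowed to inline it.
function Mul32To64Pure(a, b: UInt32): UInt64; inline;
begin
  // Widen both operands first so the multiplication is done in 64 bits
  // and the upper half of the product is not lost.
  Result := UInt64(a) * UInt64(b);
end;

var
  r: UInt64;
begin
  // (2^32 - 1)^2 = $FFFFFFFE00000001: high dword $FFFFFFFE, low dword 1.
  r := Mul32To64Pure($FFFFFFFF, $FFFFFFFF);
  Assert((r shr 32) = $FFFFFFFE);
  Assert((r and $FFFFFFFF) = 1);
  Writeln(r);
end.
```

The largest possible product of two UInt32 values always fits in a UInt64, so the widened multiplication cannot overflow.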
You mean:
{$INLINE ON}
function UInt32x32To64(a,b: int32): Int64; inline;
begin
Result := a * Int64(b);
end;
isn't enough?
No, it isn't enough: on the x86 platform it does a full 64x64 -> 64-bit multiplication and truncates the result. This is highly inefficient compared to using the single-instruction 32x32 -> 64-bit multiplication.
For reference, here's what the assembler version looks like:
function mul32to64(const a, b: UInt32): UInt64;
asm
  // register convention: a is in EAX, b is in EDX; result in EDX:EAX
  mul edx
end;
Now the issue is that you cannot inline functions with asm...
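One way to live with that limitation, sketched under assumptions: keep the asm version for 32-bit builds (accepting the call) and fall back to an inlinable pure Pascal version elsewhere. This assumes a Delphi version that defines `CPUX86` for the 32-bit compiler; the name `Mul32To64` and the test values are made up for illustration. It does not remove the call overhead on 32-bit, it only keeps both variants behind one name.

```pascal
program Mul32Select;

{$APPTYPE CONSOLE}

{$IFDEF CPUX86}
// 32-bit build: single MUL instruction, but asm routines cannot be inlined.
// Register convention: a is in EAX, b is in EDX; UInt64 result in EDX:EAX.
function Mul32To64(const a, b: UInt32): UInt64;
asm
  mul edx
end;
{$ELSE}
// Other targets: pure Pascal, so the compiler may inline it.
function Mul32To64(const a, b: UInt32): UInt64; inline;
begin
  Result := UInt64(a) * UInt64(b);
end;
{$ENDIF}

var
  r: UInt64;
begin
  // 100000 * 100000 = 10^10, which needs more than 32 bits.
  r := Mul32To64(100000, 100000);
  Assert(r = UInt64(10000000000));
  Writeln(r);
end.
```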
Asbjørn Heid can you show the 64-bit asm with and without optimizations? You highlighted the orphaned mov and I'd just like to see the difference.
Alexander B. I'm not at my computer right now, but it's the pure Pascal implementation of the function I'm after.
I think it's a pure register version of the usual unnecessary stack juggling.
Asbjørn Heid but that's how compilers sometimes (often) work. They output stack-juggling code and then optimize it to register-only code (where applicable).
LLVM suggests these steps (as it contains internal optimizations which can be applied to it).
Alexander B. But a decent compiler has passes to remove unnecessary instructions like this or the stack juggling. Delphi's compiler does it sometimes, but it lacks a pass at the end, after inlining. If they are planning on keeping their x86/x64 compilers, I hope they improve on this. It should be relatively low-hanging fruit.
Asbjørn Heid Of course it does. Never said it doesn't ;)
I just wanted to point out that those optimizations sometimes generate useless steps, and you need an extra pass to remove them completely.
Alexander B. Ah yes, then we're on the same page. Yes, it's easy to produce such output, but it hurts quite badly. In my experience the Delphi compiler could gain significant speed just by improving on this.