So it's been a while, and I've forgotten... is there some function / trick to get an inlineable version of UInt32x32To64? That is, multiply two UInt32s and return the result as a UInt64.

The target platform is of course 32-bit.

Am I doomed to suffer the call overhead?

Comments

  1. On a side note, the 64-bit compiler sure is funny:
    Project1.dpr.61: result := UInt64(a) * UInt64(b);
    00000000004256D0 8BC1             mov eax,ecx
    00000000004256D2 8BD2             mov edx,edx  <--- ???
    00000000004256D4 480FAFC2         imul rax,rdx

    Oh well, not the worst offense :)

  2. You mean:
    {$INLINE ON}
    function UInt32x32To64(a,b: int32): Int64; inline;
    begin
      Result := a * Int64(b);
    end;

    isn't enough?

  3. Yes. On the x86 platform that does a full 64x64 -> 64-bit multiplication and truncates the result, which is highly inefficient compared to the single-instruction 32x32 -> 64-bit multiplication.

  4. For reference, here's what the assembler version looks like:
    function mul32to64(const a, b: UInt32): UInt64;
    asm
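      // 32-bit register convention: a arrives in EAX, b in EDX.
      // mul edx computes EDX:EAX := EAX * EDX, which is where a UInt64 result is returned.
      mul edx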
    end;


    Now the issue is that you cannot inline functions with asm...

  5. Asbjørn Heid can you show the 64-bit asm with and without optimizations? You highlighted the orphaned move and I'd just like to see the difference.

  6. Alexander B. I'm not at my computer right now, but it's the pure pascal implementation of the function I'm after.

    I think it's a pure register version of the usual unnecessary stack juggling.

  7. Asbjørn Heid but that's how compilers sometimes/often work: they output stack-juggling code and then optimize it to register-only code (where applicable).
    LLVM suggests this approach (as it contains internal optimizations which can then be applied to that code).

  8. Alexander B. But a decent compiler has passes to remove unnecessary instructions like this or the stack juggling. Delphi's compiler does it sometimes, but it lacks a pass at the end, after inlining. If they are planning on keeping their x86/x64 compilers, I hope they improve on this. It should be relatively low-hanging fruit.

  9. Asbjørn Heid Of course it does. Never said it doesn't ;)
    I just wanted to point out that those optimizations sometimes generate useless steps, which need an extra pass to remove completely.

  10. Alexander B. Ah yes, then we're on the same page. Yes, it's easy to produce such output, but it hurts quite badly. In my experience the Delphi compiler could gain significant speed by just improving on this.

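
For reference, here is a minimal sketch of the pure Pascal, inlineable variant the thread is talking about, written with both operands widened to UInt64 as in the snippet from comment 1. The name Mul32To64Pas is made up for illustration. This only shows that the result is arithmetically correct; it says nothing about the code the compiler emits, and per comment 3 the 32-bit compiler may still go through a full 64x64 multiply rather than a single mul.

```pascal
program Mul32To64Demo;

{$APPTYPE CONSOLE}
{$INLINE ON}

// Hypothetical helper name. Pure Pascal (no asm), so the compiler may inline it.
function Mul32To64Pas(a, b: UInt32): UInt64; inline;
begin
  // Widen both operands so the multiplication is carried out in 64 bits.
  Result := UInt64(a) * UInt64(b);
end;

begin
  // $10000 * $10000 = $100000000, which no longer fits in 32 bits.
  Assert(Mul32To64Pas($10000, $10000) = UInt64($100000000));
  // $FFFFFFFF * 2 = $1FFFFFFFE
  Assert(Mul32To64Pas($FFFFFFFF, 2) = UInt64($1FFFFFFFE));
  Writeln(Mul32To64Pas($FFFFFFFF, 2));
end.
```

To see what actually gets generated at a call site, set a breakpoint on the call and look at the CPU view, as was done for the 64-bit listing in comment 1.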
