I hope a fix for this will be back-ported for many Delphi versions.

System.pas:

procedure YieldProcessor;
{$IF (defined(CPUX86) or defined(CPUX64)) and defined(ASSEMBLER)}
asm
PAUSE
end;
{$ELSE}
begin
end;
{$ENDIF}

Called from TMonitor.Spin, TThread.SpinWait and TInternalConditionVariable.LockQueue.

Duplicates in TMonitor.Spin, TThread.SpinWait, TInternalConditionVariable.LockQueue and getmem.inc.
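
For illustration only, here is a minimal sketch of the kind of spin-wait that relies on YieldProcessor. It is not the RTL code: the names AcquireFlag, ReleaseFlag, MaxPausesPerRound and RoundsBeforeYield are hypothetical, and it assumes a Delphi version that provides YieldProcessor, AtomicCmpExchange and TThread.Yield. The point is that a loop which issues a fixed number of PAUSE hints per round takes longer per round when each PAUSE takes longer.

```delphi
program SpinBackoffSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  // Hypothetical tuning values, not taken from the RTL.
  MaxPausesPerRound = 64;
  RoundsBeforeYield = 10;

var
  Flag: Integer = 0; // 0 = free, 1 = taken

// Try to take the flag. Between attempts, issue a growing number of PAUSE
// hints (exponential backoff); after enough rounds, give up the time slice.
procedure AcquireFlag;
var
  Pauses, Round, I: Integer;
begin
  Pauses := 1;
  Round := 0;
  while AtomicCmpExchange(Flag, 1, 0) <> 0 do
  begin
    if Round < RoundsBeforeYield then
    begin
      for I := 1 to Pauses do
        YieldProcessor; // PAUSE on x86/x64, empty body elsewhere
      if Pauses < MaxPausesPerRound then
        Pauses := Pauses * 2;
      Inc(Round);
    end
    else
      TThread.Yield; // stop burning cycles once spinning has not helped
  end;
end;

procedure ReleaseFlag;
begin
  AtomicExchange(Flag, 0);
end;

begin
  AcquireFlag;
  try
    Writeln('flag held');
  finally
    ReleaseFlag;
  end;
end.
```

Capping the pause count and falling back to TThread.Yield bounds how many PAUSE instructions a single wait can issue, which is the quantity that matters if PAUSE itself becomes slower.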



Originally shared by Kristian Köhntopp

The PAUSE instruction changed timing dramatically in Skylake. Spinlock implementations based on PAUSE will need adjustments.
https://aloiskraus.wordpress.com/2018/06/16/why-skylakex-cpus-are-sometimes-50-slower-how-intel-has-broken-existing-code/
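
To see what PAUSE costs on a particular machine, a rough timing loop along the following lines could be used. This is only a sketch under stated assumptions: the iteration count is arbitrary, the figure includes loop overhead, and it assumes a Delphi version with System.Diagnostics.TStopwatch and a public YieldProcessor. It measures the instruction in isolation and says nothing about whether a real program is affected.

```delphi
program PauseTimingSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

const
  Iterations = 10000000; // arbitrary count, chosen only to get a measurable time

var
  SW: TStopwatch;
  I: Integer;
  NsPerCall: Double;
begin
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
    YieldProcessor; // PAUSE on x86/x64, empty body elsewhere
  SW.Stop;

  // Convert milliseconds for the whole loop into nanoseconds per call.
  NsPerCall := SW.ElapsedMilliseconds * 1.0e6 / Iterations;
  Writeln(Format('%.1f ns per YieldProcessor call (loop overhead included)',
    [NsPerCall]));
end.
```

Running the same binary on CPUs of different generations gives a first indication of how much the per-call cost differs between them.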

Comments

  1. Why would they start back-porting now? Nothing ever gets back-ported.

    But before anybody gets too excited, I think somebody would need to show how this might actually affect some real world programs.

  2. Fix? I cannot even find the bug report... has this even been reported?

  3. I cannot see any reports (either internal or external). I can report it internally, but then the issue will not be visible to everyone...

  4. Roy Nelson What would you report? Has anybody even shown that there is an issue with RTL spin lock code? And why tie this to Intel? What's the PAUSE latency like on AMD processors? Clue: it's different again.

  5. David Heffernan If the performance hit is as bad as the guy claims, then obviously we would need to test, or at the very least have it logged, to see if we can replicate what the chap says... and if we do see the same issue in the RTL, hopefully be able to fix it?

  6. Roy Nelson Nobody can report anything yet because nobody has produced any evidence that there is anything wrong. That's my point. This thread is a little pointless. Just because somebody somewhere wrote a crappy spin lock doesn't mean that the Delphi RTL spinlocks are also crappy. Not that I'd have much confidence in Delphi RTL synchronisation code but that's another matter and we should give it the benefit of the doubt at least.

  7. Reading the comments in SyncObjs.pas since Delphi XE ("This type is modeled after a similar type available in .NET 4.0 as System.Threading."), and with this being a problem in .NET 4, there is now http://qc.embarcadero.com/wc/qcmain.aspx?d=144063

  8. Jeroen Wiert Pluimers That's a very poor bug report because it is based entirely on speculation. It's also in the wrong place. It's meant to be Quality Portal.

    I am disappointed.

  9. David Heffernan Feel free to amend it, or show in another way that you can do better.

  10. Jeroen Wiert Pluimers The onus is on the submitter to submit good reports.

  11. David Heffernan They did backport fixes from Tokyo to Berlin (and even Seattle).
    The only thing the Delphi RTL doesn't seem to suffer from is the multiply-by-processor issue. Most stuff looks quite similar to .NET (from a brief look). It has an exponential backoff and seems to trigger PAUSE a lot. If I had the CPU, I'd love to test that right now.
    Except for Bobcat (2011) at 6 and Jaguar (2013) at 46, AMD doesn't seem to add extra latency to PAUSE, but I found no data for Ryzen. So Intel does seem like the odd one out here, currently.
    agner.org - www.agner.org/optimize/instruction_tables.pdf

  12. Alexander Benikowski AMD PAUSE can be 50 clocks I think

  13. David Heffernan The tables I linked say something along the lines of 40+ ops. But Steamroller from 2014 has only 8. So it varies, but there is no extra latency as in Intel's PAUSE.

  14. I guess FastMM4 is affected too, since its asm explicitly executes "pause" in its spinlocks. IMHO this may have a much bigger impact on multi-threaded apps on Skylake.

  15. Good reports... bad reports... if they are reported in QC and have never been opened... they are all dead reports.

  16. The only 2 "pause" instructions in FastMM 4.991 are used when NeverSleepOnThreadContention is active. One in FastGetMem, another in FastFreeMem. As long as you keep this disabled, it should be OK.

  17. I have added this as an internal report (RS-88454), copying Jeroen Wiert Pluimers' QC text... So we will keep an eye on this...

  18. Why is it even possible to add new reports to QC?

  19. Alexandre Machado In the cut-down version embedded in Delphi (not the full FastMM4) there is no such conditional, and `pause` is always executed in the asm. So the problem may occur in 99% of Win32/Win64 Delphi programs (i.e. the ones not compiled with external FastMM4), for each and every multi-threaded memory allocation. So it is NOT OK at all. :(

  20. A. Bouchez Where did you get the "99%" number from? I don't know a single Delphi application built with the built-in memory manager, unless we are talking about "Hello World" type applications. So, in my case, 100% of the Delphi applications I know are safe.

  21. Alexandre Machado So you are in the 1%. :) Of course, this was a guess, most probably wrong, but there is (was) no benefit in using FastMM4 instead of the built-in memory manager, which is a cut-down version of FastMM4, at least since Delphi 2006 when it was introduced, IIRC. At all the companies I worked for, or audited, they use the built-in heap, and only use FastMM4 for full debug mode. The Delphi IDE itself doesn't use FastMM4, but only the cut-down internal version, which sadly uses pause. Of course, it is not heavily multi-threaded, so I guess it won't affect its speed. :)

  22. I understand that a lot of people use the full version of FastMM4 because they want that extra debug functionality, but I've never needed it personally and I would bet most companies use the built-in FastMM. Would love to see some statistics though, are there any?

  23. I'm with A. Bouchez. Most companies I know only use full FastMM for the debug stuff.

