I'm profiling some code to improve its abysmal performance, using AsmProfiler (https://github.com/andremussche/asmprofiler). The problem is that the program runs much faster with the profiling code in place than when I compile it with Release settings.

Of course, I need to check the configuration for each profile, but how can this be possible at all?

Comments

  1. When you run with the profiling code in place, do you have extra fields in your classes or records? Memory alignment issues can slow down access by 50%, so could it be that your arrays are laid out differently in your release version?

  2. Lars Fosdal, I don't know where your 50% comes from, but alignment is clearly not the root cause of a huge slowdown in 95.37% of applications that do not involve FP calculation.
    ;-)

  3. Did you try disabling your antivirus? Or running it on another computer?

  4. A. Bouchez I'll post a benchmark when I'm back at my computer. Accessing misaligned data is costly.

  5. A. Bouchez - The article is "Optimization - Understand your hardware" on delphi.fosdal.com. It seems that link rot has eaten the source code, but I am searching. Anyway, the methodology is very simple and described in the article.

  6. The demo needs a revision to use TStopwatch :P

  7. Misaligned memory access used to be costly. Modern Intel desktop processors have changed that.

  8. OK, after running this, it seems that the memory alignment bottleneck is pretty much negligible on a 2016 CPU compared to a 2008 CPU, so the problem definitely lies elsewhere.

    Offset        Total      Avg   High    Low
    A0    0/25:  3654ms  730.8ms  761ms  717ms
    A1    1/25:  3679ms  735.8ms  768ms  724ms
    A2    2/25:  3673ms  734.6ms  767ms  716ms
    A3    3/25:  3679ms  735.8ms  769ms  712ms
    A4    4/25:  3604ms  720.8ms  748ms  702ms
    A5    5/25:  3677ms  735.4ms  767ms  717ms
    A6    6/25:  3680ms  736.0ms  766ms  716ms
    A7    7/25:  3673ms  734.6ms  762ms  712ms
    A8    8/25:  3611ms  722.2ms  750ms  704ms
    A9    9/25:  3676ms  735.2ms  764ms  715ms
    A10  10/25:  3671ms  734.2ms  767ms  708ms
    A11  11/25:  3678ms  735.6ms  768ms  714ms
    A12  12/25:  3593ms  718.6ms  746ms  698ms
    A13  13/25:  3678ms  735.6ms  765ms  718ms
    A14  14/25:  3697ms  739.4ms  770ms  719ms
    A15  15/25:  3678ms  735.6ms  762ms  719ms
    A16  16/25:  3608ms  721.6ms  749ms  700ms
    A17  17/25:  3689ms  737.8ms  765ms  724ms
    A18  18/25:  3721ms  744.2ms  776ms  723ms
    A19  19/25:  3694ms  738.8ms  790ms  714ms
    A20  20/25:  3677ms  735.4ms  786ms  714ms
    A21  21/25:  3717ms  743.4ms  788ms  721ms
    A22  22/25:  3698ms  739.6ms  775ms  717ms
    A23  23/25:  3685ms  737.0ms  769ms  715ms
    A24  24/25:  3614ms  722.8ms  750ms  707ms
    A25  25/25:  3700ms  740.0ms  784ms  719ms

  9. Thank you. I did some low-hanging-fruit optimizations and I'm very happy. The main culprit was that I was copying FastReport reports before using them on each row of a large dataset (the copy uses XML serialization); with some in-memory caching, the biggest performance hit disappeared. I still don't know why there was such an observable performance improvement when using AsmProfiler.


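For anyone who wants to repeat the alignment measurement from comments 4 to 8 on their own machine, here is a rough Delphi sketch. It is not the original benchmark from the article (that source appears to be lost), only one guess at a similar methodology: sum a large run of Integer values starting at byte offsets 0 to 25 into a buffer and time each offset with TStopwatch from System.Diagnostics. The program name, the constants, and the SumAt helper are all made up for illustration, and the sketch assumes GetMem returns a block that is at least 4-byte aligned.

```delphi
program AlignmentBench;

{$APPTYPE CONSOLE}
{$POINTERMATH ON}

uses
  System.SysUtils, System.Diagnostics;

const
  // All names and sizes here are illustrative assumptions.
  ElementCount = 4 * 1024 * 1024;  // Integer reads per pass (16 MB of data)
  PassesPerRun = 20;               // passes inside one timed run
  Runs         = 5;                // timed runs per offset
  MaxOffset    = 25;               // byte offsets 0..25
  BufferSize   = ElementCount * SizeOf(Integer) + MaxOffset;

// Reads ElementCount Integers starting Offset bytes into Buffer and sums them.
// Offsets that are not multiples of 4 make every read misaligned.
function SumAt(Buffer: PByte; Offset: Integer): Int64;
var
  P: PInteger;
  I: Integer;
begin
  Result := 0;
  P := PInteger(Buffer + Offset);
  for I := 1 to ElementCount do
  begin
    Inc(Result, P^);
    Inc(P);
  end;
end;

var
  Buffer: PByte;
  Offset, Run, Pass: Integer;
  SW: TStopwatch;
  Ms, Total, Highest, Lowest, Check: Int64;
begin
  GetMem(Buffer, BufferSize);
  try
    // Every byte is 1, so every offset must produce the same sum.
    FillChar(Buffer^, BufferSize, 1);
    Writeln('Offset     Total       Avg    High     Low');
    for Offset := 0 to MaxOffset do
    begin
      Total := 0;
      Highest := 0;
      Lowest := High(Int64);
      for Run := 1 to Runs do
      begin
        Check := 0;
        SW := TStopwatch.StartNew;
        for Pass := 1 to PassesPerRun do
          Check := SumAt(Buffer, Offset);
        SW.Stop;
        // Sanity check that the reads happened and stayed inside the buffer.
        if Check <> Int64(ElementCount) * $01010101 then
          raise Exception.Create('Unexpected checksum');
        Ms := SW.ElapsedMilliseconds;
        Inc(Total, Ms);
        if Ms > Highest then Highest := Ms;
        if Ms < Lowest then Lowest := Ms;
      end;
      Writeln(Format('A%-3d %8dms %7.1fms %5dms %5dms',
        [Offset, Total, Total / Runs, Highest, Lowest]));
    end;
  finally
    FreeMem(Buffer);
  end;
end.
```

Build it with the same Release settings as the real application and compare the rows: if the offsets that are multiples of 4 are not noticeably faster than the others, alignment is unlikely to explain a large slowdown on that machine. The absolute numbers will not match the table above, since the workload is different.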