I'm profiling some code to try to improve its abysmal performance. I'm using AsmProfiler (https://github.com/andremussche/asmprofiler). The problem is that when I run my program with the profiling code in place, it runs way faster than when I compile it using Release settings.
Of course, I need to check the configuration for each profile, but how can this be possible at all?
When you run with profiling code in place, do you have extra fields in your classes or records? Memory alignment issues can slow down access by 50%, so could it be that your arrays are laid out differently in your release version?
Lars Fosdal I don't know where your 50% comes from, but alignment is clearly not the root cause of a huge slowdown in 95.37% of applications which do not involve FP calculation.
;-)
Did you try disabling your antivirus? Or trying on another computer?
A. Bouchez I'll post a benchmark when I'm back at my computer. Accessing misaligned data is costly.
A. Bouchez - The article: delphi.fosdal.com - Optimization - Understand your hardware - It seems that link rot has eaten the source code, but I am searching. Anyways - the methodology is very simple and described in the article.
Found it and put a copy on G-drive!
drive.google.com - DataAlignmentDemo.zip
The demo needs a revision to use TStopWatch :P
Misaligned memory access used to be costly. Modern Intel desktop processors have changed that.
Ok, after running this - it seems that the memory alignment penalty is pretty much negligible on a 2016 CPU compared to a 2008 CPU - so the problem definitely lies elsewhere.
              Total      Avg   High    Low
A0    0/25:  3654ms  730.8ms  761ms  717ms
A1    1/25:  3679ms  735.8ms  768ms  724ms
A2    2/25:  3673ms  734.6ms  767ms  716ms
A3    3/25:  3679ms  735.8ms  769ms  712ms
A4    4/25:  3604ms  720.8ms  748ms  702ms
A5    5/25:  3677ms  735.4ms  767ms  717ms
A6    6/25:  3680ms  736.0ms  766ms  716ms
A7    7/25:  3673ms  734.6ms  762ms  712ms
A8    8/25:  3611ms  722.2ms  750ms  704ms
A9    9/25:  3676ms  735.2ms  764ms  715ms
A10  10/25:  3671ms  734.2ms  767ms  708ms
A11  11/25:  3678ms  735.6ms  768ms  714ms
A12  12/25:  3593ms  718.6ms  746ms  698ms
A13  13/25:  3678ms  735.6ms  765ms  718ms
A14  14/25:  3697ms  739.4ms  770ms  719ms
A15  15/25:  3678ms  735.6ms  762ms  719ms
A16  16/25:  3608ms  721.6ms  749ms  700ms
A17  17/25:  3689ms  737.8ms  765ms  724ms
A18  18/25:  3721ms  744.2ms  776ms  723ms
A19  19/25:  3694ms  738.8ms  790ms  714ms
A20  20/25:  3677ms  735.4ms  786ms  714ms
A21  21/25:  3717ms  743.4ms  788ms  721ms
A22  22/25:  3698ms  739.6ms  775ms  717ms
A23  23/25:  3685ms  737.0ms  769ms  715ms
A24  24/25:  3614ms  722.8ms  750ms  707ms
A25  25/25:  3700ms  740.0ms  784ms  719ms
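For anyone who wants to reproduce numbers in this shape, here is a minimal sketch of a timing harness that reports Total, Avg, High and Low over repeated runs. It is written in Python rather than Delphi, it is not the code from DataAlignmentDemo.zip, and the names `time_runs` and `summarize` are made up for illustration. The default of five runs is an assumption based on the Avg column above being Total divided by 5.

```python
import time


def time_runs(workload, runs=5, clock=time.perf_counter):
    """Call workload() `runs` times; return each run's duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = clock()
        workload()
        durations.append((clock() - start) * 1000.0)
    return durations


def summarize(durations_ms):
    """Reduce per-run durations to the four columns used in the table."""
    total = sum(durations_ms)
    return {
        "total": total,
        "avg": total / len(durations_ms),
        "high": max(durations_ms),
        "low": min(durations_ms),
    }


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with the code under test.
    data = list(range(200_000))
    stats = summarize(time_runs(lambda: sum(data)))
    print("Total {total:.0f}ms  Avg {avg:.1f}ms  "
          "High {high:.0f}ms  Low {low:.0f}ms".format(**stats))
```

Reporting High and Low next to the average makes it easier to see whether a difference between two rows is larger than the run-to-run noise.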
Thank you. I did some low-hanging-fruit optimizations and I'm very happy. The source of the problem was that I was copying FastReport reports before using them on each row of a large dataset (the copy uses XML serialization); with some in-memory caching, the biggest performance hit disappeared. I still don't know why there was such a noticeable performance improvement when using AsmProfiler.
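In case it helps someone with a similar problem, the caching idea can be sketched roughly as follows. This is Python, not the actual Delphi/FastReport code; `ReportCache` and its members are hypothetical names, and `copy.deepcopy` in the usage below only stands in for the expensive XML-serialization copy.

```python
import copy


class ReportCache:
    """Keeps one prepared copy of each report in memory (hypothetical sketch)."""

    def __init__(self, make_copy):
        self._make_copy = make_copy  # the expensive copy routine
        self._cache = {}
        self.copies_made = 0         # counts how often the expensive path ran

    def get(self, name, template):
        # Pay for the expensive copy only the first time a name is requested.
        if name not in self._cache:
            self._cache[name] = self._make_copy(template)
            self.copies_made += 1
        return self._cache[name]


if __name__ == "__main__":
    template = {"title": "Invoice", "bands": ["header", "detail", "footer"]}
    cache = ReportCache(copy.deepcopy)
    for row in range(1000):  # stand-in for the rows of a large dataset
        report = cache.get("invoice", template)
        # ... fill `report` with the row's data and print it ...
    print(cache.copies_made)
```

The trade-off is that the same cached object is reused for every row, so any state a row leaves behind on the report has to be reset before the next one.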