Does your app need a little speed boost? Take a look at how you can use SIMD assembly language to optimize critical parts of your Windows, macOS, iOS and Android apps.
https://blog.grijjy.com/2017/07/10/simd-assembly-optimization/
{$excessprecision off}
Vitali Burkov But it is not nearly the same as the code in FastMath. The Delphi compiler does next to no optimization when it comes to fast calculations.
I just hinted at the reason (and the solution) for why 64-bit single-precision FP performance is so poor.
Vitali Burkov Thanks for the EXCESSPRECISION note. I did disable that in my FastMath library but not in this demo app.
Just for testing, I added it to the demo app. It makes a big difference in the SquaredDistance calculation on Win64, but the SIMD version is still 2-3 times faster.
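For readers who have not used the directive, here is a minimal sketch of where it goes. The TVec4 record and this SquaredDistance body are hypothetical stand-ins, not the code from the demo app or from FastMath, and the CPUX64 guard reflects my understanding that the directive only matters for 64-bit Intel targets.

```pascal
program ExcessPrecisionSketch;

{$APPTYPE CONSOLE}

// Assumption: the directive only affects 64-bit Intel code generation,
// so it is guarded here. With it OFF, Single expressions are evaluated
// in single precision instead of being promoted to Double and back.
{$IFDEF CPUX64}
  {$EXCESSPRECISION OFF}
{$ENDIF}

type
  // Hypothetical 4-component vector, for illustration only
  TVec4 = record
    X, Y, Z, W: Single;
  end;

// Plain Pascal squared distance between two vectors
function SquaredDistance(const A, B: TVec4): Single;
var
  DX, DY, DZ, DW: Single;
begin
  DX := A.X - B.X;
  DY := A.Y - B.Y;
  DZ := A.Z - B.Z;
  DW := A.W - B.W;
  Result := (DX * DX) + (DY * DY) + (DZ * DZ) + (DW * DW);
end;

var
  P, Q: TVec4;
begin
  P.X := 1; P.Y := 2; P.Z := 3; P.W := 4;
  Q.X := 0; Q.Y := 0; Q.Z := 0; Q.W := 0;
  WriteLn(SquaredDistance(P, Q):0:1);
end.
```

The directive has to appear before the routines it should affect, so placing it near the top of the unit or program is the simplest choice.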
Vitali Burkov You are right, but why not say so in the first comment? And I must say I love the code from Erik.
Does this article apply to C++ Builder as well, or are there major differences?
FWIW you can use the argument names in the assembly code, which eliminates the need to have different code for 32 and 64 bit just because the registers are called differently - like this:
procedure AddSIMD(const A, B: T16Bytes; out C: T16Bytes);
asm
movdqu xmm0, [A]
movdqu xmm1, [B]
paddb xmm0, xmm1
movdqu [C], xmm0
end;
Mike Versteeg I don't use C++ Builder, so take my answer with a grain of salt. I'm pretty sure you can link the ARM static libraries with C++ Builder, so you can take advantage of this for iOS and Android. The inline assembler for C++ Builder will probably look and work differently, so you may have to reformat the code. But the principles should be the same.
Another approach for Windows and macOS (and Linux?) is to use an external assembler (like nasm or yasm) to create an object file and link that file. That will work with Delphi as well.
Stefan Glienke You are absolutely right. There are 3 reasons I didn't do this though:
1. To show more clearly the relationship between registers and parameters.
2. For consistency with the NEON/Arm64 code, where you cannot do that (although you could use #define's to link parameter names to registers).
3. But more importantly, because I shot myself in the foot a couple of times with that approach in longer routines. Since it makes the relationship between parameter and register less visible, it is easier to accidentally use the EAX, ECX and EDX registers for something else, not realizing that you are actually changing the value of a parameter. Accessing the parameter by name later will then lead to invalid results or crashes. So now I prefer to use registers to make it more clear what is happening. But you are right that in smaller routines like this, it is easier (and more readable) to use the parameter names instead.
Erik van Bilsen I couldn't agree more! As for the x64 compatibility approach proposed by Stefan Glienke, it may work in some very simple cases, but it definitely sounds like an unsafe trick: for instance, the .noframe directive is missing, or you may miss something when switching from Win64 to Linux, with their different ABIs and register usage. Better to make a clear separation between x86 and x64 asm, and always provide a "pure pascal" version.
I very much like the fact that with FPC you can write inline asm functions in the middle of the source, even on ARM (and ARM64), without the external static linking of files that Delphi requires. It helps with maintaining and debugging the code.
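As a rough illustration of the "clear separation plus pure Pascal fallback" advice above, the AddSIMD routine from this thread could be laid out as follows. This is a sketch that assumes the Delphi compiler. Restricting both assembly branches to Windows is a conservative choice made here, not something the article prescribes, and the test values are made up.

```pascal
program AddSimdSketch;

{$APPTYPE CONSOLE}

type
  T16Bytes = array [0..15] of Byte;

{$IF Defined(CPUX86) and Defined(MSWINDOWS)}
// 32-bit register convention: A in EAX, B in EDX, C in ECX
procedure AddSIMD(const A, B: T16Bytes; out C: T16Bytes);
asm
  movdqu xmm0, [eax]
  movdqu xmm1, [edx]
  paddb  xmm0, xmm1
  movdqu [ecx], xmm0
end;
{$ELSEIF Defined(CPUX64) and Defined(MSWINDOWS)}
// Win64 ABI: A in RCX, B in RDX, C in R8
procedure AddSIMD(const A, B: T16Bytes; out C: T16Bytes);
asm
  .noframe
  movdqu xmm0, [rcx]
  movdqu xmm1, [rdx]
  paddb  xmm0, xmm1
  movdqu [r8], xmm0
end;
{$ELSE}
// Pure Pascal fallback for every other target (other ABIs, ARM, ...)
procedure AddSIMD(const A, B: T16Bytes; out C: T16Bytes);
var
  I: Integer;
begin
  for I := 0 to 15 do
    C[I] := (A[I] + B[I]) and $FF; // wrap around like PADDB does
end;
{$IFEND}

var
  X, Y, Sum: T16Bytes;
  I: Integer;
begin
  for I := 0 to 15 do
  begin
    X[I] := I;
    Y[I] := 250; // large enough that some sums wrap past 255
  end;
  AddSIMD(X, Y, Sum);
  for I := 0 to 15 do
    Write(Sum[I], ' ');
  WriteLn;
end.
```

Because every branch declares the same signature, callers never need to know which implementation was compiled in, and the fallback doubles as a reference when checking the assembly versions.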