Why does the first snippet generate an access violation and the second does not?
**
class operator TSlice.BitwiseAnd(const A: TSlice; const B: TSliceData): TSlice;
asm
//RCX = result
//RDX = @A
//R8 = @B
movdqu xmm0,[rdx]
movdqu xmm1,[rdx+16]
pand xmm0,[r8]
pand xmm1,[r8+16]
movdqu [rcx],xmm0
movdqu [rcx+16],xmm1
end;
class operator TSlice.BitwiseOr(const A, B: TSlice): TSlice;
asm
//RCX = result
//RDX = @A
//R8 = @B
movdqu xmm0,[rdx]
movdqu xmm1,[rdx+16]
movdqu xmm2,[r8]
movdqu xmm3,[r8+16]
por xmm0,xmm2
por xmm1,xmm3
movdqu [rcx],xmm0
movdqu [rcx+16],xmm1
end;
**
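Answer:
Alignment. A legacy SSE instruction such as pand or por that takes a memory operand requires that operand to be 16-byte aligned; movdqu does not. The first snippet faults at pand xmm0,[r8] whenever B is not 16-byte aligned, while the second loads B with movdqu first and only applies por to registers.
Very annoying.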
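One way around this is to give BitwiseAnd the same shape as BitwiseOr: load B with movdqu and apply pand to registers only. The sketch below is for the Win64 Delphi compiler. The declarations of TSlice and TSliceData are not shown in the post, so the ones here are assumptions (32 bytes, inferred from the two 16-byte loads per operand), and the small driver that builds a deliberately misaligned B is illustrative only.

```pascal
program SliceAndUnaligned;
{$APPTYPE CONSOLE}

// Sketch for the Win64 Delphi compiler only; the register comments follow
// the convention noted in the post (RCX = result, RDX = @A, R8 = @B).

uses
  System.SysUtils;

type
  // Assumption: the post does not show these declarations.
  TSliceData = array[0..3] of UInt64;
  PSliceData = ^TSliceData;

  TSlice = record
    Data: TSliceData;
    class operator BitwiseAnd(const A: TSlice; const B: TSliceData): TSlice;
  end;

class operator TSlice.BitwiseAnd(const A: TSlice; const B: TSliceData): TSlice;
asm
  //RCX = result
  //RDX = @A
  //R8 = @B
  movdqu xmm0,[rdx]
  movdqu xmm1,[rdx+16]
  movdqu xmm2,[r8]       // unaligned load: no alignment requirement
  movdqu xmm3,[r8+16]
  pand xmm0,xmm2         // register operands only, so no alignment fault
  pand xmm1,xmm3
  movdqu [rcx],xmm0
  movdqu [rcx+16],xmm1
end;

var
  Buf: array[0..63] of Byte;
  Addr: NativeUInt;
  PB: PSliceData;
  A, R: TSlice;
begin
  // Round up to a 16-byte boundary inside Buf, then add 1 so that the
  // address is guaranteed NOT to be 16-byte aligned.
  Addr := (NativeUInt(@Buf[0]) + 15) and not NativeUInt(15);
  PB := PSliceData(Addr + 1);

  FillChar(A, SizeOf(A), $FF);
  FillChar(PB^, SizeOf(TSliceData), $0F);

  R := A and PB^;        // calls TSlice.BitwiseAnd with a misaligned B
  Writeln(IntToHex(Int64(R.Data[0]), 16));
end.
```

With the original pand xmm0,[r8] form, the same driver would be expected to raise the access violation, since B is misaligned by construction.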
David Heffernan Indeed. But in this context, last time I checked, using movdqu has no big performance penalty over movdqa on modern CPUs, unless you process a big data buffer at once. There was a penalty 15 years ago with a Pentium IV (see https://software.intel.com/en-us/articles/reducing-the-impact-of-misaligned-memory-accesses), but it is much smaller today. I wonder whether inlined plain 64-bit OR/AND Pascal code wouldn't be faster than an asm sub-function using SSE2 registers, in this particular case.
A. Bouchez Maybe not in this case, but when the compiler can't align double-precision values on the stack, that's pretty annoying.
David Heffernan Here the values are passed by reference (the const keyword makes the compiler pass pointers, in RDX and R8 as the code comments show), so alignment is not handled by the compiler but is up to the caller. If you pass double values as parameters, they are assigned to XMM registers; only when no register is left (very unlikely) are they placed, aligned, on the stack.
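The plain-Pascal alternative raised in the comments could look roughly like the sketch below. It assumes the same hypothetical TSlice and TSliceData declarations (four 64-bit words), which the post does not show. Whether it is actually faster than the SSE2 routine is not measured here, and the inline directive is only a request to the compiler.

```pascal
program SliceAndPascal;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Assumption: the post does not show these declarations.
  TSliceData = array[0..3] of UInt64;

  TSlice = record
    Data: TSliceData;
    class operator BitwiseAnd(const A: TSlice; const B: TSliceData): TSlice; inline;
  end;

class operator TSlice.BitwiseAnd(const A: TSlice; const B: TSliceData): TSlice;
begin
  // Four plain 64-bit ANDs; ordinary integer loads carry no 16-byte
  // alignment requirement, so a misaligned B cannot fault here.
  Result.Data[0] := A.Data[0] and B[0];
  Result.Data[1] := A.Data[1] and B[1];
  Result.Data[2] := A.Data[2] and B[2];
  Result.Data[3] := A.Data[3] and B[3];
end;

var
  A, R: TSlice;
  B: TSliceData;
  I: Integer;
begin
  for I := 0 to 3 do
  begin
    A.Data[I] := UInt64($FFFFFFFF00000000);
    B[I] := UInt64($0F0F0F0F0F0F0F0F);
  end;

  R := A and B;          // resolved to TSlice.BitwiseAnd
  for I := 0 to 3 do
    Writeln(IntToHex(Int64(R.Data[I]), 16));
end.
```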