I was wondering if there is a reason why setting multiple array values on the same array consecutively results in repeating that one instruction over and over.
I was wondering if there is a reason why setting multiple array values on the same array consecutively results in repeating that one instruction over and over.
No I don't want to start another "omg, the optimization sucks" discussion just if there is a reason why this obvious (and easy?) peephole optimization was not done (remember, I suck at compiler knowledge ^^) - apart from the cough aforementioned reason. ;)
procedure Main;
var
a: TArray;
begin
SetLength(a, 2);
a[0] := 1;
a[1] := 2;
end;
This code results in this asm (same situation on x64):
a[0] := 1;
mov eax,[ebp-$04]
mov [eax],$00000001
a[1] := 2;
mov eax,[ebp-$04]
mov [eax+$04],$00000002
No I don't want to start another "omg, the optimization sucks" discussion just if there is a reason why this obvious (and easy?) peephole optimization was not done (remember, I suck at compiler knowledge ^^) - apart from the cough aforementioned reason. ;)
procedure Main;
var
a: TArray
begin
SetLength(a, 2);
a[0] := 1;
a[1] := 2;
end;
This code results in this asm (same situation on x64):
a[0] := 1;
mov eax,[ebp-$04]
mov [eax],$00000001
a[1] := 2;
mov eax,[ebp-$04]
mov [eax+$04],$00000002
Probably the modern CPU's are smarter than the Delphi compiler and just throw away the excessive instructions from the pipeline; and yes, the optimization sucks, though (because CPU's are smart) it is not obvious whether it affects performance or not.
ReplyDeleteStefan Glienke, what to you looks like the same instruction to me does not. By which I mean it has a different result the second time. What is that you were expecting it to optimise to?
ReplyDeleteAndrea Raimondi I assume he was expecting the second instance of "mov eax,[ebp-$04]" to not be there, since neither eax nor ebp has changed since the first time.
ReplyDeletealso setlength calls (useless in this case) zeromem (I submitted a qc about a flag to skip zeromem)
ReplyDeleteSergey Kasandrov CPU can't ignore the duplicate instruction in this case, because it reads from memory. If it had been say "mov eax, immediate" then sure.
ReplyDeleteUsing pointermath on, coding like this will get rid of the extra mov instruction:
ReplyDelete{$POINTERMATH ON}
procedure Main;
var
a: TArray;
P: PInteger;
begin
SetLength(a, 2);
P := Pointer(a);
P[0] := 1;
P[1] := 2;
end;
Project72.dpr.17: P := Pointer(a);
0041A23B 8B45FC mov eax,[ebp-$04]
Project72.dpr.18: P[0] := 1;
0041A23E C70001000000 mov [eax],$00000001
Project72.dpr.19: P[1] := 2;
0041A244 C7400402000000 mov [eax+$04],$00000002
Leif Uneus Now that is funny and now I wonder even more why for array access it doesn't do exactly that.
ReplyDeleteWhat prevents the compiler to allocate a register to your array variable is because SetLength() is using it as a "var" param, and anything used as a "var" param has to have a memory location, so the "a" variable is flagged as not eligible to optimization into a register.
ReplyDelete(and AFAICT once a variable is flagged, it is flagged for the whole procedure, this would be a limitation of the optimizer)
As a workaround, you can just use a different variable:
var
a, b: TArray;
begin
SetLength(b, 2);
a := b;
a[0] := 1;
a[1] := 2;
end;
though that will stress the implicit refcounting, and there can be side-effects to be wary of since SetLength behaves different when an array as a refcount > 1, so you could either use a PIntegerArray or segregate the SetLength to a different function:
var
a, b: TArray;
begin
a := NewIntegerArray(2);
a[0] := 1;
a[1] := 2;
end;
NewIntegerArray() has to be a non-inlined function, and you cannot use the "array constructors", otherwise it can pull-in an implicit SetLength, which would get you back to step one.
Another optimizer limitation: using array record helpers (Test vs TestPointer)
ReplyDeletetype
TMyArray=Array of Integer;
TMyArrayHelper=record helper for TMyArray
public
procedure Test;
procedure TestPointer;
end;
{ TMyArrayHelper }
procedure TMyArrayHelper.Test;
begin
Self[0]:=1;
Self[1]:=2;
end;
{$POINTERMATH ON}
procedure TMyArrayHelper.TestPointer;
var P : PInteger;
begin
P:=PInteger(Self);
P[0]:=1;
P[1]:=2;
end;
{$POINTERMATH OFF}
var a : TMyArray;
SetLength(a,2);
a.Test;
a.TestPointer;
David Berneda array record helpers themselves are not the issue, it's passing the variable as "var" param (for SetLength or anything else)
ReplyDeleteEric Grange I think that is not working, now I got this - could it be the way Delphi treats managed type results as hidden var parameter that is causing the same as SetLength?
ReplyDeletea := NewIntegerArray(2);
lea edx,[ebp-$04]
mov eax,$00000002
call NewIntegerArray
a[0] := 1;
mov eax,[ebp-$04]
mov [eax],$00000001
a[1] := 2;
mov eax,[ebp-$04]
mov [eax+$04],$00000002
Stefan Glienke dang, yes, you're right, it handle the result as an hidden var param... (that "lea edx,[ebp-$04]")
ReplyDelete...and even if you stdcall/cdecl/safecall, the compiler emulates the var param by pushing the address on the stack
ReplyDeletea := NewArray(2);
push $02
lea eax,[ebp-$04]
push eax
call NewArray
add esp,$08
I guess if you don't want to fiddle with pointers you're left with moving the assignment to a different procedure (a local procedure is fine)... but you need enough assignments to offset the call overhead
ReplyDeleteprocedure FillArray(const a : TArray);
begin
a[0] := 1;
a[1] := 2;
end;
interestingly enough if you mark FillArray as "inline", the generated code becomes truly horrible as the compilers throws in an extra temporary variable and a DynArrayAsg... not something you want in an inner loop!
Eric Grange Yeah things can get really worse sometimes when trying to be clever :)
ReplyDeleteLike just now I had a case and the prologue of the routine which was rather short was quite long, because the compiler reserved space for local variables it needed to create for a few case labels. I know its just a few cases but these are the times where I wish for a tool that quickly points out these spots (whoops, could be the compiler itself) and tells me, hey you might want to move that to a local subroutine (I remember your blog post from a few years ago where you explained a case like that).
Stefan Glienke NexusDB "Insider" tool is appropiate for this, analyzes assembly to give info about "too complex" portions (excessive stack references, register pressure etc)
ReplyDeleteStefan Glienke anyway there is a limit in what the compiler does. I remember Allen's comment about re-ordering local vars might produce better assembly, as a hint to the compiler to what should be put in registers instead of stack
ReplyDeleteDavid Berneda Nice, I will take a look. Thanks for the pointer.
ReplyDeleteEdit: Ah, its part of the Quality Suite - I only played with the coverage analyzer so far.
David Berneda IIRC it was the first variables are rather put into the regs?
ReplyDeleteI also remember that someone once told me to watch the order of arguments when one method calls another as that might cause unnecessary moving around the registers. I just cannot find some in-depth explanation of that. Maybe Eric Grange knows :)
I am still having a hard time finding out what is actually useful optimization so one should keep an eye on when writing code in the first place and what falls into the premature optimization category.
Like sometimes I write code that is nice to read, compact, clean. But then you look into the assembly and are scared. Like today with the case statement. In the end I made a const array[TTypeKind] of procedure(...)
The order dependency is "subtile", it will depend on what you do with the parameters and variables, which are in scope at which point, etc. and same for the parameter passing, because there can be implicit parameters which will shift stuff around, or parameters can have been shifted to different temporary registers... so best check in the CPU view :)
ReplyDeleteIf your code is not called repeatedly in Loops (with a capital L, to indicate those with many rounds, or those called from other loops), it probably does not matter, just profile and optimize when/if needed.
If it's for a library, well, it's up to you to decide how likely it is to be called in Loops, and if it is, whether it is likely to be the bottleneck of such Loops or not :)
Also at some point micro-optimizations in high-level library functions no longer make sense.
For instance a highly optimized IntToStr would not benefit a highly optimized "integer array to JSON" function, as the IntToStr function prototype requires allocating a string makes it inefficient in Loops (not just because of the allocation, but also because the string content will need to be copied at some point).
So you would be better off optimizing a lower level IntToPChar function and use it in both IntToStr and IntegerArrayToJSON.
Another subtility, I was comparing the pointermath array codegen with array of Int64, on 32bit cpu its worst than normal a[index] access
ReplyDelete