Trick question: what is wrong with that code? (apart from obviously being useless) - and it compiles and runs. Also not talking about some syntax preferences.

procedure Main;
var
  arr: array of string;
  i: Integer;
begin
  SetLength(arr, 100000);
  for i := 0 to High(arr) do
    arr[i] := 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.';
end;

Comments

  1. String literal too long? Using 0 instead of Low(arr)? ;-)

  2. Is there a difference if the string is assigned to the first array element and then the loop copies from this element? I would guess that will avoid 99999 allocations.

  3. When I copy-paste this into Delphi and try to build, it refuses to build.  Looking into it with a hex editor, it appears that there's somehow a UTF-8 BOM at the end of the code snippet.

    Did you do something sneaky to embed one in there, or is my browser just being weird?

  4. It allocates 100000 strings rather than just one string with a ref count of 100000?

  5. Ah, looks like Leif and Eric have it.  When you assign a string literal or a constant to a variable, it makes a copy of it.  When you assign a string variable to a variable, it increments the ref count instead.

    This isn't "wrong"; it's for memory safety.  Constants need to be constant, and if your string variable had the same memory address as a constant in code, you could get a PChar to it and change the data stored within, which would either cause unexpected side effects if the compiler is written badly, or cause an access violation if the compiler caused that constant string to be properly output to a read-only page of memory.

  6. Mason Wheeler That is just a G+ issue when copying code.
    Leif Uneus  Eric Grange You are spot-on :) Every time it assigns the literal to an array element it internally creates a new string because the source was a const.

  7. Eric Grange Yes, I just tried: for I := Low(arr) to High(arr) do Writeln(NativeUInt(arr[I])); and it prints different values!

  8. Well, one could argue here whether the code is wrong or your expectations are wrong. I definitely don't see anything that I would classify as "wrong" in this behaviour.

  9. Mason Wheeler constants being constants went down the drain when "records with methods" were introduced, these can modify constants without any warning or error from the compiler.

  10. Stefan Glienke encountered that one a while ago in various other guises...

    Nowadays I keep a string de-duplication routine around. It helps with variants of your snippet, but also with big XML/JSON documents in which node names or attributes can be repeated zillions of times in a DOM.

    Besides the memory usage consideration, the RTL can also shortcut string equality tests when two string variables point to the same string instances.

  11. What's funny though is that if you introduce a local string variable, assign the literal to it, and then assign that variable to the array elements within the loop, you have the same problem. The compiler in that case is smart enough to see that you are not writing to the local variable, so it just assigns the const string to it.

  12. Eric Grange Yes, I remember your article about string interning. I wish that was built into the RTL.

  13. Stefan Glienke However, if you assign the literal to arr[0], and then assign arr[0] to all other indices, you only end up with one copy.

  14. Eric Grange Do you have any examples of this?  Also, are you sure you're not talking about "typed constants", which are actually globals because reasons?

  15. Stefan Glienke​ Wow. The compiler is "outsmarting" itself. ;-)

  16. Mason Wheeler Even using a typed const would have caused this issue here.
    As for your claim about safety: there is nothing safe about strings when you access them via a pointer. You can even manipulate consts that way.
    The original code, however, was not an array but used a const string as a flag field (Spring.Nullable), where I noticed that it creates a new string every time I assign that const to a field in the nullable.
    Not that it terribly surprised me, but imo this is not something you think about all the time, as you can see in how much string-handling code ignores the fact that there is no string interning.

  17. Mason Wheeler He means that you can call a method that mutates the subject state on a record that was passed to a const argument.

    Pretty much means that you must never write methods of value types that mutate the subject state. If you follow that rule it's fine.

    For example, instead of

    procedure TMyRecord.Initialise(args);
    begin
      FMyField := ...;
    end;

    you would write

    class function TMyRecord.New(args): TMyRecord;
    begin
      Result.FMyField := ...;
    end;

    The class function would be static.

    Never modify a value type in an instance method of that value type. Always use a static class function that returns a new instance of the value type.

  18. David Heffernan Unfortunately that kills performance as soon as your record contains a managed field because missing RVO will cause _CopyRecord all over the place :(

  19. Stefan Glienke How would you go about getting an array full of references to a single constant string?

  20. Mason Wheeler records with methods can mutate both typed constants and const parameters, I do not think there exists another type of const for records?

    Typed constants only need to be globals if you have the "assignable typed constants" option set, which has been legacy for decades now IIRC.

  21. David Heffernan Stefan Glienke refcounting a string constant poses no theoretical problem, the problem is because string constants are stored with a different format in the binary than what String uses.

    You could simply have the String constant use the same format, with the refcount field initialized, so it is never freed, and is protected by COW mechanisms. No need to involve immutable strings.

  22. Eric Grange That's my understanding and I always thought that was how constant strings were already implemented (same layout as all other strings, ref count -1). Disappointing to be disabused of my misunderstanding.

    Store constant strings with the same layout as other strings, but with ref count -1. When such a string is encountered on the rhs of an assignment, it's a simple pointer copy. If that variable is subsequently modified, then COW applies. Simple. Or have we missed something?

  23. Stefan Glienke But mutating structured value types also has its problems, so choose your poison.

    RVO. 25 years old, but still we don't have it. Sigh.

  24. David Heffernan There are some subtleties with where you place the constants, so they can be modified, rather than be in read-only code memory, this could be done with page/segment tweaking, or by simply copying them to memory at startup (only once, and all at once).

    But nothing fundamental, since you have the same mutability issue as with typed constants, the same code in the compiler could probably be reused.

  25. Eric Grange Certainly for literals they don't need to be modified, so they could be stored in the same place and remain read only. And the compiler knows for typed constants whether or not they are assignable and so they could be stored as are literals in the normal case of non-assignable.

  26. The copy is done for package compatibility: if the string constant is assigned from a package, and the package is unloaded, then a GPF will occur. So in the RTL there are two low-level functions: UStrAsg and UStrLAsg.
    From a comment in the source code: when assigning one long string to another the compiler generates code to call either System._LStrAsg or System._LStrLAsg, depending on whether the destination is a local or global variable.

  27. A. Bouchez​ what about strings in a typed constant ? If those are compatible with packages the same mechanism could be used.

    (anyway packages in Delphi probably have outlived their usefulness, some more isolated package logic would be preferable, if only so that a bug in a package does not bring down the whole IDE...)

  28. Randy Sill which Delphi version?

    At least in XE, resourcestrings are duplicated and you also take a massive performance hit when accessing one.

  29. Randy Sill weird, each resourcestring access is basically compiled to a function call by the compiler, which allocates a new string each time (because being a resourcestring, the actual string can have changed between calls if the language was changed).
    If you see a bigger bloat with const, there must be something else going on, like maybe you are using WideString? In pre-Unicode Delphi versions those could bloat significantly because they did not benefit from any kind of optimization, and they were just slow all around.

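
The behaviour discussed in comments 2, 4, 7 and 13 can be checked with a small console program. This is only a sketch, assuming a Unicode Delphi with unit scope names (XE2 or later); the helper DistinctInstances and the shortened constant Lorem are made up for this illustration and do not appear in the original post. It counts how many different string pointers the array holds after each variant of the loop.

```pascal
program StringShareDemo;

{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

const
  Count = 100000;
  // Shortened stand-in for the literal in the original snippet.
  Lorem = 'Lorem ipsum dolor sit amet';

// Hypothetical helper: counts how many distinct string instances
// (distinct data pointers) the elements of arr refer to.
function DistinctInstances(const arr: array of string): Integer;
var
  seen: TDictionary<Pointer, Boolean>;
  i: Integer;
begin
  seen := TDictionary<Pointer, Boolean>.Create;
  try
    for i := Low(arr) to High(arr) do
      seen.AddOrSetValue(Pointer(arr[i]), True);
    Result := seen.Count;
  finally
    seen.Free;
  end;
end;

procedure Main;
var
  arr: array of string;
  i: Integer;
begin
  SetLength(arr, Count);

  // Variant from the post: the constant is the source of every assignment.
  for i := 0 to High(arr) do
    arr[i] := Lorem;
  Writeln('constant in loop: ', DistinctInstances(arr), ' distinct instances');

  // Variant from comments 2 and 13: assign the constant once,
  // then copy from the array element, which is a string variable.
  arr[0] := Lorem;
  for i := 1 to High(arr) do
    arr[i] := arr[0];
  Writeln('copy from arr[0]: ', DistinctInstances(arr), ' distinct instances');
end;

begin
  Main;
end.
```

If the compiler behaves as described in the thread, the first count should equal the array length and the second should be 1. The exact numbers are not shown here because they depend on the compiler version and on how it treats constants.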
