There is no Capacity property on TDictionary but the Count property is Integer, as is the HashCode. Hopefully that answers your question...
Nicholas Ring He did not ask for a property but for the capacity parameter of the constructor.
Nándor Kiss Why don't you check for yourself? http://docwiki.embarcadero.com/Libraries/Seattle/en/System.Generics.Collections.TDictionary.Create
Stefan Glienke My bad - misread the question :( Sorry Nándor Kiss
Stefan Glienke
Thanks, I forgot it... :D
Nicholas Ring
No problem :)
btw:
It doesn't make sense to me that Capacity is an Integer on x64.
It could grow and grow until... maybe until High(UInt64)? :)
How often are you going to have more than two billion items in a dictionary, though? I suspect by the time you approach that, you will have been forced to write your own container for a number of other reasons too.
ReplyDeleteOn the other hand, I do wonder if NativeInt would be more natural. On the other other hand, keeping it Integer results in the same compiled behaviour for 32 and 64-bit.
David Millington
It's an experimental project, storing values of this type:
, Cardinal>
It's kind of a lookup table, but now I know my (Delphi's) limits :)
Nándor Kiss You'd probably run into similar limits in C++ or C# too... How many items do you have? (Are you using Cardinal in order to have up to 4 billion?)
David Millington Not in C++, they have sane types defined for these situations (i.e. size_t).
A 4+1 byte value pair ... I guess all possible combinations should fit in a dictionary (Integer = 8 Bytes)
Oliver Münzberg IIRC Integer in Delphi is always 4 bytes (32-bit) and NativeInt depends on the platform.
Alexander Benikowski Yeah, I got lost in data types :o)
Asbjørn Heid Wouldn't the equivalent in Delphi be NativeInt, NativeUInt, NativePtr?
size_t is sane, yes, but is it really necessary to have a dictionary that supports more than 2 billion items? No one has directly answered that question yet.
I'd understand if it was a 16-bit to 32-bit upgrade. But 32 to 64 doesn't bring the same advantages here.
David Millington Ignoring pedantic differences, yes, using NativeInt/NativeUInt would work. The main difference I was thinking of is that in C++ they actually use size_t in the libraries etc...
As for "is it really necessary"... Some applications have a natural tendency to be pushed in terms of content. On the C++ renderer I worked on before I got back to Delphi, we had regular complaints from users who ran into issues when loading highly detailed scenes taking multiple GB, or trying to render 32000x32000 pixel images etc. They had >=16GB RAM and naturally expected to be able to use all of it.
We used many standard library types, for example during processing of the resulting image several steps would use a std::vector (C++'s version of TList) to hold some value for each pixel in the image. We never had any issues with those, they just worked. The issues were mostly where we assumed we could hold a copy of some datastructure in memory, or similar stuff in our own code.
edit: when I say regular I don't mean frequent. But every few months someone would push our renderer past a different edge.
Asbjørn Heid I understand the issues. Until last year I used to work on a resource-stressing C++ app too (sometimes multi-gigabyte data sets. One was recorded at 2GB / minute, I think. Not always easy in what was then a 32-bit process.)
I guess I just think a Delphi dictionary will be struggling by the time it approaches 2 billion items - its hashing, its lookup, everything. So by the time you get anywhere near that, you'd want a different data structure.
David Millington I'm not so sure that is the case. This program seems to scale reasonably well. Obviously it needs to be run in a 64-bit process.
{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Diagnostics,
  System.Generics.Collections;

const
  N = 10000000;

function RandomKey: Int64;
begin
  Int64Rec(Result).Lo := Random(High(Integer));
  Int64Rec(Result).Hi := Random(High(Integer));
end;

procedure Main;
var
  Dict: TDictionary<Int64, Integer>;
begin
  Dict := TDictionary<Int64, Integer>.Create(N);
  try
    while Dict.Count < N do
      Dict.AddOrSetValue(RandomKey, 0);
  finally
    Dict.Free;
  end;
end;

var
  Stopwatch: TStopwatch;

begin
  try
    RandSeed := 666;
    Stopwatch := TStopwatch.StartNew;
    Main;
    Writeln(Stopwatch.ElapsedMilliseconds);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.
David Millington I don't see why TDictionary would struggle significantly on 64bit (had it used NativeInts as appropriate) with >2 billion items.
The biggest issue would be growing to >2 billion items, which involves a reallocation and rehashing. But if you know how many items you will be storing in the dictionary and create it with the appropriate capacity, then that is not an issue. And for many "lookup-table" use cases you do know the required capacity.
David Heffernan Good test (although it's only 10 million - needs an extra digit) and I think shows I'm probably wrong :)
ReplyDeleteAsbjørn Heid Yep. Good point. I don't know how its hash table allocates and what amount of collisions etc might occur at that size, do you?
There is an easy solution to this. Let's file a QP asking the containers be changed to use NativeInt in 10.1.
David Millington The TDictionary is implemented using open addressing with linear probing, all elements are stored in a TArray<> effectively. Collision probability would not be affected as long as the array can grow to keep the load factor below 0.75.
The biggest hurdle to making it "64-bit ready" is that IEqualityComparer will have to be changed to use NativeInt, and that will likely break a lot of code, both RTL and user code. IIRC this is Delphi.Net legacy, as .Net also has piss-poor 64-bit support.