Wait... WHAT!?
Wait... WHAT!?
Exercise for the reader - how much faster would this be with a hash table or binary search?
unit Data.Db;
function TFields.FindField(const FieldName: string): TField;
var
I: Integer;
HashValue: Cardinal;
begin
if FList.Count > 0 then
begin
HashValue := TNamedItem.HashName(FieldName);
for I := 0 to FList.Count - 1 do
begin
Result := FList.Items[I];
if (Result.FFieldNameHashValue = HashValue) and
(AnsiCompareText(Result.FFieldName, FieldName) = 0) then
Exit;
end;
end;
Result := nil;
end;
Exercise for the reader - how much faster would this be with a hash table or binary search?
unit Data.Db;
function TFields.FindField(const FieldName: string): TField;
var
I: Integer;
HashValue: Cardinal;
begin
if FList.Count > 0 then
begin
HashValue := TNamedItem.HashName(FieldName);
for I := 0 to FList.Count - 1 do
begin
Result := FList.Items[I];
if (Result.FFieldNameHashValue = HashValue) and
(AnsiCompareText(Result.FFieldName, FieldName) = 0) then
Exit;
end;
end;
Result := nil;
end;
Do you remember what they did in xe2 with normalizing trect ?
ReplyDeleteHow many fields do you have?
ReplyDeleteIf it's dozens or hundreds, hashtable would win, if it's half a dozen, the hashtable would lose (both in terms of performance and code complexity).
Naive binary search would likely lose to both that snippet and a hastable (because AnsiCompareText is not cheap), but a binary search on the hash might work better than both if you have no more than dozens of fields.
If you have dozens or hundreds of fields, I suspect this method is going to be more canary than problem ;)
Also a bigger problem is if you're calling that code multiple times per record (in which case you deserve the crappy performance), rather than once per field.
Lars, I tend to disagree. This is a list that is not sorted. Sure you could have a sorted and hashed fields dictionary used to speed up searches, in case you need to search often (hint: use persistent fields).
ReplyDeleteOh, you can do that, just with a little code. And why it's not in each and every TDataSet by default? Because in many cases it is not used, would imply an overhead in each and every application.
I know this might sound as taking excuses, but a library has design principles in terms of being general purpose and balancing speed/memory/time for ALL users. Adding a hash for the field name for faster searches (like shown above) still costs extra to some, but probably is a good trade-off. Pushing FindField to the max might not suit every user as well... and adding that feature when needed is really at reasonably easy reach.
I'm not pointing this out for the specific example, but because while we have don and probably still do mistakes that we need to correct, there are scenarios in which there are design decisions that are not in the best interest of some of our users, but they tend to balance different usage scenarios. And you know developers use Delphi for the most diverse applications out there. Good enough you have the power to see (full source code) and to add your own "improvements".
I've witnessed other cases in which a speed up for a customer would result in a slow down for another, and even a complete crash!
This is not hash table. This is just a list with hash values. Code shows that the developers tried to reduce the number of AnsiCompareText calls.
ReplyDeleteSee below for my current quick fix which creates the indexed lookup the first time FieldByName function is called. Note that I did this in a wrapper and not by modifying the code.
ReplyDeleteThe 220+ views have from a few to 70+ columns.
This significantly speeded up reading records and accessing by named field instead of by order - which IMO is the future-proof way of reading row values (i.e. no sequence dependencies).
If there is a better way to set property values from a query, than
Obj.Prop1 := DataSet.FieldByName['PropField1'].AsPropertyType;
Obj.Prop2 := DataSet.FieldByName['PropField2'].AsPropertyType;
...
I would be willing to try it out.
function TPSDRecordset_FD_Abstract.FieldByName(const FieldName:String):TField;
var
F : TField;
ix : Integer;
begin
if not Assigned(IndexedFieldList)
then begin
IndexedFieldList := TFieldIndexList.Create;
IndexedFieldList.Sorted := False;
for F in DataSet.Fields
do IndexedFieldList.AddObject(LowerCase(F.FieldName), TObject(F));
IndexedFieldList.Sorted := True;
end;
if IndexedFieldList.Find(LowerCase(FieldName), ix)
then Result := IndexedFieldList.Items[ix]
else raise EPSDDbFieldNotFound.Create('Field "'+FieldName+'" not found in record set!');
end;
Furthermore...
ReplyDeleteThere are two more things I wonder about in this code.
1. Why create a hash, but still compare both the hash and the field name?
2. MSSQL supports unicode table and column names.
CREATE TABLE អតិថិជន
(
ឈ្មោះអ្នកខ្ចី INT
)
Is this supported? (Thinking about that AnsiCompareText)
Two strings might have the same hash. The Ansi I'm not convinced either.
ReplyDelete1. Hash function for a completely different strings can return the same value.
ReplyDelete2. Usually when accessing such fields:
- names of the fields are enclosed in quotation marks (in SQL-query);
- names of the fields are case sensitive.
Eric Grange - For most reads, we use the simple way of just passing the record set to the object instance and let it pick out the fields from the current row. If it is necessary to read a large number of rows, we have routines that indeed do only fetch the field once.
ReplyDeleteIs this an ideal candidate for TDictionary?
ReplyDeleteSteve Maughan, I just tested that on the same dataset. It added about 250ms to the load time. Plain generic TDictionary is not a speed demon.
ReplyDeleteSteve Maughan Lars Fosdal and TDictionary is case sensitive in that case no? so it actually had an easier job
ReplyDeleteНиколай Зверев Case sensitivity can be turned off, and tends to be off by default - at least for MSSQL, MySQL, and Oracle. We are about to move to PostgreSQL, where that is NOT the case - so that will get interesting.
ReplyDeleteMarco Cantù The Ansi in the name is misleading. It actually calls the CompareStringW function in kernel32.
ReplyDeleteUwe Raabe Good catch, I should have known...
ReplyDeleteUwe Raabe - That's one less worry. Thanks!
ReplyDeleteEric Grange - I lower-cased the field names on TDictionary.Add and TryGetValue.
ReplyDeleteAny other alternative you could recommend that would beat a sorted TStringList with a custom compare string (SysUtils.CompareStr)?
Lars Fosdal since you're creating a HashValue for the field name anyway, have you tried to use this as a custom hash for the TDictionary?
ReplyDeleteAlso, how many fields are there? I would have thought TDictionary would have been faster once the number of fields goes up.
Steve Maughan - I don't make a hash, I just use the lower case field name. Field counts: 75, 74, 73, 59, 59, 55, 54, 54, 50, 49, 46, 46, 45, 45, 43, 43, 41, 40, 39, 39, 39, 38, 38, 37, 36, 36, 36, 35, 35, 34, 32, 32, 32, 31, 31, 30, 29, 29, 29, 27, 27, 27, 27, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1
ReplyDeleteLars Fosdal I'm amazed TDictionary is slower since it only costs the hashing of the FieldName and rest is a direct look-up.
ReplyDeleteWell, given how pervasive the Generics Collections have become, it makes you wonder if they should do more to ensure that they are super fast.
ReplyDeleteF.x. Why do we call BobJenkinsHash, when it's just a wrapper for HashLittle?
In one complex form we have I found that over half the time spent during its 15+ second loading time is due to FindField. Most of it was actually spent hashing the strings. If you look at the HashName function you should begin to cry (unless they've sorted it out since XE3).
ReplyDeleteThe brute force simple binary search cuts the time quite a bit, but I think I'll a proper hash list might speed it up further.
ReplyDeleteIn a loop, always use a local variable with each field.
ReplyDeleteIt will be faster and more readable.
The whole db.pas unit is far from optimized. For instance, TDataSet is a real bottleneck.
A. Bouchez - If you need to read many rows, that really helps - but if you have many relations, and a large set of data - you still will need to look up a lot of fields, and if each of those are a linear enumeration through the list of fields - it really is seriously slow.
ReplyDeleteTo make an efficient lookup mechanism, there are already several constraints that helps. We know there will not be duplicate field names. We know the number of fields will be fairly moderate, and we know the list will constant once built, for the lifetime of the retrieval. We could - in theory - cache the list between uses - if we make the assumption that the database schema doesn't change between uses.
You might want to try Spring4D generic stuff and compare if it is faster than the built in stuff. It is better designed, but I'm not sure if it is faster.
ReplyDeleteEric Grange Indeed. When profiling the slow loading form, I found like 70% of the time was spent in string allocation and Move. The first due to AnsiUpperCase and then the Move you mention.
ReplyDeleteThe reason FieldByName was called so much during startup on this form was due to our DevExpress grids, which apparently uses it to populate the grids, along the lines of
col.Value := FieldByName(col.DataBinding.FieldName).Value;
One of the grids contained rather a lot of records.
Unfortunately our "main" table is rather denormalized (almost 2^9 columns, doh!), so in our case the hashing can still pay off, but I'm certain a FastCode-style rewrite of that hash function would do wonders.
Jeroen Wiert Pluimers FWIW the hash table and hash set currently in Spring4D are just wrappers around Generics.Collections.TDictionary. TDictionary is fast on a good day, but requires a good hash function as it's very prone to clustering, in which case you're back to O(n).
ReplyDeleteAsbjørn Heid thanks for clarifying that.
ReplyDelete----
ReplyDeleteIf there is a better way to set property values from a query, than
Obj.Prop1 := DataSet.FieldByName['PropField1'].AsPropertyType;
Obj.Prop2 := DataSet.FieldByName['PropField2'].AsPropertyType;
...
I would be willing to try it out.
-----
Is this code in a loop? Or even just executed once, but executed again and again as some current rec somewhere else changes?
Don't want to use persistent fields at design time because they tie you down to much?
Then set up "semi-persistent" fields: declare an instance member for every field you use, populate them using FieldByName once when you open the dataset and use the instance members in the rest of your code.
But I am sure you know about that? So what am I missing here?
Marjan Venema - No offence, but you should have read all the previous comments.
ReplyDeleteDid anyone create his own dataset and added a new implementation of FindField? Would be interesting if the speed gains are mainly in benchmarks or if there is a noticable difference in real world applications.
ReplyDeleteLars Fosdal Non taken. I did skip over them a bit thinking they were mostly about hash tables/functions etc.
ReplyDelete