Just one question. When TWriter.WriteString chooses the vaString prefix, is it basically saving the string as a ShortString? If so, there is a potentially large optimization in TReader.ReadString for the vaString case: bypass TEncoding and read the bytes directly into a ShortString.

Comments

  1. If you want to patch it for non-NEXTGEN in your local copy, fine: read it into an AnsiString and see how much impact this micro-optimization has. If you are thinking about a QP report, it is unlikely that the overlords will ever act on it.

  2. I've done a test without patching: check TReader.NextValue = vaString, then read directly into a ShortString. I'm writing arrays of a great many strings, and with this trick one test that takes 21 seconds drops to approx. 9 seconds.

  3. So as long as there is a way to do it in user code, I'm fine with it. My remaining doubt is whether it's really safe to assume the content is always a plain ShortString.

  4. vaString is one byte for the length, followed by an 8-bit string of that many bytes. That layout is the same as for the ShortString type. Did you check whether there is a performance difference between using ShortString and AnsiString? If there is no significant difference, the same code can be used for vaString and vaLString, except for reading the length at the beginning.

  5. Perfect. I'll test this, hopefully tomorrow, and post a test/bench project.

  6. Test project: https://drive.google.com/open?id=0BymV3q6di65nTm5UQ0JYUllwcWc

    Very similar results with 32-bit and 64-bit: TReader.ReadString = 2.1 seconds, with the trick = 0.7 seconds.

  7. I do see a similar improvement: 6.6 seconds vs. 1.9 seconds. With AnsiString it is 2.2 seconds. IDE Fix Pack does not touch TReader.ReadStr; I'll check whether there is a noticeable improvement there, and if so, it should find its way into IDE Fix Pack 6.

  8. Unfortunately, the UTF8 decoding in ReadStr is necessary. I had property names with non-ASCII characters in mind, but mixed it up with vaIdent / TReader.ReadIdent, which has UTF8 decoding. After some tests I noticed that ReadStr needs the UTF8 decoding as well. However, even with a UTF8 check ("for I := 1 to L do if Ord(S[I]) > 127 then...") ObjectBinaryToText is about 10 percent faster with ShortString, and there is no significant difference between running with and without the check. Results for ObjectBinaryToText on about 950 DFM files in binary format: original 2100 ms, ShortString 1901 ms, ShortString with UTF8 check 1920 ms.
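
For readers who want to see the layout from comment 4 and the "anything above 127?" check from comment 8 in one place, here is a minimal sketch. It is written in Python only so it can be run anywhere; it is not the RTL code and not the code from the test project. The tag values (6 for vaString, 12 for vaLString) are assumed from the ordinal positions in the TValueType enum, the 4-byte little-endian length for vaLString is likewise an assumption, and `read_string_value` and its `fallback_encoding` parameter are hypothetical names. Which encoding the real reader applies to non-ASCII payloads should be checked against your Delphi version.

```python
import struct
from typing import Tuple

# Assumed tag values (ordinals of Delphi's TValueType enum).
VA_STRING = 6    # payload prefixed by a 1-byte length, like a ShortString
VA_LSTRING = 12  # payload prefixed by a 4-byte little-endian length (assumed)


def read_string_value(buf: bytes, pos: int = 0,
                      fallback_encoding: str = "utf-8") -> Tuple[str, int]:
    """Decode one string value starting at buf[pos].

    Returns (text, next_pos). Hypothetical helper for illustration only.
    """
    tag = buf[pos]
    pos += 1
    if tag == VA_STRING:
        length = buf[pos]          # single length byte
        pos += 1
    elif tag == VA_LSTRING:
        (length,) = struct.unpack_from("<i", buf, pos)
        pos += 4
    else:
        raise ValueError(f"not a string tag: {tag}")
    if length < 0:
        raise ValueError("negative length")

    raw = buf[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated string payload")

    # Same idea as the Ord(S[I]) > 127 loop: pure 7-bit data needs no
    # real decoding step, anything else goes through the slow path.
    if raw.isascii():
        text = raw.decode("ascii")
    else:
        text = raw.decode(fallback_encoding)
    return text, pos + length


data = bytes([VA_STRING, 5]) + b"Hello"
print(read_string_value(data))  # ('Hello', 7)
```

Apart from how the length is read, both tags share the same code path, which is the point made in comment 4.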
