TStreamReader turns out to be appallingly poorly implemented. This code

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.Math,
  System.Diagnostics;

procedure ReadFile(const FileName: string; const BufferSize: Integer);
var
  Reader: TStreamReader;
  MaxLineLength: Integer;
  Stopwatch: TStopwatch;
begin
  Stopwatch := TStopwatch.StartNew;
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, False, BufferSize);
  try
    MaxLineLength := 0;
    while not Reader.EndOfStream do begin
      MaxLineLength := Max(MaxLineLength, Reader.ReadLine.Length);
    end;
  finally
    Reader.Free;
  end;
  Writeln(BufferSize, ' ', MaxLineLength, ' ', Stopwatch.ElapsedMilliseconds);
end;

const
  FileName = 'C:\desktop\108_Lifebuoy_Light.obj';

var
  i: Integer;
  BufferSize: Integer;

begin
  BufferSize := 32;
  for i := 1 to 16 do begin
    ReadFile(FileName, BufferSize);
    BufferSize := BufferSize*2;
  end;
  Writeln('Finished');
  Readln;
end.

produces this output (buffer size in bytes, maximum line length, elapsed milliseconds):

32 46 977
64 46 987
128 46 982
256 46 796
512 46 747
1024 46 740
2048 46 853
4096 46 1133
8192 46 1738
16384 46 3039
32768 46 5873
65536 46 10794
131072 46 20654
262144 46 80834

Well, I gave up waiting for the final two entries.  The input file is large: a 16MB Wavefront obj file.  I'm writing an obj parser at the moment.  It rather beggars belief that increasing the buffer size beyond 1024 bytes results in steadily worse performance.

A simple string list gets the job done in about 220ms on my test machine:

procedure ReadFile(const FileName: string);
var
  Lines: TStringList;
  Line: string;
  MaxLineLength: Integer;
  Stopwatch: TStopwatch;
begin
  Stopwatch := TStopwatch.StartNew;
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    MaxLineLength := 0;
    for Line in Lines do begin
      MaxLineLength := Max(MaxLineLength, Line.Length);
    end;
  finally
    Lines.Free;
  end;
  Writeln(MaxLineLength, ' ', Stopwatch.ElapsedMilliseconds);
end;

A noddy Python program (for line in fileObject) was faster than TStreamReader!
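
For reference, a loop of that shape looks roughly like the following. This is only a sketch, not the exact program that was timed: the function names and the UTF-8 default are assumptions made for illustration.

```python
import time


def max_line_length(path, encoding="utf-8"):
    """Return the length of the longest line, excluding the line terminator."""
    longest = 0
    # Iterating over the file object yields one decoded line at a time;
    # with the default newline handling, CRLF is translated to LF.
    with open(path, "r", encoding=encoding) as file_object:
        for line in file_object:
            longest = max(longest, len(line.rstrip("\n")))
    return longest


def timed_max_line_length(path):
    """Return (longest line length, elapsed milliseconds) for one pass."""
    start = time.perf_counter()
    longest = max_line_length(path)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return longest, elapsed_ms
```

Calling timed_max_line_length on the same obj file gives a figure that can be set against the Delphi timings above.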

I was almost in tears when I read through the source code for TStreamReader.

Anyway, does anybody know of an efficient and effective way to read through a file line by line that supports the standard Unicode encodings?

I'm not asking on SO because I'm looking for a library recommendation.
