TStreamReader turns out to be appallingly poorly implemented. This code
{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.Math,
  System.Diagnostics;

procedure ReadFile(const FileName: string; const BufferSize: Integer);
var
  Reader: TStreamReader;
  MaxLineLength: Integer;
  Stopwatch: TStopwatch;
begin
  Stopwatch := TStopwatch.StartNew;
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, False, BufferSize);
  try
    MaxLineLength := 0;
    while not Reader.EndOfStream do begin
      MaxLineLength := Max(MaxLineLength, Reader.ReadLine.Length);
    end;
  finally
    Reader.Free;
  end;
  Writeln(BufferSize, ' ', MaxLineLength, ' ', Stopwatch.ElapsedMilliseconds);
end;

const
  FileName = 'C:\desktop\108_Lifebuoy_Light.obj';

var
  i: Integer;
  BufferSize: Cardinal;

begin
  BufferSize := 32;
  for i := 1 to 16 do begin
    ReadFile(FileName, BufferSize);
    BufferSize := BufferSize * 2;
  end;
  Writeln('Finished');
  Readln;
end.
produces this output
32 46 977
64 46 987
128 46 982
256 46 796
512 46 747
1024 46 740
2048 46 853
4096 46 1133
8192 46 1738
16384 46 3039
32768 46 5873
65536 46 10794
131072 46 20654
262144 46 80834
Well, I gave up waiting for the final two entries. The input file is large: a 16MB Wavefront obj file. I'm writing an obj parser at the moment. It rather beggars belief that increasing the buffer size results in worse performance.
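From a read of the RTL source, the slowdown appears to come from how the reader manages its internal buffer: after extracting each line, it removes the consumed characters from the front of the buffered data, which shifts everything that remains. The following fragment is a hypothetical illustration of that pattern, not the literal RTL code:

```delphi
// Sketch of the pattern: pull one line out of the buffer, then shift
// the rest of the buffered data down to the front.
Line := FBufferedData.ToString(0, LineLength);
FBufferedData.Remove(0, LineLength + 1); // copies ~BufferSize chars, every line
```

If that reading is right, each ReadLine costs work proportional to the buffer size, so an N-line file costs roughly O(N × BufferSize) in total, which matches the timings above: once the buffer comfortably exceeds the line length, doubling it roughly doubles the runtime.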
A simple string list gets the job done in about 220ms on my test machine:
procedure ReadFile(const FileName: string);
var
  Lines: TStringList;
  Line: string;
  MaxLineLength: Integer;
  Stopwatch: TStopwatch;
begin
  Stopwatch := TStopwatch.StartNew;
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    MaxLineLength := 0;
    for Line in Lines do begin
      MaxLineLength := Max(MaxLineLength, Line.Length);
    end;
  finally
    Lines.Free;
  end;
  Writeln(MaxLineLength, ' ', Stopwatch.ElapsedMilliseconds);
end;
A noddy Python program (for line in fileObject) was faster than TStreamReader!
I was almost in tears when I read through the source code for TStreamReader.
Anyway, does anybody know of an efficient and effective way to read through a file line by line, with support for the standard Unicode encodings?
I'm not asking on SO because I'm looking for a library recommendation.
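For what it's worth, the core fix is to track a read position into the buffer instead of deleting consumed bytes from its front. Here is a minimal sketch of that idea, assuming UTF-8 input only (no BOM detection, no other encodings) with LF or CRLF line endings; TLineReader is a hypothetical name, not an RTL class:

```delphi
type
  TLineReader = class
  private
    FStream: TStream;   // not owned
    FBuffer: TBytes;
    FPos, FLen: Integer;
  public
    constructor Create(Stream: TStream; BufferSize: Integer = 65536);
    function ReadLine(out Line: string): Boolean;
  end;

constructor TLineReader.Create(Stream: TStream; BufferSize: Integer);
begin
  inherited Create;
  FStream := Stream;
  SetLength(FBuffer, BufferSize);
  FPos := 0;
  FLen := 0;
end;

function TLineReader.ReadLine(out Line: string): Boolean;
var
  Start, i: Integer;
  Pending: TBytes;
begin
  Pending := nil;
  repeat
    // Refill when the buffer is exhausted; any partial line is already
    // carried over in Pending, so no bytes need shifting.
    if FPos = FLen then begin
      FLen := FStream.Read(FBuffer[0], Length(FBuffer));
      FPos := 0;
      if FLen = 0 then begin
        // End of stream: emit any pending bytes as the final line.
        Result := Pending <> nil;
        if Result then
          Line := TEncoding.UTF8.GetString(Pending);
        Exit;
      end;
    end;
    // Scan for the next LF without copying anything.
    Start := FPos;
    i := Start;
    while (i < FLen) and (FBuffer[i] <> 10) do
      Inc(i);
    // Append the scanned bytes to any carried-over partial line.
    // (Dynamic array concatenation needs XE7+; use Move on older versions.)
    Pending := Pending + Copy(FBuffer, Start, i - Start);
    if i < FLen then begin
      FPos := i + 1; // step past the LF
      // Trim a trailing CR for CRLF files.
      if (Length(Pending) > 0) and (Pending[High(Pending)] = 13) then
        SetLength(Pending, Length(Pending) - 1);
      Line := TEncoding.UTF8.GetString(Pending);
      Exit(True);
    end;
    FPos := FLen; // the line continues into the next buffer fill
  until False;
end;
```

Usage would mirror TStreamReader: wrap a TFileStream and loop while ReadLine returns True. Because the only per-line copies are the line's own bytes, total work is proportional to the file size regardless of buffer size.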