Just in case it might be useful to anyone else, or you just want to have a look (and hopefully not a laugh), here's my BufferedStreamReader.

The BufferedStreamReader is modelled after TStreamReader, but also allows direct access to the underlying stream.

The BufferedStreamReader was created to solve a specific requirement I had. I needed to parse some text followed by some binary data from a given TStream. While the TStreamReader does allow you to access the underlying stream, it doesn't give you any way to compensate for the data it has buffered internally while reading the text portion of the stream.

This greatly increases the complexity of reading the binary data that follows the text, and makes it almost impossible if the text is encoded in something other than ASCII, since the number of characters read then no longer tells you how many bytes were consumed.
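To make the problem concrete, here is a minimal Python sketch (not Delphi, and not TStreamReader's actual implementation) of a hypothetical text reader, NaiveLineReader, that fills a private buffer in fixed-size chunks. After one header line is read, the underlying stream is positioned at the end of the chunk rather than the end of the line, so code that switches to reading the raw stream silently skips the over-read bytes.

```python
import io


class NaiveLineReader:
    """Hypothetical text reader that refills a private buffer in fixed-size chunks."""

    def __init__(self, stream, chunk_size=16):
        self.stream = stream
        self.chunk_size = chunk_size
        self._buf = b""

    def read_line(self):
        """Return the next line without its newline."""
        # Refill until a newline is buffered or the stream is exhausted.
        while b"\n" not in self._buf:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break
            self._buf += chunk
        line, _, self._buf = self._buf.partition(b"\n")
        # Any bytes past the newline stay in self._buf, invisible to the caller.
        return line.decode("ascii")


raw = io.BytesIO(b"HEADER\n" + bytes(range(20)))
reader = NaiveLineReader(raw)
print(reader.read_line())  # HEADER
print(raw.tell())          # 16, although the header is only 7 bytes long
print(raw.read(1)[0])      # 9: binary bytes 0..8 were swallowed by the buffer
```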

The ppm2png example included in the repository shows just such a situation. PPM image files have an ASCII text header, followed by the binary image data. Using the BufferedStreamReader the header can be easily parsed, and reading the subsequent image data is trivial.
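Python's io.BufferedReader happens to behave the way described here: readline() and read(n) draw from one internal buffer, so switching from text-style parsing to binary reads loses nothing. The sketch below uses it to parse a binary PPM (P6) header, including '#' comments, and then read the pixel data. It assumes 8-bit samples (maxval below 256), and read_ppm and _next_token are hypothetical helpers, not the ppm2png code from the repository.

```python
import io


def _next_token(f):
    """Return the next whitespace-delimited header token, skipping # comments.

    Consumes exactly one whitespace byte after the token, which is what the
    PPM format expects between maxval and the binary raster.
    """
    token = b""
    while True:
        c = f.read(1)
        if not c:
            raise ValueError("unexpected end of PPM header")
        if c == b"#" and not token:
            f.readline()  # skip the rest of the comment line
        elif c.isspace():
            if token:
                return token
        else:
            token += c


def read_ppm(f):
    """Parse a P6 header from a buffered binary stream, then read the raster."""
    if _next_token(f) != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(_next_token(f)) for _ in range(3))
    if maxval >= 256:
        raise ValueError("this sketch only handles 8-bit samples")
    size = width * height * 3
    pixels = f.read(size)  # served from the same buffer the header came from
    if len(pixels) != size:
        raise ValueError("truncated pixel data")
    return width, height, pixels


data = b"P6\n# tiny image\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
f = io.BufferedReader(io.BytesIO(data), buffer_size=8)
print(read_ppm(f))  # (2, 1, b'\xff\x00\x00\x00\x00\xff')
```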

The BufferedStreamReader relies on the BufferedStream class to handle the buffering. The BufferedStream class is stand-alone and might be useful on its own.
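The idea behind a class like BufferedStream can be sketched in a few lines: keep a single buffer and read position, and serve both delimiter-based reads and raw byte reads from it, so callers never have to account for how far ahead the underlying stream was read. The Python class below is a hypothetical illustration; BufferedStreamSketch and its read_until and read methods are assumptions for this example and do not reproduce the Delphi API.

```python
import io


class BufferedStreamSketch:
    """Hypothetical wrapper: delimiter reads and raw reads share one buffer."""

    def __init__(self, stream, buffer_size=4096):
        self.stream = stream
        self.buffer_size = buffer_size
        self._buf = bytearray()
        self._pos = 0  # index of the first unconsumed byte in _buf

    def _fill(self):
        """Append one chunk from the underlying stream; return False at EOF."""
        if self._pos:
            # Drop consumed bytes so the buffer only holds unread data.
            del self._buf[:self._pos]
            self._pos = 0
        chunk = self.stream.read(self.buffer_size)
        self._buf += chunk
        return bool(chunk)

    def read_until(self, delim):
        """Return the bytes before delim and consume delim; at EOF return the rest."""
        while True:
            # Rescans from the read position after each refill, for simplicity.
            i = self._buf.find(delim, self._pos)
            if i >= 0:
                result = bytes(self._buf[self._pos:i])
                self._pos = i + len(delim)
                return result
            if not self._fill():
                result = bytes(self._buf[self._pos:])
                self._pos = len(self._buf)
                return result

    def read(self, n):
        """Return up to n bytes, serving buffered bytes before touching the stream."""
        while len(self._buf) - self._pos < n and self._fill():
            pass
        result = bytes(self._buf[self._pos:self._pos + n])
        self._pos += len(result)
        return result


raw = io.BytesIO(b"name: demo\n" + bytes([10, 20, 30, 40]))
s = BufferedStreamSketch(raw, buffer_size=8)
print(s.read_until(b"\n").decode("ascii"))  # name: demo
print(list(s.read(4)))                       # [10, 20, 30, 40]
```

A production version would likely read large requests straight from the underlying stream instead of growing the buffer, and avoid rescanning bytes it has already searched; both are left out here for brevity.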

It should also be significantly faster than TStreamReader, especially for small reads with larger buffer sizes. In my tests it has been between 3x and 100x faster.

Anyway, I didn't intend to make this very public but if it's useful to someone else then great :)
https://github.com/lordcrc/BufferedStreamReader

Comments

  1. The API is in a wee bit of flux: the standard single-delimiter ReadUntil calls are fixed, but I'm still working on a better solution for the multi-delimiter ones.

  2. Just pushed a change which removes the multi-delimiter ReadUntil method and replaces it with one taking a regular expression.

    Due to limitations in the TRegEx provided by System.RegularExpressions, I've had to write my own wrapper (see the readme for details). The wrapper still uses the same PCRE library that ships with Delphi; it just accesses the PCRE API directly.

    With a studied regular expression, ReadUntil using the expression "\r|\n|\r\n" is only about 60% slower than plain ReadLine.

    The TStreamReader.ReadLine is still twice as slow as the regular expression ReadUntil, so I think this should be acceptable for most :)

  3. Just for fun I tested the worst-case performance of the regex ReadUntil, where the regex is created for each ReadUntil call. It was of course much slower (about an order of magnitude).

    However, the kicker was that always studying the expression gave about 50% faster running times than not studying, even for very short lines (<40 chars). So while studying can make the creation phase take twice as long, it pays off even for short matches with trivial expressions.

    I can see why in PCRE2 they've removed the study option and just always do it. I've changed my RegularExpr library to do the same.
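A regex-based ReadUntil like the one described in the comments has one subtlety: a match that ends exactly at the end of the buffered data may not be final, because a buffered \r could be followed by a \n that has not been read yet. The Python sketch below refills before accepting such a match. It is a hypothetical illustration using Python's re module rather than PCRE or the RegularExpr library, and RegexReader and its read_until method are names made up for this example. The pattern is compiled once and reused for every call, and it lists \r\n before \r: with leftmost-first alternation (as in PCRE and Python's re), the expression "\r|\n|\r\n" matches only the \r of a CRLF pair, so a straightforward reader would return an extra empty line for each CRLF.

```python
import io
import re

# Compiled once and reused for every call. "\r\n" is listed first: with
# leftmost-first alternation, "\r|\n|\r\n" would match only the "\r" of a CRLF pair.
LINE_BREAK = re.compile(rb"\r\n|\r|\n")


class RegexReader:
    """Hypothetical reader whose read_until takes a compiled regex as delimiter."""

    def __init__(self, stream, chunk_size=4096):
        self.stream = stream
        self.chunk_size = chunk_size
        self._buf = b""
        self._eof = False

    def _fill(self):
        chunk = self.stream.read(self.chunk_size)
        if chunk:
            self._buf += chunk
        else:
            self._eof = True

    def read_until(self, pattern):
        """Return the bytes before the first match and consume the match.

        Returns None once the stream is exhausted.
        """
        while True:
            m = pattern.search(self._buf)
            # A match that touches the end of the buffer might grow once more
            # data arrives (a "\r" whose "\n" is not read yet), so refill first.
            # This check suffices for simple delimiters such as line breaks,
            # not for arbitrary patterns.
            if m and (m.end() < len(self._buf) or self._eof):
                result = self._buf[:m.start()]
                self._buf = self._buf[m.end():]
                return result
            if self._eof:
                if not self._buf:
                    return None
                result, self._buf = self._buf, b""
                return result
            self._fill()


r = RegexReader(io.BytesIO(b"one\r\ntwo\rthree\nfour"), chunk_size=4)
lines = []
while (line := r.read_until(LINE_BREAK)) is not None:
    lines.append(line.decode("ascii"))
print(lines)  # ['one', 'two', 'three', 'four']
```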
