There is no equivalent function to the C/C++ sscanf() function in the Delphi RTL.

sscanf() can be described as the mirror function of System.Format().

For fun, I wrote an implementation a while ago. In the tests I ran, its speed compares well with Microsoft's swscanf().

Examples:
n := UnFormat('-12 127 Hi','%d %u %s',[anInt,aWord,aString]);
n := UnFormat( '"Hi, my name is:","Delphi"','"%s","%s"',[s1,s2]);
n := UnFormat( 'Hi, my name is:|Delphi','%s|%s',[s1,s2]);

// Combining time with other parameters requires ending delimiter
n := UnFormat('<2016-02-14 00:02:30> 5','<%t> %d',[dt,k]);
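
A minimal sketch of how the return value might be checked in practice. It assumes the unit uScanData described in this post is on the search path and behaves as its header comments state; the program name and the messages are illustrative only.

```delphi
program UnFormatDemo; // hypothetical demo program, not part of the original unit

{$APPTYPE CONSOLE}

uses
  SysUtils,
  uScanData; // the unit from this post; assumed to be available

var
  anInt: Integer;
  aWord: Word;
  aString: String;
  n: Integer;
begin
  // Three specifiers, three target variables of matching types
  n := UnFormat('-12 127 Hi', '%d %u %s', [anInt, aWord, aString]);
  case n of
    -2: Writeln('Input does not match the format string');
    -1: Writeln('Unknown format specifier');
  else
    if n = 3 then
      Writeln(anInt, ' ', aWord, ' ', aString)
    else
      // Parsing stopped early; only the first n variables hold valid data
      Writeln('Only ', n, ' of 3 arguments were filled');
  end;
end.
```

Comparing the result against the number of variables passed is the safest check, since the function is documented to stop at the first item it cannot convert.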

Would there be any interest in having this code published in a public Git repository?

The interface of the unit looks like this:

unit uScanData;

interface

Uses
SysUtils;

Type
// ScanData description:
// class function UnFormat(
// const Input, Format : String;
// const Args : array of ScanData) : Integer;
// class function UnFormat(
// const Input, Format : String;
// const Args : array of ScanData;
// const aFormatSetting: TFormatSettings) : Integer;

// ScanData.UnFormat() decodes an input string with the help of a format string
// into values and puts those into a supplied array of variables.
// It is the mirror function of SysUtils.Format, hence the name UnFormat.
// In C/C++, this function is called "sscanf" or "swscanf".
// The output data covers many basic Delphi types.

// Result
// n : Number of arguments filled, stops at first conversion error.
// -1: Format specifier not in list.
// -2: Mismatch between Input and Format.

// Examples:
// n := ScanData.UnFormat('-12 127 Hi','%d %u %s',[anInt,aWord,aString]);
// => n = 3; anInt = -12; aWord = 127; aString = 'Hi';
// n := ScanData.UnFormat( '"Hi, my name is:","Delphi"','"%s","%s"',[s1,s2]);
// => n = 2; s1 = 'Hi, my name is:'; s2 = 'Delphi';
// n := ScanData.UnFormat( 'Hi, my name is:|Delphi','%s|%s',[s1,s2]);
// => n = 2; s1 = 'Hi, my name is:'; s2 = 'Delphi';
// Combining time with other parameters requires ending delimiter
// n := UnFormat('<2016-02-14 00:02:30> 5','<%t> %d',[dt,k]);
// => n = 2; dt = TDateTime(2016-02-14 00:02:30); k = 5;

// A format token that does not fit the passed parameter type is not evaluated.
// An input that does not qualify for conversion is also left unevaluated.
// The UnFormat function stops at first item that cannot be evaluated.
// This is reflected by the return value of the UnFormat function.

// A valid format specifier looks like this:
//   %[*]specifier

// Valid format specifiers (tokens), must be preceded by '%':
//   'd' : A signed integer: ShortInt,SmallInt,Integer,LongInt,Int32,Int64,NativeInt.
//         Also accepts hex notation with a preamble $,x,X,0x,0X.
//   'u' : An unsigned integer: Byte,Word,Cardinal,UInt32,UInt64,NativeUInt.
//         Also accepts hex notation with a preamble $,x,X,0x,0X.
//   'x' : A signed or unsigned integer, expressed in hex notation,
//         with or without a hex preamble. Accepts all numeric variable types.
//   'o' : A signed or unsigned integer, expressed in octal notation,
//         with or without an octal preamble ('o','O').
//         Accepts all numeric variable types.
//   'p' : A pointer value: hex notation with or without a hex preamble.
//         32 bit or 64 bit unsigned value depending on platform.
//   'e',
//   'f',
//   'g' : Floating point value: Single,Double,Extended,TDateTime.
//         The TFormatSettings variable is used to interpret the string value.
//   'm' : Currency value.
//         The TFormatSettings variable is used to interpret the string value.
//   't' : TDateTime value: the string contains the date and/or time
//         described by the TFormatSettings date/time fields.
//         Ending delimiter(s) can be specified in the format string.
//   's' : String value: String,WideString,AnsiString,ShortString.
//         Ending delimiter(s) can be specified in the format string.
//   'c' : Character value: Char or AnsiChar.
//   '%' : Two '%' Format characters in a row, matches a single Input '%'
//         character.

// Optionally, the specifier can have a sub-specifier, *.
// This will skip putting the result of this input into the result argument
// array.

// An overloaded UnFormat method takes a TFormatSettings variable as input
// for conversion where possible.
// Otherwise the default SysUtils.FormatSettings is used.

// Technical background:
// The output data types and references are captured with implicit class
// operators.

// For convenience, standalone wrapper overload functions are defined:
// function UnFormat(
// const Input, Format : String;
// const Args : array of ScanData) : Integer; overload;
// function UnFormat(
// const Input, Format : String;
// const Args : array of ScanData;
// const aFormatSetting: TFormatSettings) : Integer; overload;

// Note:
// Differences from swscanf:
// * UnFormat does not support every option, such as specifying the length of the input.
// * Integer overflow makes UnFormat stop and disregard the value.
// Input should be treated like any edit input, with full validation against the type.
// Hence the local TryStrToXXX implementations, which handle this much better
// than the similar RTL functions.

  ScanData = record
  public
    class function UnFormat(
      const Input, Format : String;
      const Args : array of ScanData) : Integer; overload; static;
    class function UnFormat(
      const Input, Format : String;
      const Args : array of ScanData;
      const aFormatSetting: TFormatSettings) : Integer; overload; static;
  public
    // Implicitly init the ScanData record
    class operator implicit(var x: Byte): ScanData; inline;
    class operator implicit(var x: Word): ScanData; inline;
    class operator implicit(var x: Cardinal): ScanData; inline;
    class operator implicit(var x: UInt64): ScanData; inline;
    class operator implicit(var x: NativeUInt): ScanData; inline;

    class operator implicit(var x: ShortInt): ScanData; inline;
    class operator implicit(var x: SmallInt): ScanData; inline;
    class operator implicit(var x: Integer): ScanData; inline;
    class operator implicit(var x: Int64): ScanData; inline;
    class operator implicit(var x: NativeInt): ScanData; inline;

    class operator implicit(var x: Single): ScanData; inline;
    class operator implicit(var x: Double): ScanData; inline;
    class operator implicit(var x: Extended): ScanData; inline;
    class operator implicit(var x: TDateTime): ScanData; inline;
    class operator implicit(var x: Currency): ScanData; inline;

    class operator implicit(var x: Boolean): ScanData; inline;
    class operator implicit(var x: Pointer): ScanData; inline;

    class operator implicit(var x: Char): ScanData; inline;
    class operator implicit(var x: AnsiChar): ScanData; inline;

    class operator implicit(var x: String): ScanData; inline;
    class operator implicit(var x: AnsiString): ScanData; inline;
    class operator implicit(var x: WideString): ScanData; inline;
    // compiler chokes on var x: ShortString inline declaration
    class operator implicit(const x: ShortString): ScanData; inline;
  private
    function SetByteValue(       const sValue: String; token: Char): Boolean;
    function SetWordValue(       const sValue: String; token: Char): Boolean;
    function SetCardinalValue(   const sValue: String; token: Char): Boolean;
    function SetUInt64Value(     const sValue: String; token: Char): Boolean;
    function SetNativeUIntValue( const sValue: String; token: Char): Boolean; inline;
    function SetShortIntValue(   const sValue: String; token: Char): Boolean;
    function SetSmallIntValue(   const sValue: String; token: Char): Boolean;
    function SetIntegerValue(    const sValue: String; token: Char): Boolean;
    function SetInt64Value(      const sValue: String; token: Char): Boolean;
    function SetNativeIntValue(  const sValue: String; token: Char): Boolean; inline;
    function SetBooleanValue(    const sValue: String; token: Char): Boolean; inline;
    function SetWCharValue(      const sValue: String; token: Char): Boolean; inline;
    function SetACharValue(      const sValue: String; token: Char): Boolean; inline;
    function SetUStringValue(    const sValue: String; token: Char): Boolean; inline;
    function SetWStringValue(    const sValue: String; token: Char): Boolean; inline;
    function SetAStringValue(    const sValue: String; token: Char): Boolean; inline;
    function SetShortStringValue(const sValue: String; token: Char): Boolean; inline;
    function SetCurrencyValue(   const sValue: String; token: Char;
      const aFormatSetting: TFormatSettings): Boolean; inline;
    function SetSingleValue(     const sValue: String; token: Char;
      const aFormatSetting: TFormatSettings): Boolean; inline;
    function SetDoubleValue(     const sValue: String; token: Char;
      const aFormatSetting: TFormatSettings): Boolean; inline;
    function SetExtendedValue(   const sValue: String; token: Char;
      const aFormatSetting: TFormatSettings): Boolean; inline;
    function SetTDateTimeValue(  const sValue: String; token: Char;
      const aFormatSetting: TFormatSettings): Boolean; inline;

  private type
    TSetValue = function(const sValue: String; token: Char): Boolean of object;
    TSetValueF = function(const sValue: String; token: Char;
      const aFormatSetting: TFormatSettings): Boolean of object;
  private
    fVarRef: Pointer; // Variable reference (set by an implicit call)
    case boolean of   // Fake overloaded method references
      false : (SetValue: TSetValue);
      true  : (SetValueF: TSetValueF);
  end;

// Wrapped UnFormat functions
function UnFormat(
  const Input, Format : String;
  const Args : array of ScanData) : Integer; overload;
function UnFormat(
  const Input, Format : String;
  const Args : array of ScanData;
  const aFormatSetting: TFormatSettings) : Integer; overload;

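The overload that takes a TFormatSettings matters whenever the input uses separators that differ from the system locale. The following is a sketch only: it assumes the uScanData unit is available and that '%f' honours the supplied DecimalSeparator, as the header comments suggest. The program name and input value are made up for illustration.

```delphi
program UnFormatSettingsDemo; // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  SysUtils,
  uScanData; // the unit from this post; assumed to be available

var
  fs: TFormatSettings;
  d: Double;
  n: Integer;
begin
  fs := FormatSettings;        // start from a copy of the current defaults
  fs.DecimalSeparator := ',';  // the input below uses a decimal comma
  fs.ThousandSeparator := '.';

  // Pass the settings explicitly instead of relying on the global defaults
  n := UnFormat('3,25', '%f', [d], fs);
  if n = 1 then
    Writeln(FloatToStr(d, fs))
  else
    Writeln('Conversion failed, result code ', n);
end.
```

Working on a local copy of FormatSettings keeps the global defaults untouched, which is usually preferable in code that may run on several threads.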

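The "technical background" note in the header says that variable references are captured through implicit class operators. A stripped-down sketch of that idea, for a single type, could look like the code below. TIntRef, Assign and FillAll are hypothetical names invented for this illustration, and the sketch assumes the compiler accepts var parameters on Implicit operators, which the declarations in the interface rely on.

```delphi
program ImplicitRefDemo; // hypothetical, illustrates the technique only

{$APPTYPE CONSOLE}

type
  // Holds a reference to an Integer variable, captured on conversion
  TIntRef = record
  private
    FRef: PInteger;
  public
    class operator Implicit(var x: Integer): TIntRef;
    procedure Assign(aValue: Integer);
  end;

class operator TIntRef.Implicit(var x: Integer): TIntRef;
begin
  Result.FRef := @x; // remember where the caller's variable lives
end;

procedure TIntRef.Assign(aValue: Integer);
begin
  FRef^ := aValue; // write through to the captured variable
end;

// Each element of the open array is built by the Implicit operator,
// so the routine can write back to the caller's variables.
procedure FillAll(const Args: array of TIntRef; aValue: Integer);
var
  i: Integer;
begin
  for i := Low(Args) to High(Args) do
    Args[i].Assign(aValue);
end;

var
  a, b: Integer;
begin
  a := 0;
  b := 0;
  FillAll([a, b], 42);
  Writeln(a, ' ', b);
end.
```

The stored pointer is only valid while the original variable is alive, so a record like this should not outlive the call it was created for.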

Comments

  1. Even less need for RegExp! That must be good!
  2. Attila Kovacs, sorry for the typos, fixed, thanks. Supporting fixed length fields should not be too complex, I guess. Once the code is up, I can look into it.