Note to self:

When sending our XML schema to a third party, make sure to mention that they do actually have to follow the XML standard and include the encoding attribute in the XML declaration if they intend to send files in anything but UTF-8, like say windows-1252.

Just got an email where the developer on the other side complained that this was NOT specified in our documentation...
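
To illustrate the point about the encoding attribute, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `<name>` element and the "café" sample text are made up for the example; they are not from our actual schema.

```python
import xml.etree.ElementTree as ET

# "café" encoded as windows-1252: the "é" becomes the single byte 0xE9.
body = "<name>café</name>".encode("windows-1252")

# With the encoding declared, the parser knows how to decode byte 0xE9.
declared = b'<?xml version="1.0" encoding="windows-1252"?>' + body
parsed_text = ET.fromstring(declared).text

# Without a declaration (and without a BOM) the parser has to assume UTF-8,
# where a lone 0xE9 followed by "<" is not a valid byte sequence.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(parsed_text, undeclared_ok)
```

The same bytes parse fine or fail outright depending only on that one attribute, which is why it is worth spelling out in the documentation.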

Comments

  1. Haha, that reminds me of exchanging XML with a 3rd party, when I got a complaint that our XML files exceeded their maximum line length (:
    That was actually their second complaint. The first was that they could not use UTF-16 in Windows.

  2. Jeroen Wiert Pluimers Maximum line length for XML, oh boy, that is just brilliant!

  3. Jeroen Wiert Pluimers I vividly remember Unicode-ifying an application and having an option to export its text-format files as UTF-16, which, in my new Unicode enthusiasm and happily thinking "Windows uses this internally, it's obviously best", I made the default. Only to discover, thankfully before release, that some apps - e.g. Excel! - can't handle UTF-16 files.

    Luckily UTF-8 is read by everyone.

  4. My country is oh so modern in that we are required to use electronic invoices. They have XMLDSIG and whatnot. However, they reject anything not encoded in ISO-8859-1; this means (of course!) that "iso-8859-1" is also rejected.

  5. I once did specify such things for an integration, but the other party disregarded it.

  6. While the XML spec requires UTF-8 & UTF-16 support in XML processors, it does not require anything specific with respect to the files themselves, check the spec... and weep :-)

    http://www.w3.org/TR/REC-xml/#charencoding
    http://www.w3.org/TR/REC-xml/#sec-guessing

    So in those cases your best bet is to just outright specify the text file format (be it XML or whatever), and for UTF, be sure to specify whether the BOM should be present or not as well (as that's another can o' worms)

    David Millington some XML processors have trouble with the UTF-8 BOM. A few years ago we had to "pre-process" XML files that were exchanged between a Java XML engine and an MSXML engine: one required the UTF-8 BOM, the other did not support it...

  7. Eric Grange I think the spec is pretty clear:

    "In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8."

  8. Asbjørn Heid not clear at all actually: if they had wanted to make it clear, they could have said so in a single sentence, rather than at the end of a four-line sentence buried in a nine-paragraph spec entry.

    That's a general problem with the whole XML spec btw: it proceeds more by corner cases and exclusions than by normative definition.

  9. Eric Grange Fair enough, I agree that such an important point should be mentioned more explicitly.


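
On the BOM point raised in comment 6, the kind of "pre-processing" described there could look roughly like the sketch below. The helper names `strip_utf8_bom` and `ensure_utf8_bom` are hypothetical, and this is not the code that commenter actually used, just one way to normalise files before handing them to a parser that is picky either way.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Hypothetical sample document, already encoded as UTF-8.
doc = b'<?xml version="1.0" encoding="UTF-8"?><a/>'

with_bom = ensure_utf8_bom(doc)          # for the engine that requires the BOM
without_bom = strip_utf8_bom(with_bom)   # for the engine that rejects it
```

Both helpers are idempotent, so they can be applied to incoming files regardless of whether the sender included a BOM.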