
26 Jan
2012
7:16 p.m.
On 01/26/2012 07:29 PM, Lee Passey wrote:
It is true that XML files can be encoded using UTF-16, in which case the first line will /not/ be ASCII, and a BOM should be required (what's the default byte ordering of UTF-16 if there's no BOM, I wonder?)
You can detect a lot by reading the first 4 bytes and knowing they must represent a prefix of '<?xm'.
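As a rough sketch of that four-byte check (not anyone's actual implementation): the function name `sniff_xml_encoding` and the labels it returns are made up for illustration, and the byte patterns are the ones I recall from the autodetection appendix of the XML 1.0 spec. It only guesses an encoding family; the encoding declaration still has to be read afterwards to get the exact encoding.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding family of an XML file from its first four bytes.

    Hypothetical helper for illustration only; the returned labels are
    informal names, not Python codec names.
    """
    head = head[:4]

    # Byte order marks. The 4-byte UCS-4 marks are tested first because
    # FF FE 00 00 also begins with the UTF-16 little-endian mark FF FE.
    boms = [
        (b"\x00\x00\xfe\xff", "UCS-4 big-endian"),
        (b"\xff\xfe\x00\x00", "UCS-4 little-endian"),
        (b"\xef\xbb\xbf", "UTF-8"),
        (b"\xfe\xff", "UTF-16 big-endian"),
        (b"\xff\xfe", "UTF-16 little-endian"),
    ]
    for prefix, name in boms:
        if head.startswith(prefix):
            return name

    # No BOM: the bytes must encode a prefix of '<?xm' in some encoding.
    no_bom = {
        b"\x00\x00\x00<": "UCS-4 big-endian",
        b"<\x00\x00\x00": "UCS-4 little-endian",
        b"\x00<\x00?": "UTF-16 big-endian",
        b"<\x00?\x00": "UTF-16 little-endian",
        b"<?xm": "ASCII-compatible (UTF-8, ISO 8859, ...)",
        b"\x4c\x6f\xa7\x94": "EBCDIC",
    }
    return no_bom.get(head, "unknown")


if __name__ == "__main__":
    decl = '<?xml version="1.0"?>'
    for codec in ("ascii", "utf-16-be", "utf-16-le", "utf-32-be", "cp037"):
        print(codec, "->", sniff_xml_encoding(decl.encode(codec)))
```

A file without an XML declaration and without a BOM falls through to "unknown" here; in that case XML says to treat it as UTF-8.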
But if the file is not UTF-16 then at least the first line is guaranteed to be ASCII
Wrong. It can also be UCS-4, UCS-2, or EBCDIC.

-- 
Marcello Perathoner
webmaster@gutenberg.org