How to read an XML file that uses UTF-8?
Hi!
There's a program I want to modify that has some problems parsing an XML file that uses UTF-8.
The contents of some fields in the XML file are dumped into a flat file using a QTextStream (for which I tried specifying the encoding), but I can see that characters outside normal 7-bit ASCII are not processed correctly.
For example, a UTF-8 character that takes two bytes ends up in the flat file taking 4-5 bytes.
My guess is that when the file is read, Qt (the code in question uses a QDomDocument) thinks the file is in ISO-8859-1 (or something like that) and reads each UTF-8 character as two characters. When it then dumps the text into the flat file, it stores these two characters as separate UTF-8 multi-byte characters.
The end result is that the text strings end up being corrupted.
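To make that guess concrete, here is a minimal sketch in plain C++ (no Qt) of what would happen if each byte of a UTF-8 sequence were taken for an ISO-8859-1 character and then written out as UTF-8 again. The helper names appendUtf8 and misreadAsLatin1ThenEncodeUtf8 are made up for this illustration and are not part of Qt or of the program in question.

```cpp
#include <cassert>
#include <string>

// Append one code point below U+0800 to `out` as UTF-8 (one or two bytes).
// That range is enough here: an ISO-8859-1 character is never above U+00FF.
static void appendUtf8(std::string &out, unsigned int codePoint) {
    assert(codePoint < 0x800);
    if (codePoint < 0x80) {
        // 7-bit ASCII is stored unchanged.
        out += static_cast<char>(codePoint);
    } else {
        // Two-byte form: 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (codePoint >> 6));
        out += static_cast<char>(0x80 | (codePoint & 0x3F));
    }
}

// Hypothetical helper: treat every input byte as one ISO-8859-1 character
// (the suspected misreading) and write the result out again as UTF-8.
static std::string misreadAsLatin1ThenEncodeUtf8(const std::string &utf8Bytes) {
    std::string out;
    for (unsigned char byte : utf8Bytes) {
        appendUtf8(out, byte);
    }
    return out;
}
```

Under these assumptions, the two bytes C3 A9 (the UTF-8 encoding of "é") come out as the four bytes C3 83 C2 A9, which a UTF-8 viewer shows as "Ã©". That would account for a two-byte character growing to four bytes. It does not by itself explain a five-byte result.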
Is there a way to tell a QDomDocument which character set to use? Is it supposed to do it by itself using the XML header or is there something else to do? The correct character set (UTF-8) is declared in the XML file header.
Thank you!
Nick
Re: How to read an XML file that uses UTF-8?
What does your code look like? Did you use [code=xml]<?xml version="1.0" encoding="utf-8"?>[/code] inside your XML files?
Re: How to read an XML file that uses UTF-8?
Hi!
Sorry for the delayed reply, this week has been completely crazy...
The program in question is not one I wrote; it's a program I'm trying to modify.
I'll have to simplify it somewhat in order to post it here.
The XML file does have the encoding declaration you posted; that's what I meant by "The correct character set (UTF-8) is declared in the XML file header."
I was hoping that the problem could be solved with some sort of encoding declaration (like the one that can be set on a QTextStream), but there doesn't seem to be any way to specify one when using QDomDocument (setContent) with a QFile.
Thank you!
Nick