Thread: How to read an XML file that uses UTF-8?

  1. #1

    How to read an XML file that uses UTF-8?

    Hi!

    There's a program I want to modify that has some problems parsing an XML file that uses UTF-8.

    The content of some of the fields in the XML file is dumped into a flat file using a QTextStream (on which I tried specifying the encoding), but I can see that characters outside normal 7-bit ASCII are not processed correctly.

    For example, a UTF-8 character that takes two bytes ends up in the flat file taking 4-5 bytes.

    My guess is that when the file is read, Qt (the code in question uses a QDomDocument) thinks the file is in ISO-8859-1 (or something like that) and reads each two-byte UTF-8 character as two characters. When it then dumps the text into the flat file, it stores these two characters as two separate multi-byte UTF-8 characters.

    The end result is that the text strings end up being corrupted.

    Is there a way to tell a QDomDocument which character set to use? Is it supposed to do it by itself using the XML header or is there something else to do? The correct character set (UTF-8) is declared in the XML file header.

    Thank you!

    Nick
    Last edited by PaladinKnight; 5th April 2010 at 18:28.
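To make the ISO-8859-1 guess concrete, here is a small standard C++ sketch (no Qt involved; `encodeUtf8` and `misreadAsLatin1ThenWriteUtf8` are hypothetical names used only for illustration) that simulates the suspected bug: each byte of the UTF-8 input is taken as one ISO-8859-1 character, and those characters are then written back out as UTF-8. Under that assumption a two-byte character such as "é" (C3 A9) comes out as the four bytes C3 83 C2 A9, which displays as "Ã©".

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF) as UTF-8 bytes.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF && "not a Unicode code point");
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simulate the suspected bug: every input byte is decoded as one
// ISO-8859-1 character (byte value == code point), and each of those
// characters is then written out again as UTF-8.
std::string misreadAsLatin1ThenWriteUtf8(const std::string& utf8Bytes) {
    std::string out;
    for (unsigned char byte : utf8Bytes) {
        out += encodeUtf8(static_cast<char32_t>(byte));
    }
    return out;
}
```

Both bytes of a two-byte UTF-8 sequence are 0x80 or above, so each becomes a two-byte sequence of its own: a two-byte character always grows to four bytes this way, and a three-byte one (such as "€") to six. Plain ASCII passes through unchanged, which matches the symptom of only the non-ASCII characters being damaged.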

  2. #2

    Re: How to read an XML file that uses UTF-8?

    What does your code look like? Did you put <?xml version="1.0" encoding="utf-8"?> at the top of your XML files?
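For reference, this is roughly what a parser does with that declaration: it inspects the raw bytes at the start of the document before decoding anything. The standard C++ sketch below is a simplified, hypothetical `declaredEncoding` helper (not Qt's implementation); it only handles ASCII-compatible input, optionally preceded by a UTF-8 byte order mark. The point it illustrates is that the declaration can only steer decoding while the parser still sees bytes; once other code has already turned the file into a string, the declaration is just text.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: return the value of the encoding="..." pseudo-attribute
// of an XML declaration at the very start of a raw byte buffer.
// Simplified on purpose: ASCII-compatible input only (optionally starting
// with a UTF-8 BOM), no UTF-16 detection.
std::optional<std::string> declaredEncoding(const std::string& raw) {
    std::size_t start = 0;
    if (raw.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        start = 3; // skip the UTF-8 byte order mark
    }
    // The declaration, if present, must be the very first thing in the document.
    if (raw.compare(start, 5, "<?xml") != 0) {
        return std::nullopt;
    }
    const std::size_t end = raw.find("?>", start);
    if (end == std::string::npos) {
        return std::nullopt;
    }
    const std::string decl = raw.substr(start, end - start);

    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) {
        return std::nullopt; // declaration without an encoding attribute
    }
    std::size_t pos = key + 8; // skip the word "encoding"
    auto skipSpaces = [&] {
        while (pos < decl.size() && std::isspace(static_cast<unsigned char>(decl[pos]))) {
            ++pos;
        }
    };
    skipSpaces();
    if (pos >= decl.size() || decl[pos] != '=') {
        return std::nullopt;
    }
    ++pos;
    skipSpaces();
    if (pos >= decl.size() || (decl[pos] != '"' && decl[pos] != '\'')) {
        return std::nullopt;
    }
    const char quote = decl[pos++]; // attribute values may use ' or "
    const std::size_t close = decl.find(quote, pos);
    if (close == std::string::npos) {
        return std::nullopt;
    }
    return decl.substr(pos, close - pos);
}
```

A conforming parser does more than this (the XML specification also describes autodetecting UTF-16), but the order of operations is the same: read the encoding from the bytes, then decode the bytes.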

  3. The following user says thank you to Lykurg for this useful post:

    PaladinKnight (10th April 2010)

  4. #3

    Re: How to read an XML file that uses UTF-8?

    Hi!

    Sorry for the delayed reply; this week has been completely crazy...

    The program in question isn't one I wrote; it's one I'm trying to modify.

    I'll have to simplify it somewhat in order to post it here.

    The XML file does have the encoding declaration you posted; that's what I meant by "The correct character set (UTF-8) is declared in the XML file header."

    I was hoping the problem could be solved with some sort of encoding setting (like the one available on a QTextStream), but there doesn't seem to be any way to specify it when using QDomDocument::setContent() with a QFile.

    Thank you!

    Nick
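For what it's worth, the Qt 4 documentation for QDomDocument::setContent() describes the QByteArray overload as detecting the document's encoding as the XML specification requires, and the QIODevice* overload (which takes the QFile directly) likewise reads raw bytes. The QString overload, by contrast, does no encoding detection, because the text is assumed to be Unicode already. So if the existing code turns the file into a QString before calling setContent() (for example through a QTextStream whose codec comes from the system locale), that conversion is a likely place for the damage. The standard C++ sketch below (a hypothetical `decodeUtf8`, not a Qt function) shows what decoding the bytes exactly once as UTF-8 should produce, and what the already-corrupted bytes decode to, which can help locate where the extra conversion happens.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: strictly decode UTF-8 bytes into Unicode code points.
// Returns std::nullopt for malformed input: invalid lead bytes, missing or
// invalid continuation bytes, overlong forms, surrogates, or values above U+10FFFF.
std::optional<std::u32string> decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0; // number of continuation bytes that follow
        char32_t cp = 0;
        char32_t minValue = 0; // smallest code point this length may encode
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F; minValue = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F; minValue = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07; minValue = 0x10000;
        } else {
            return std::nullopt; // stray continuation byte or invalid lead byte
        }
        if (bytes.size() - i <= extra) {
            return std::nullopt; // sequence truncated at end of input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return std::nullopt; // not a continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return std::nullopt; // overlong, out of range, or surrogate
        }
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

Decoding "é" (C3 A9) gives a single character, U+00E9, whereas the corrupted bytes C3 83 C2 A9 are still valid UTF-8 but decode to two characters, "Ã" and "©". That is why the corruption does not show up as a parse error: every step succeeds, and the text is simply decoded once too often.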
