
Thread: QByteArray and UTF-8

  1. #1
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    QByteArray and UTF-8

    Hi guys,

    I'm trying to get data from a binary file, and everything works fine until a UTF-8 character appears.
    For example, I have this code:
    Qt Code:
    1. QFile *sub = new QFile("C:\\TestFile.ctl");
    2. if (!sub->open(QIODevice::ReadOnly)){
    3. return 0;
    4. }
    5. QByteArray data = sub->read(200);
    6. int start = data.indexOf(QByteArray::fromHex("0001"), data.indexOf(QByteArray::fromHex("0000"), 32))+8;
    7. int stop = data.indexOf(QByteArray::fromHex("0000"), start);
    8. for (int i = start; i < stop; ++i)
    9. {
    10. test << reinterpret_cast<const char*>(&data.constData()[i]);
    11. }
    12. name = test.join(" ").replace(" ", "");
    13. qDebug() << name;

    In qDebug() I get this

    but the original string in the binary file is Čeština.

    How can I get the original string?

    Thank you for your help.
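The loop above appends one QStringList entry per byte, and each entry is the NUL-terminated C string that begins at that byte. The following standard C++ sketch (no Qt; `pieces` and `concat` are hypothetical names) reproduces that behaviour with std::string. It assumes, as later posts in this thread suggest, that the name is stored with two bytes per character and a 00 byte after each ASCII letter, and it shows why plain ASCII survives the join while other characters do not.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mimics `list << reinterpret_cast<const char*>(&data.constData()[i])` for every i in [from, to):
// each entry is the C string that begins at byte i and stops at the next NUL byte.
std::vector<std::string> pieces(const std::string& data, std::size_t from, std::size_t to)
{
    std::vector<std::string> out;
    for (std::size_t i = from; i < to && i < data.size(); ++i)
        out.emplace_back(data.c_str() + i);  // the const char* constructor stops at the first '\0'
    return out;
}

// Joining with " " and then removing every space, as the original code does, amounts to
// plain concatenation, provided no piece contains a space of its own.
std::string concat(const std::vector<std::string>& parts)
{
    std::string joined;
    for (const std::string& p : parts)
        joined += p;
    return joined;
}
```

For the bytes 41 00 42 00 the pieces are "A", "", "B", "", so the result is "AB", which is why ASCII names look correct. For 0C 01 65 00 (Č as the 16-bit value 0x010C, low byte first, followed by "e") the pieces are "\x0c\x01e", "\x01e", "e" and "", so the concatenation contains control bytes and three copies of 'e'.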

  2. #2
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    Qt Code:
    qDebug() << QString::fromUtf8(data.constData(), data.size());

    By the way, what's the point of QByteArray::fromHex("0001") and this whole search you are doing?
    Your biological and technological distinctiveness will be added to our own. Resistance is futile.

    Please ask Qt related questions on the forum and not using private messages or visitor messages.


  3. #3
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    TestFile.ctl is not my file, and it contains a lot of data. The indexes computed in lines 6-7 mark where the name is stored, and that position is different in other files. This helps me find the start and stop positions of the name.
    I write the data to a QStringList named test, so this
    Qt Code:
    qDebug() << QString::fromUtf8(data.constData(), data.size());
    is not a solution for me.

    BTW: I tried name.toUtf8() and other encodings, but none of them shows the original name.
    Last edited by Benecore; 30th May 2012 at 12:59.

  4. #4
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    Why is it not a solution for you?

    My question about fromHex is that I don't understand why you're calling fromHex at all. This whole search of yours looks weird. If you tell us what exactly you are trying to do, maybe we'll find a more straightforward solution.


  5. #5
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    The file stores the name and vendor name of an application (Symbian OS), and I want to get this information. The following code
    Qt Code:
    void GetData()
    {
        QFile sub(QString("c:\\TestFile.ctl"));  // on the stack, so it is closed and freed on every return path
        if (sub.exists()) {
            if (!sub.open(QIODevice::ReadOnly)) {
                return;
            }
            QByteArray data = sub.read(200);
            int findvendor = data.indexOf(QByteArray::fromHex("0000"), 32);
            QStringList vendorList;
            for (int i = 32; i < findvendor; i++) {
                vendorList << reinterpret_cast<const char*>(&data.constData()[i]);
            }
            vendor = vendorList.join(" ").replace(" ", "");
            int beginName = data.indexOf(QByteArray::fromHex("0001"), findvendor) + 8;
            int endName = data.indexOf(QByteArray::fromHex("0000"), beginName);
            QStringList nameList;
            for (int i = beginName; i < endName; i++) {
                nameList << reinterpret_cast<const char*>(&data.constData()[i]);
            }
            qDebug() << nameList;
            name = nameList.join(" ").replace(" ", "").replace(QChar(0x00), "").replace(QChar(0x0c), "").replace(QChar(0x02), "");
        } else {
            name = tr("Unknown");
            vendor = tr("Unknown");
        }
        qDebug() << name << '\n' << vendor;
    }
    works fine as long as the file doesn't contain UTF-8 characters.
    Here are the two files, one without UTF-8 characters and one with them.
    TestFiles.zip

  6. #6
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    I was expecting something more like "I need to extract a text string that starts no earlier than 32 bytes from the beginning of the file, is prefixed by 0x0001 and suffixed by 0x0000". Presenting more weird code is definitely not going to help.


  7. #7
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    Sorry, my English is bad. Yes, I want to get the string which starts in 32 bytes.

  8. #8
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    In 32 bytes? Why then are you searching for 0x0000?


  9. #9
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    I've been programming in C++ for only a couple of months. This code is rewritten from a Python application.
    The author gave me permission to use the code. My experience with binary files in C++ is limited.
    This is the Python code:
    Qt Code:
    finvend = s.cont.find(hexto('0000'), 32)
    vendor = s.cont[32:finvend]
    start = (s.cont.find(hexto('0001'), finvend) + 8)
    name = s.cont[start:s.cont.find(hexto('0000'), start)]
    I tried to rewrite this code, and I know it is not a good solution, but for now I don't understand how to read binary files in C++.

    BTW: I tried, for example, reading just 4 or 8 bytes starting at byte 32 (file.seek(32) and then read()), but the result is not the same as with the previous code. In C++ I miss slicing like index[from:to].
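Python's s[a:b] corresponds to QByteArray::mid(a, b - a) in Qt, or std::string::substr(a, b - a) in standard C++. Below is a minimal sketch of the four Python lines in standard C++, assuming `hexto('0000')` and `hexto('0001')` yield the two bytes 00 00 and 00 01 and that the raw file bytes sit in a std::string. The names `py_slice`, `Fields` and `extract` are hypothetical. The two-byte patterns are built with an explicit length because they contain NUL bytes.

```cpp
#include <cstddef>
#include <string>

// Python-style slice s[from:to] for non-negative indices; substr() clamps at the end of s.
std::string py_slice(const std::string& s, std::size_t from, std::size_t to)
{
    if (from >= s.size() || to <= from)
        return std::string();
    return s.substr(from, to - from);
}

struct Fields {
    std::string vendor;
    std::string name;
};

// Hypothetical translation of the four Python lines; `cont` holds the raw bytes of the file.
// std::string::find() returns npos (not -1) when nothing is found, so each lookup is
// checked and the remaining fields are left empty.
Fields extract(const std::string& cont)
{
    const std::string zeroZero("\x00\x00", 2);  // hexto('0000'): the length must be explicit
    const std::string zeroOne("\x00\x01", 2);   // hexto('0001')
    Fields f;
    std::size_t finvend = cont.find(zeroZero, 32);
    if (finvend == std::string::npos)
        return f;
    f.vendor = py_slice(cont, 32, finvend);
    std::size_t marker = cont.find(zeroOne, finvend);
    if (marker == std::string::npos)
        return f;
    std::size_t start = marker + 8;
    std::size_t stop = cont.find(zeroZero, start);
    if (stop == std::string::npos)
        return f;
    f.name = py_slice(cont, start, stop);
    return f;
}
```

Unlike the Python version, a missing marker here yields empty fields instead of slicing with -1.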

  10. #10
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    It's not a matter of knowing C++ or not. It's a matter of knowing what you are looking for. Do you know what you are looking for?

    Qt Code:
    int endOfVendor = ba.indexOf(QByteArray("\x00\x00", 2), 32);  // explicit length: a bare "\x00\x00" literal is an empty C string
    QByteArray vendorDat = ba.mid(32, endOfVendor - 32);
    QString vendorStr = QString::fromUtf8(vendorDat.constData(), vendorDat.size());


  11. #11
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    Thanks for another option, but the result is the same (a little better).
    cest.png

    BTW: this
    Qt Code:
    ba.indexOf("\x00\x00", 32);
    and this
    Qt Code:
    ba.indexOf(QByteArray::fromHex("0000"), 32);
    are not the same, because the return values (start byte positions) differ. I also tried two QChar(0x00), but that doesn't matter; my only problem is the resulting string when it contains UTF-8 characters.
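The two calls differ because a string literal is a NUL-terminated char array: an overload that takes a plain const char* measures the needle up to its first NUL, so "\x00\x00" is seen as an empty string, and a search for an empty string most likely just returns the start position (32 here). Building the needle with an explicit length, e.g. QByteArray("\x00\x00", 2) in Qt, gives the same two bytes as QByteArray::fromHex("0000"). The effect can be reproduced with standard C++ alone (the function names are hypothetical):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// strlen() stops at the first NUL, so this literal has length 0 as a C string.
std::size_t literal_length() { return std::strlen("\x00\x00"); }

// The const char* overload of find() sees an empty needle, which matches at `from` itself.
std::size_t find_literal(const std::string& hay, std::size_t from)
{
    return hay.find("\x00\x00", from);
}

// Passing the length explicitly keeps both zero bytes in the needle.
std::size_t find_two_zero_bytes(const std::string& hay, std::size_t from)
{
    return hay.find(std::string("\x00\x00", 2), from);
}
```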

  12. #12
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    The result is probably wrong because printing to the console goes through Latin-1 decoding instead of UTF-8. Try writing the result to a file and then opening the file in an editor that can handle UTF-8.
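If that is what happens, the garbling is easy to picture: Latin-1 (ISO 8859-1) maps each byte 0x00 to 0xFF to the code point with the same value, so the two UTF-8 bytes of Č (C4 8C) come out as two characters, Ä (U+00C4) followed by the invisible control character U+008C. A standard C++ sketch of such a byte-for-byte decode (`decode_latin1` is a hypothetical name):

```cpp
#include <string>

// Latin-1 decoding: every byte becomes the Unicode code point with the same numeric value,
// so a multi-byte UTF-8 sequence turns into several unrelated characters.
std::u32string decode_latin1(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char32_t>(b));
    return out;
}
```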


  13. #13
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    Hm, something is wrong, but what? Jesus...
    This is the content of the text file:
    text.png

    I write the string like this:
    Qt Code:
    QFile zapis("C:\\test.txt");
    zapis.open(QIODevice::WriteOnly | QIODevice::Text);
    QTextStream streamFileOut(&zapis);
    //streamFileOut.setCodec("UTF-8");
    streamFileOut << vendorStr;
    streamFileOut.flush();
    zapis.close();

  14. #14
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    Probably the editor does not know the string is in UTF-8 (if it is in UTF-8).


  15. #15
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    The string is UTF-8. I tried writing this to the file:
    Qt Code:
    QFile zapis("C:\\test.txt");
    zapis.open(QIODevice::WriteOnly | QIODevice::Text);
    QTextStream streamFileOut(&zapis);
    streamFileOut << QString::fromUtf8("Čeština");
    streamFileOut.flush();
    zapis.close();
    and the result is not the same.

    Is something wrong in this code?
    Qt Code:
    QFile file("C:/TestFile.ctl");
    if (!file.open(QIODevice::ReadOnly)) {
        return 0;
    }
    QByteArray data = file.read(200);
    QByteArray nameData = data.mid(59, 14);
    QString nameStr = QString::fromUtf8(nameData.constData(), nameData.size());
    file.close();
    QFile zapis("C:\\test.txt");
    zapis.open(QIODevice::WriteOnly | QIODevice::Text);
    QTextStream streamFileOut(&zapis);
    //streamFileOut.setCodec("UTF-8");
    streamFileOut << nameStr;
    streamFileOut.flush();
    zapis.close();

    I don't understand why the string doesn't come out correctly.

  16. #16
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    I will repeat my earlier question: do you know what you are looking for, or are you just guessing? Are you sure the string is stored in the file the way you are trying to read it? Reading 14 bytes of a UTF-8 string will definitely not give you "Čeština". If this string were UTF-8 encoded, it would be stored using 9 bytes (5 bytes for the ASCII characters + 4 bytes for the two non-ASCII characters).

    Qt Code:
    1. #include <QtCore>
    2.  
    3. int main(int argc, char **argv) {
    4. QString str = QString::fromUtf8("Čeština");
    5. QByteArray ba = str.toUtf8();
    6. qDebug() << "size:" << ba.size();
    7. qDebug() << "hex:" << ba.toHex();
    8. }
    To copy to clipboard, switch view to plain text mode 

    Output:

    size: 9
    hex: "c48c65c5a174696e61"


  17. #17
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    Yes, I know what I'm looking for. I'm looking for a solution to my problem, and my problem is getting the correct result for a UTF-8 string, that's all. I know that Čeština has 9 bytes. But inside the binary file there is one empty char (00, whether or not that counts as a char) between each of the chars, so Čeština takes 14 bytes, not 9.
    Image of the binary file in a hex editor:
    The chars Č and š are probably somewhere else.
    hexokno.png
    I have many files in this format, and these are the results when I use the loop:
    resultOKNO.png
    As you can see, all names are retrieved from the binary files with this algorithm and displayed correctly. Only the names that contain UTF-8 characters are not displayed correctly. But it doesn't matter if there is no solution. Sorry for my English.
    Last edited by Benecore; 31st May 2012 at 14:01.

  18. #18
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    Quote Originally Posted by Benecore
    But inside the binary file is between each chars one empty char (00 - is char or isn't) so Čeština hasn't 9 bytes but 14 bytes.
    So it is not UTF-8 encoded. End of story. Even if each character were separated by a null character, UTF-8-encoded Čeština would be 18 bytes, not 14. To me it seems you simply have a Unicode string there, with each character encoded using 16 bits (i.e. UTF-16).

    Qt Code:
    1. #include <QtCore>
    2.  
    3. int main() {
    4. QString str = QString::fromUtf8("Čeština");
    5. QByteArray utf8 = str.toUtf8();
    6. qDebug() << "UTF-8 size:" << utf8.size() << utf8.toHex();
    7. QVector<unsigned int> ucs4array = str.toUcs4();
    8. QByteArray ucs4((const char*)ucs4array.data(), ucs4array.size() * sizeof(unsigned int)); // I'm lazy, not converting to big-endian
    9. qDebug() << "UCS-4 size:" << ucs4.size() << ucs4.toHex();
    10. const QChar *unicodeStr = str.unicode();
    11. QByteArray unicode;
    12. const QChar *c = unicodeStr;
    13. while(c->unicode()) {
    14. ushort val = qToBigEndian(c->unicode());
    15. QByteArray ba((const char*)&val, sizeof(ushort));
    16. unicode.append(ba);
    17. c++; // always wanted to do that :)
    18. }
    19. qDebug() << "Unicode size:" << unicode.size() << unicode.toHex();
    20. return 0;
    21. }
    To copy to clipboard, switch view to plain text mode 

    Output:
    UTF-8 size: 9 "c48c65c5a174696e61"
    UCS-4 size: 28 "0c010000650000006101000074000000690000006e00000061000000"
    Unicode size: 14 "010c0065016100740069006e0061"

    Remark: The above output shows a series of values and not an encoded string (hence the difference between '010c' and '0c01').
    Last edited by wysota; 31st May 2012 at 14:58.
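To make the remark concrete: a UTF-16 code unit is a 16-bit number, and only its byte order decides whether Č (U+010C) appears as 01 0C or 0C 01 in memory or in a file. The "Unicode" line above was produced with qToBigEndian, whereas a file that stores each ASCII letter followed by a 00 byte would be little-endian. A small standard C++ sketch (helper names are hypothetical):

```cpp
#include <array>
#include <cstdint>

// Little-endian: low byte first (the layout of UTF-16LE data).
std::array<std::uint8_t, 2> to_little_endian(std::uint16_t unit)
{
    return {{static_cast<std::uint8_t>(unit & 0xFF), static_cast<std::uint8_t>(unit >> 8)}};
}

// Big-endian: high byte first (what storing a qToBigEndian() value produces).
std::array<std::uint8_t, 2> to_big_endian(std::uint16_t unit)
{
    return {{static_cast<std::uint8_t>(unit >> 8), static_cast<std::uint8_t>(unit & 0xFF)}};
}
```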


  19. #19
    Join Date
    Apr 2012
    Location
    Slovakia
    Posts
    12
    Qt products
    Qt4 Qt5
    Platforms
    Windows Symbian S60 Maemo/MeeGo

    Re: QByteArray and UTF-8

    Thanks for this answer; I didn't know this.

    So my last question is: is it possible to get the original strings with UTF-8 support from these files, or not?

  20. #20
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: QByteArray and UTF-8

    Yes. However, what you have is not UTF-8; it's pure 16-bit Unicode. See QString::setUnicode().
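Assuming the name really is UTF-16 with the low byte first, a Qt version would likely be along the lines of QString::fromUtf16(reinterpret_cast<const ushort*>(nameData.constData()), nameData.size() / 2). That function assumes host byte order when there is no byte order mark, so it matches on a little-endian PC, and the slice should start on an even offset and contain an even number of bytes. The decoding itself is sketched below in standard C++ with hypothetical helper names: it walks two bytes at a time, combines surrogate pairs, stops at an aligned 16-bit NUL and returns UTF-8.

```cpp
#include <cstddef>
#include <string>

// Appends code point cp to out as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes UTF-16LE starting at byte offset `from`, two bytes at a time, until an aligned
// 16-bit NUL (or the end of the buffer). Unpaired surrogates are passed through unchecked.
std::string utf16le_to_utf8(const std::string& data, std::size_t from)
{
    std::string out;
    for (std::size_t i = from; i + 1 < data.size(); i += 2) {
        char32_t unit = static_cast<unsigned char>(data[i])
                      | (static_cast<unsigned char>(data[i + 1]) << 8);
        if (unit == 0)
            break;  // terminator found on a character boundary
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 3 < data.size()) {  // high surrogate
            char32_t low = static_cast<unsigned char>(data[i + 2])
                         | (static_cast<unsigned char>(data[i + 3]) << 8);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                i += 2;  // consumed the low surrogate as well
            }
        }
        append_utf8(out, unit);
    }
    return out;
}
```

Stepping by two bytes matters: for "A" followed by the terminator (41 00 00 00), a byte-wise search for 00 00 matches at offset 1 and leaves an odd-length slice, while this loop stops at offset 2. Decoding the 14 bytes of Čeština gives the 9 UTF-8 bytes c48c65c5a174696e61 shown earlier in the thread.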


Qt is a trademark of The Qt Company.