Thread: utf8 filenames / QDir::entryList

  1. #1
    Join Date
    Jan 2012
    Location
    Dortmund, Germany
    Posts
    159
    Thanks
    69
    Thanked 10 Times in 8 Posts
    Qt products
    Qt4
    Platforms
    Windows Android

    Hi,
    my program has to work on Windows and Android.
    When I want to display files in a folder, I have encoding problems, as Android apparently uses utf8 for its file system. This is what I do:

    Qt Code:
        ui->label->setText("Hallo Ä Ö Ü ä ö ü ß .,;*+");
        QStringList files = dir.entryList(QStringList() << "*.*");
        for (int i = 0; i < files.count(); i++)
        {
            QTreeWidgetItem* item = new QTreeWidgetItem(QStringList() << files.at(i));
            ui->treeWidget->addTopLevelItem(item);
        }
    The label->setText stuff is shown correctly, so this is not a display thing. The treeWidget looks like this
    android.jpg (Android)
    instead of this
    windows.jpg (Windows).
    So the umlauts are borked.

    I've tried setting the default locale to German and to English, but this didn't help:
    Qt Code:
        QLocale germanLocale(QLocale::German, QLocale::Germany);
        QLocale englishLocale(QLocale::English, QLocale::UnitedStates);
        QLocale::setDefault(germanLocale);
        // QLocale::setDefault(englishLocale);

    Furthermore, I've tried getting the entryInfoList and then decoding the fileName. But since QFileInfo::fileName() already returns a QString, I feel like I'm chasing my own tail here:
    Qt Code:
        QFileInfoList infos = dir.entryInfoList(QStringList() << "*.*");
        for (int i = 0; i < infos.count(); i++)
        {
            QFileInfo inf = infos.at(i);
            QString fileName = QFile::decodeName(inf.fileName().toUtf8());
            // Show the re-decoded name in the tree
            QTreeWidgetItem* item = new QTreeWidgetItem(QStringList() << fileName);
            ui->treeWidget->addTopLevelItem(item);
        }
    Needless to say that it also didn't work.


    I'm at my (small) wit's end.

  2. #2
    Join Date
    Mar 2009
    Location
    Brisbane, Australia
    Posts
    7,729
    Thanks
    13
    Thanked 1,610 Times in 1,537 Posts
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11 Windows
    Wiki edits
    17

    On your Android device read a broken file name from the list and inspect:
    Qt Code:
        for (int i = 0; i < fileName.size(); ++i)
            qDebug() << i << fileName.at(i).unicode();
    Do you get this:
    Qt Code:
        0 66
        1 114
        2 252
        3 100
        4 101
        5 114
        6 72
        7 246
        8 114
        9 116
    for "BrüderHört" or something else?

  3. The following user says thank you to ChrisW67 for this useful post:

    sedi (23rd August 2013)

  4. #3

    Actually it is

    Qt Code:
        66 B
        114 r
        195 Ã
        188 ¼
        100 d
        101 e
        114 r
        72 H
        195 Ã
        182 ¶
        114 r
        116 t
        46 .
        106 j
        112 p
        103 g

  5. #4

    ...but with that inspiration I put together a brute-force workaround which seems to work for me, though it is quite ugly.
    I am very open to better ideas, especially concerning performance...
    Qt Code:
        QString MainWindow::fixUtf8BrokenString(QString text)
        {
            // 195 (0xC3) starts every two-character sequence handled below.
            int index = text.indexOf(QChar(195));
            while (index >= 0)
            {
                if (text.count() > ++index)
                {
                    // unicode() gives the numeric value; toAscii() returns a char,
                    // which is negative for values above 127 where char is signed.
                    int code = text[index].unicode();
                    switch (code)
                    {
                    case 128: text.replace(QString(QChar(195))+QString(QChar(128)),"À");break;
                    case 129: text.replace(QString(QChar(195))+QString(QChar(129)),"Á");break;
                    case 130: text.replace(QString(QChar(195))+QString(QChar(130)),"Â");break;
                    case 131: text.replace(QString(QChar(195))+QString(QChar(131)),"Ã");break;
                    case 132: text.replace(QString(QChar(195))+QString(QChar(132)),"Ä");break;
                    case 133: text.replace(QString(QChar(195))+QString(QChar(133)),"Å");break;
                    case 135: text.replace(QString(QChar(195))+QString(QChar(135)),"Ç");break;
                    case 136: text.replace(QString(QChar(195))+QString(QChar(136)),"È");break;
                    case 137: text.replace(QString(QChar(195))+QString(QChar(137)),"É");break;
                    case 138: text.replace(QString(QChar(195))+QString(QChar(138)),"Ê");break;
                    case 139: text.replace(QString(QChar(195))+QString(QChar(139)),"Ë");break;
                    case 140: text.replace(QString(QChar(195))+QString(QChar(140)),"Ì");break;
                    case 141: text.replace(QString(QChar(195))+QString(QChar(141)),"Í");break;
                    case 142: text.replace(QString(QChar(195))+QString(QChar(142)),"Î");break;
                    case 143: text.replace(QString(QChar(195))+QString(QChar(143)),"Ï");break;
                    case 144: text.replace(QString(QChar(195))+QString(QChar(144)),"Ð");break;
                    case 145: text.replace(QString(QChar(195))+QString(QChar(145)),"Ñ");break;
                    case 146: text.replace(QString(QChar(195))+QString(QChar(146)),"Ò");break;
                    case 147: text.replace(QString(QChar(195))+QString(QChar(147)),"Ó");break;
                    case 148: text.replace(QString(QChar(195))+QString(QChar(148)),"Ô");break;
                    case 149: text.replace(QString(QChar(195))+QString(QChar(149)),"Õ");break;
                    case 150: text.replace(QString(QChar(195))+QString(QChar(150)),"Ö");break;
                    case 152: text.replace(QString(QChar(195))+QString(QChar(152)),"Ø");break;
                    case 153: text.replace(QString(QChar(195))+QString(QChar(153)),"Ù");break;
                    case 154: text.replace(QString(QChar(195))+QString(QChar(154)),"Ú");break;
                    case 155: text.replace(QString(QChar(195))+QString(QChar(155)),"Û");break;
                    case 156: text.replace(QString(QChar(195))+QString(QChar(156)),"Ü");break;
                    case 157: text.replace(QString(QChar(195))+QString(QChar(157)),"Ý");break;
                    case 158: text.replace(QString(QChar(195))+QString(QChar(158)),"Þ");break;
                    case 159: text.replace(QString(QChar(195))+QString(QChar(159)),"ß");break;
                    case 160: text.replace(QString(QChar(195))+QString(QChar(160)),"à");break;
                    case 161: text.replace(QString(QChar(195))+QString(QChar(161)),"á");break;
                    case 162: text.replace(QString(QChar(195))+QString(QChar(162)),"â");break;
                    case 163: text.replace(QString(QChar(195))+QString(QChar(163)),"ã");break;
                    case 164: text.replace(QString(QChar(195))+QString(QChar(164)),"ä");break;
                    case 165: text.replace(QString(QChar(195))+QString(QChar(165)),"å");break;
                    case 166: text.replace(QString(QChar(195))+QString(QChar(166)),"æ");break;
                    case 167: text.replace(QString(QChar(195))+QString(QChar(167)),"ç");break;
                    case 168: text.replace(QString(QChar(195))+QString(QChar(168)),"è");break;
                    case 169: text.replace(QString(QChar(195))+QString(QChar(169)),"é");break;
                    case 170: text.replace(QString(QChar(195))+QString(QChar(170)),"ê");break;
                    case 171: text.replace(QString(QChar(195))+QString(QChar(171)),"ë");break;
                    case 172: text.replace(QString(QChar(195))+QString(QChar(172)),"ì");break;
                    case 173: text.replace(QString(QChar(195))+QString(QChar(173)),"í");break;
                    case 174: text.replace(QString(QChar(195))+QString(QChar(174)),"î");break;
                    case 175: text.replace(QString(QChar(195))+QString(QChar(175)),"ï");break;
                    case 177: text.replace(QString(QChar(195))+QString(QChar(177)),"ñ");break;
                    case 178: text.replace(QString(QChar(195))+QString(QChar(178)),"ò");break;
                    case 179: text.replace(QString(QChar(195))+QString(QChar(179)),"ó");break;
                    case 180: text.replace(QString(QChar(195))+QString(QChar(180)),"ô");break;
                    case 181: text.replace(QString(QChar(195))+QString(QChar(181)),"õ");break;
                    case 182: text.replace(QString(QChar(195))+QString(QChar(182)),"ö");break;
                    case 184: text.replace(QString(QChar(195))+QString(QChar(184)),"ø");break;
                    case 185: text.replace(QString(QChar(195))+QString(QChar(185)),"ù");break;
                    case 186: text.replace(QString(QChar(195))+QString(QChar(186)),"ú");break;
                    case 187: text.replace(QString(QChar(195))+QString(QChar(187)),"û");break;
                    case 188: text.replace(QString(QChar(195))+QString(QChar(188)),"ü");break;
                    case 189: text.replace(QString(QChar(195))+QString(QChar(189)),"ý");break;
                    case 191: text.replace(QString(QChar(195))+QString(QChar(191)),"ÿ");break;
                    }
                }
                // Search onwards from the current position; restarting from the
                // beginning never terminates if a 195 has no matching case.
                index = text.indexOf(QChar(195), index);
            }
            return text;
        }
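
    A table-free variant is possible because every pair handled above follows one rule: for a lead value of 195 (0xC3) and a second value between 128 and 191, the intended character is the second value plus 64. The sketch below is plain standard C++, with std::u16string standing in for QString so that it runs without Qt. The name fixC3Pairs is made up for this illustration, and like the table it only covers sequences starting with 0xC3.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collapse each (0xC3, 0x80..0xBF) pair of code units
// into the single character it was meant to be. Everything else is copied.
std::u16string fixC3Pairs(const std::u16string& text)
{
    std::u16string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t lead = text[i];
        if (lead == 0x00C3 && i + 1 < text.size()) {
            const char16_t second = text[i + 1];
            if (second >= 0x0080 && second <= 0x00BF) {
                // UTF-8 two-byte rule: ((lead & 0x1F) << 6) | (second & 0x3F).
                // For lead 0xC3 this equals second + 0x40.
                out.push_back(static_cast<char16_t>(((lead & 0x1F) << 6) | (second & 0x3F)));
                ++i; // the second code unit has been consumed
                continue;
            }
        }
        out.push_back(lead);
    }
    return out;
}
```

    This walks the string once and builds the result in a single pass, so the cost no longer depends on how many different accented characters occur.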

  6. #5

    So the file name coming off the device is being treated as a Latin1 string, leading to a broken result.
    Qt Code:
        195 = 0xC3
        188 = 0xBC
    which are the correct bytes for a UTF-8 encoded "ü" (U+00FC) but are being converted to two QChars.
    Quite how to fix this I don't know.
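
    The two byte values can be checked by hand: a code point between U+0080 and U+07FF is split into a 5-bit and a 6-bit part, which are placed into the byte patterns 110xxxxx and 10xxxxxx. A minimal sketch in plain C++ (no Qt; the struct and function names are made up for this illustration):

```cpp
#include <cstdint>

// The two bytes of a two-byte UTF-8 sequence.
struct Utf8Pair {
    std::uint8_t first;
    std::uint8_t second;
};

// Encode a code point in the range U+0080..U+07FF as UTF-8.
// Values outside that range are not handled by this sketch.
Utf8Pair encodeTwoByteUtf8(char32_t codePoint)
{
    Utf8Pair bytes;
    bytes.first  = static_cast<std::uint8_t>(0xC0 | (codePoint >> 6));   // 110xxxxx
    bytes.second = static_cast<std::uint8_t>(0x80 | (codePoint & 0x3F)); // 10xxxxxx
    return bytes;
}
```

    For U+00FC ("ü") this gives 0xC3 0xBC (195 188), and for U+00F6 ("ö") it gives 0xC3 0xB6 (195 182), which are the pairs in the listing from the device.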

  8. #6

    Are you sure about Latin1 "U+00FC"? To me it looks like Ã+00BC, with Ã acting as a sort of escape character.

    I've looked up the codes in this UTF-8 table here. With that information I can just string-replace all umlauts and other important characters. For my case, it actually works as expected.

    That said, it's probably quite slow to tackle the problem this way; it feels like reassembling the debris instead of preventing the accident.


    Many thanks for the idea to actually look into the bytes myself - sometimes I can't see the wood for the trees. If anyone has a better idea (in terms of performance or safety of use), I'd be very happy to improve or entirely change my approach, but for the moment I can use that.

  9. #7

    U+00FC is the Unicode code point for the character 'ü'. When encoded in UTF-8, that single Unicode character becomes the 2 bytes 0xC3 0xBC.

    If you interpret those two bytes as Latin1 characters (which are always one byte per character) you get, as you point out, Ã and ¼.

    So, the file name is encoded in UTF-8 on the device. It is read as a sequence of bytes that are then incorrectly treated as a Latin1 string.
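
    Undoing the misinterpretation amounts to running it backwards: turn each character back into the byte it came from, then decode the byte sequence as UTF-8. The sketch below does this in plain standard C++ so that it runs without Qt. The name repairMisdecodedName is made up, the input is assumed to contain only characters up to U+00FF, and malformed sequences are copied through unchanged rather than validated strictly. In Qt the same round trip can be written as QString::fromUtf8(name.toLatin1()); that has not been tested on the Android setup discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: 'misdecoded' holds one character per original byte
// (the Latin1 reading). Returns the code points those bytes encode in UTF-8.
std::u32string repairMisdecodedName(const std::u16string& misdecoded)
{
    // Step 1: back to raw bytes. A character above U+00FF cannot come from a
    // single byte, so in that case the input is returned as it is.
    std::vector<std::uint8_t> bytes;
    bytes.reserve(misdecoded.size());
    for (char16_t ch : misdecoded) {
        if (ch > 0x00FF)
            return std::u32string(misdecoded.begin(), misdecoded.end());
        bytes.push_back(static_cast<std::uint8_t>(ch));
    }

    // Step 2: decode the bytes as UTF-8.
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t b = bytes[i];
        std::size_t extra = 0; // number of continuation bytes expected
        char32_t cp = b;
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }

        // All continuation bytes must exist and look like 10xxxxxx.
        bool ok = i + extra < bytes.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const std::uint8_t c = bytes[i + k];
            ok = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3F);
        }

        if (ok) {
            out.push_back(cp);
            i += extra + 1;
        } else {
            out.push_back(b); // not a complete sequence: keep the byte value
            ++i;
        }
    }
    return out;
}
```

    Fed with the values from the device listing (66 114 195 188 100 101 114 ...), this yields 252 in place of the pair 195 188 and 246 in place of 195 182. Unlike a fix limited to the 0xC3 range, it also covers three- and four-byte sequences.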

  10. The following user says thank you to ChrisW67 for this useful post:

    sedi (25th August 2013)
