utf8 filenames / QDir::entryList

**sedi** · 22nd August 2013, 18:06

Hi,
my program has to work on Windows and Android.
When I list the files in a folder, I run into encoding problems, since Android apparently uses UTF-8 for its file system. This is what I do:

Qt Code:

ui->label->setText("Hallo Ä Ö Ü ä ö ü ß .,;*+");
QStringList files=dir.entryList(QStringList()<<"*.*");
for (int i= 0;i<files.count();i++)
    {
        QTreeWidgetItem* item=new QTreeWidgetItem(QStringList()<<files.at(i));
        ui->treeWidget->addTopLevelItem(item);
    }


The label->setText stuff is shown correctly, so this is not a display thing. The treeWidget looks like this
android.jpg (Android)
instead of this
windows.jpg (Windows).
So the umlauts are borked.

I've tried setting the default locale to German and to English, but that didn't help:

Qt Code:

QLocale germanLocale(QLocale::German, QLocale::Germany);
QLocale englishLocale(QLocale::English, QLocale::UnitedStates);
QLocale::setDefault(germanLocale);
// QLocale::setDefault(englishLocale);


I've also tried getting the entryInfoList and then decoding the fileName. But since QFileInfo::fileName() already returns a QString, I feel like I'm chasing my own tail here:

Qt Code:

QFileInfoList infos=dir.entryInfoList(QStringList()<<"*.*");
for (int i=0; i<infos.count();i++)
{
    QFileInfo inf=infos.at(i);
    QString fileName= QFile::decodeName(inf.fileName().toUtf8());
    QTreeWidgetItem* item=new QTreeWidgetItem(QStringList()<<fileName);
    ui->treeWidget->addTopLevelItem(item);
}


Needless to say, that didn't work either.

The docs say at http://qt-project.org/doc/qt-4.8/porting4.html#qdir that QDir::encodedEntryList() has been removed.
They also say at http://qt-project.org/doc/qt-5.1/qtc...codingFunction that it "does nothing". I'm still on 4.7, but I will probably move to 5.1 sooner or later.

I'm at my (small) wit's end.

**ChrisW67** · 22nd August 2013, 23:04

On your Android device read a broken file name from the list and inspect:

Qt Code:

for (int i = 0; i < fileName.size(); ++i) 
  qDebug() << i << fileName.at(i).unicode();


Which values do you get for "BrüderHört"?

**sedi** · 22nd August 2013, 23:23

Actually it is

Qt Code:

66    B
114   r
195   Ã
188   ¼
100   d
101   e
114   r
72    H
195   Ã
182   ¶
114   r
116   t
46    .
106   j
112   p
103   g


**sedi** · 23rd August 2013, 01:39

...but with that inspiration I built a kind of brute-force workaround which seems to work for me, though it is quite ugly.
I am very open to better ideas, especially concerning performance...

Qt Code:

QString MainWindow::fixUtf8BrokenString(QString text)
{
    // QChar(195) (0xC3) is the misread UTF-8 lead byte. Together with the
    // character that follows it, it stands for one intended letter.
    int index = text.indexOf(QChar(195));
    while (index>=0)
    {
        if (text.count()>++index)
        {
            // unicode() instead of toAscii(): the values are above 127, so they
            // are not ASCII, and char may be signed on some platforms.
            int code=text[index].unicode();
            switch (code)
            {
            case  128: text.replace(QString(QChar(195))+QString(QChar(128)),"À");break;
            case  129: text.replace(QString(QChar(195))+QString(QChar(129)),"Á");break;
            case  130: text.replace(QString(QChar(195))+QString(QChar(130)),"Â");break;
            case  131: text.replace(QString(QChar(195))+QString(QChar(131)),"Ã");break;
            case  132: text.replace(QString(QChar(195))+QString(QChar(132)),"Ä");break;
            case  133: text.replace(QString(QChar(195))+QString(QChar(133)),"Å");break;
            case  135: text.replace(QString(QChar(195))+QString(QChar(135)),"Ç");break;
            case  136: text.replace(QString(QChar(195))+QString(QChar(136)),"È");break;
            case  137: text.replace(QString(QChar(195))+QString(QChar(137)),"É");break;
            case  138: text.replace(QString(QChar(195))+QString(QChar(138)),"Ê");break;
            case  139: text.replace(QString(QChar(195))+QString(QChar(139)),"Ë");break;
            case  140: text.replace(QString(QChar(195))+QString(QChar(140)),"Ì");break;
            case  141: text.replace(QString(QChar(195))+QString(QChar(141)),"Í");break;
            case  142: text.replace(QString(QChar(195))+QString(QChar(142)),"Î");break;
            case  143: text.replace(QString(QChar(195))+QString(QChar(143)),"Ï");break;
            case  144: text.replace(QString(QChar(195))+QString(QChar(144)),"Ð");break;
            case  145: text.replace(QString(QChar(195))+QString(QChar(145)),"Ñ");break;
            case  146: text.replace(QString(QChar(195))+QString(QChar(146)),"Ò");break;
            case  147: text.replace(QString(QChar(195))+QString(QChar(147)),"Ó");break;
            case  148: text.replace(QString(QChar(195))+QString(QChar(148)),"Ô");break;
            case  149: text.replace(QString(QChar(195))+QString(QChar(149)),"Õ");break;
            case  150: text.replace(QString(QChar(195))+QString(QChar(150)),"Ö");break;
            case  152: text.replace(QString(QChar(195))+QString(QChar(152)),"Ø");break;
            case  153: text.replace(QString(QChar(195))+QString(QChar(153)),"Ù");break;
            case  154: text.replace(QString(QChar(195))+QString(QChar(154)),"Ú");break;
            case  155: text.replace(QString(QChar(195))+QString(QChar(155)),"Û");break;
            case  156: text.replace(QString(QChar(195))+QString(QChar(156)),"Ü");break;
            case  157: text.replace(QString(QChar(195))+QString(QChar(157)),"Ý");break;
            case  158: text.replace(QString(QChar(195))+QString(QChar(158)),"Þ");break;
            case  159: text.replace(QString(QChar(195))+QString(QChar(159)),"ß");break;
            case  160: text.replace(QString(QChar(195))+QString(QChar(160)),"à");break;
            case  161: text.replace(QString(QChar(195))+QString(QChar(161)),"á");break;
            case  162: text.replace(QString(QChar(195))+QString(QChar(162)),"â");break;
            case  163: text.replace(QString(QChar(195))+QString(QChar(163)),"ã");break;
            case  164: text.replace(QString(QChar(195))+QString(QChar(164)),"ä");break;
            case  165: text.replace(QString(QChar(195))+QString(QChar(165)),"å");break;
            case  166: text.replace(QString(QChar(195))+QString(QChar(166)),"æ");break;
            case  167: text.replace(QString(QChar(195))+QString(QChar(167)),"ç");break;
            case  168: text.replace(QString(QChar(195))+QString(QChar(168)),"è");break;
            case  169: text.replace(QString(QChar(195))+QString(QChar(169)),"é");break;
            case  170: text.replace(QString(QChar(195))+QString(QChar(170)),"ê");break;
            case  171: text.replace(QString(QChar(195))+QString(QChar(171)),"ë");break;
            case  172: text.replace(QString(QChar(195))+QString(QChar(172)),"ì");break;
            case  173: text.replace(QString(QChar(195))+QString(QChar(173)),"í");break;
            case  174: text.replace(QString(QChar(195))+QString(QChar(174)),"î");break;
            case  175: text.replace(QString(QChar(195))+QString(QChar(175)),"ï");break;
            case  177: text.replace(QString(QChar(195))+QString(QChar(177)),"ñ");break;
            case  178: text.replace(QString(QChar(195))+QString(QChar(178)),"ò");break;
            case  179: text.replace(QString(QChar(195))+QString(QChar(179)),"ó");break;
            case  180: text.replace(QString(QChar(195))+QString(QChar(180)),"ô");break;
            case  181: text.replace(QString(QChar(195))+QString(QChar(181)),"õ");break;
            case  182: text.replace(QString(QChar(195))+QString(QChar(182)),"ö");break;
            case  184: text.replace(QString(QChar(195))+QString(QChar(184)),"ø");break;
            case  185: text.replace(QString(QChar(195))+QString(QChar(185)),"ù");break;
            case  186: text.replace(QString(QChar(195))+QString(QChar(186)),"ú");break;
            case  187: text.replace(QString(QChar(195))+QString(QChar(187)),"û");break;
            case  188: text.replace(QString(QChar(195))+QString(QChar(188)),"ü");break;
            case  189: text.replace(QString(QChar(195))+QString(QChar(189)),"ý");break;
            case  191: text.replace(QString(QChar(195))+QString(QChar(191)),"ÿ");break;
            }
        }
        // Search on from the current position. Restarting at 0 would loop
        // forever on a pair that has no case above, on a trailing QChar(195),
        // or on the "Ã" that case 131 has just inserted here.
        index = text.indexOf(QChar(195), index);
    }
    return text;
}


**ChrisW67** · 23rd August 2013, 04:44

So the file name coming off the device is being treated as a Latin1 string, which leads to a broken result.

Qt Code:

195 = 0xC3
188 = 0xBC


which are the correct bytes for a UTF-8 encoded "ü" (U+00FC) but are being converted to two QChars.
Quite how to fix this I don't know.
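
As a quick check of the byte arithmetic, here is a minimal sketch in plain C++ (no Qt) that combines a two-byte UTF-8 sequence into its code point. The function name `decodeTwoByteUtf8` is made up for this illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combine a two-byte UTF-8 sequence into a code point.
// Bit layout: 110xxxxx 10yyyyyy  ->  xxxxxyyyyyy
std::uint32_t decodeTwoByteUtf8(std::uint8_t lead, std::uint8_t cont)
{
    assert((lead & 0xE0) == 0xC0); // lead byte must look like 110xxxxx
    assert((cont & 0xC0) == 0x80); // continuation byte must look like 10yyyyyy
    return (static_cast<std::uint32_t>(lead & 0x1F) << 6) | (cont & 0x3F);
}
```

With the values from the listing, `decodeTwoByteUtf8(0xC3, 0xBC)` gives 0xFC (ü) and `decodeTwoByteUtf8(0xC3, 0xB6)` gives 0xF6 (ö).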

**sedi** · 23rd August 2013, 09:37

Are you sure about Latin1 and "U+00FC"? To me it looks like Ã+00BC, with Ã acting as a sort of escape character.

I've looked up the codes in a UTF-8 table. With that information I can simply string-replace all umlauts and other important characters. For my case, it actually works as expected.

That said, tackling the problem this way is probably quite slow. It feels like reassembling the debris instead of preventing the accident.

Many thanks for the idea to actually look into the bytes myself - sometimes I don't see the wood for the trees. If anyone has a better idea (in terms of performance or safety of use), I'd be very happy to improve or entirely change my approach, but for the moment I can use that.
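
One possible alternative to a per-character replacement table is to undo the misreading in a single pass: turn every character back into the byte it came from and decode the byte sequence as UTF-8. The sketch below does this in plain C++ on a `std::u16string` (standing in for the UTF-16 contents of a QString). `repairMisreadUtf8` is a hypothetical name, only 2- and 3-byte sequences are handled, and anything that does not look like misread UTF-8 is returned unchanged. In Qt the same round trip could probably be written as `QString::fromUtf8(name.toLatin1())`, but that one-liner is untested here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Qt API: undo "UTF-8 bytes read as Latin-1".
// Each input character is expected to carry one original byte (0..255).
// Returns the input unchanged if it is not valid misread UTF-8.
std::u16string repairMisreadUtf8(const std::u16string &text)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const char16_t c = text[i];
        if (c > 0xFF)
            return text;                       // cannot be a single misread byte
        if (c < 0x80) {                        // plain ASCII, copy as is
            out.push_back(c);
            ++i;
            continue;
        }
        std::size_t extra = 0;                 // number of continuation bytes
        char32_t cp = 0;
        if ((c & 0xE0) == 0xC0) {              // 110xxxxx: two-byte sequence
            extra = 1;
            cp = static_cast<char32_t>(c & 0x1F);
        } else if ((c & 0xF0) == 0xE0) {       // 1110xxxx: three-byte sequence
            extra = 2;
            cp = static_cast<char32_t>(c & 0x0F);
        } else {
            return text;                       // stray continuation or 4-byte lead
        }
        if (i + extra >= text.size())
            return text;                       // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const char16_t t = text[i + k];
            if (t > 0xFF || (t & 0xC0) != 0x80)
                return text;                   // not a continuation byte
            cp = (cp << 6) | static_cast<char32_t>(t & 0x3F);
        }
        const bool overlong = (extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800);
        const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (overlong || surrogate)
            return text;                       // invalid UTF-8
        out.push_back(static_cast<char16_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

Because already correct names such as "Brüder" (with a real U+00FC) do not form valid UTF-8 sequences, they come back unchanged, so the same call should be harmless on the Windows side.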

**ChrisW67** · 23rd August 2013, 10:21

U+00FC is a Unicode code point for the character 'ü'. When encoded in UTF-8 that single Unicode character becomes 2 bytes 0xC3 0xBC.

If you interpret those two bytes as Latin1 characters (which are always one byte-one char) you get, as you point out, Ã and ¼.

So, the file name is encoded in UTF8 on the device. It is read as a set of bytes that are then incorrectly treated as a Latin1 string.
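
The two steps described above can be reproduced with a small sketch in plain C++ (no Qt): encode the code point as UTF-8, then read the bytes back as if each one were a Latin1 character. Both function names are invented for this illustration, and the encoder only covers code points below U+0800.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a code point below U+0800 as UTF-8 bytes.
std::string encodeUtf8Small(char32_t cp)
{
    assert(cp < 0x800); // one- and two-byte sequences only
    std::string bytes;
    if (cp < 0x80) {
        bytes.push_back(static_cast<char>(cp));
    } else {
        bytes.push_back(static_cast<char>(0xC0 | (cp >> 6)));   // 110xxxxx
        bytes.push_back(static_cast<char>(0x80 | (cp & 0x3F))); // 10yyyyyy
    }
    return bytes;
}

// Hypothetical helper: the faulty step. Every byte becomes one character
// with the same numeric value, which is what a Latin1 decoder does.
std::u16string readAsLatin1(const std::string &bytes)
{
    std::u16string text;
    for (unsigned char b : bytes)
        text.push_back(static_cast<char16_t>(b));
    return text;
}
```

`readAsLatin1(encodeUtf8Small(0xFC))` yields the two characters U+00C3 and U+00BC, i.e. the "Ã¼" seen in the tree widget.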
