utf8 filenames / QDir::entryList
Hi,
my program has to work on Windows and Android.
When I want to display files in a folder, I have encoding problems, as Android apparently uses utf8 for its file system. This is what I do:
Code:
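// dir is the QDir for the folder being listed (its setup was not part of the post)
QStringList files = dir.entryList(QStringList() << "*.*");
ui->label->setText("Hallo Ä Ö Ü ä ö ü ß .,;*+");
for (int i = 0; i < files.count(); i++)
{
    // One tree row per file name (item creation was missing from the post)
    QTreeWidgetItem *item = new QTreeWidgetItem(QStringList() << files.at(i));
    ui->treeWidget->addTopLevelItem(item);
}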
The label->setText stuff is shown correctly, so this is not a display thing. The treeWidget looks like this
Attachment 9456 (Android)
instead of this
Attachment 9457 (Windows).
So the umlauts are borked.
I've tried setting the default locale to German and to English, but this didn't help:
Code:
// QLocale::setDefault(englishLocale);
Furthermore, I've tried getting the entryInfoList and then decoding the fileName. But since QFileInfo::fileName() already returns a QString, I feel like I'm chasing my own tail here:
Code:
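QFileInfoList infos = dir.entryInfoList(QStringList() << "*.*");
for (int i = 0; i < infos.count(); i++)
{
    QFileInfo inf = infos.at(i);
    // Re-encode the name to UTF-8 bytes and run it through Qt's file name decoder
    QString fileName = QFile::decodeName(inf.fileName().toUtf8());
    // One tree row per file name (item creation was missing from the post)
    QTreeWidgetItem *item = new QTreeWidgetItem(QStringList() << fileName);
    ui->treeWidget->addTopLevelItem(item);
}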
Needless to say that it also didn't work.
I'm at my (small) wit's end.
Re: utf8 filenames / QDir::entryList
On your Android device read a broken file name from the list and inspect:
Code:
for (int i = 0; i < fileName.size(); ++i)
qDebug() << i << fileName.at(i).unicode();
Do you get this:
Code:
0 66
1 114
2 252
3 100
4 101
5 114
6 72
7 246
8 114
9 116
for "BrüderHört" or something else?
Re: utf8 filenames / QDir::entryList
Actually it is
Code:
66 B
114 r
195 Ã
188 ¼
100 d
101 e
114 r
72 H
195 Ã
182 ¶
114 r
116 t
46 .
106 j
112 p
103 g
Re: utf8 filenames / QDir::entryList
...but with that inspiration I put together a brute-force workaround that seems to work for me, although it is quite ugly.
I am very open to better ideas, especially where performance is concerned...
Code:
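// The function signature was lost from the post; the name fixUmlauts is a placeholder.
QString fixUmlauts(QString text)
{
    // 195 (0xC3) is the first byte of the two-byte UTF-8 sequences seen in the dump
    int index = text.indexOf(QChar(195));
    while (index >= 0)
    {
        if (text.count() > ++index)
        {
            // unicode() gives the numeric value directly; toAscii() is deprecated in Qt 5
            int code = text[index].unicode();
            switch (code)
            {
                // the case labels were not included in the post
            }
        }
        // Continue from the current position, so an unhandled 195 cannot cause an endless loop
        index = text.indexOf(QChar(195), index);
    }
    return text;
}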
Re: utf8 filenames / QDir::entryList
So the file name coming off the device is being treated as a Latin1 string leading to a broken result.
Code:
195 = 0xC3
188 = 0xBC
which are the correct bytes for a UTF8 encoded "ü" (U+00FC) but are being converted to two QChars.
Quite how to fix this I don't know.
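To make the byte arithmetic concrete, here is a minimal sketch in plain C++ (no Qt) of how a code point is turned into UTF-8 bytes. The helper name encodeUtf8 is made up for this illustration, and it only covers code points up to U+FFFF.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point (up to U+FFFF, for brevity) as UTF-8 bytes.
// Hypothetical helper for illustration only; not part of Qt.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                         // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));          // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    }
    return out;
}
```

Calling encodeUtf8(0xFC) yields the two bytes 0xC3 0xBC (195, 188), and encodeUtf8(0xF6) yields 0xC3 0xB6 (195, 182), which are exactly the value pairs in the dump from the device.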
Re: utf8 filenames / QDir::entryList
Are you sure about Latin1 and "U+00FC"? To me it looks like Ã followed by 0xBC, with Ã acting as a sort of escape character.
I've looked up the codes in a UTF-8 table. With that information I can simply string-replace all umlauts and other important characters, and for my case that works as expected.
That said, tackling the problem this way is probably quite slow; it feels like reassembling the debris instead of preventing the accident.
Many thanks for the idea to actually look into the bytes myself - sometimes I don't see the wood for the trees. If anyone has a better idea (in terms of performance or safety of use), I'd be very happy to improve or entirely change my approach, but for the moment I can use that.
Re: utf8 filenames / QDir::entryList
U+00FC is a Unicode code point for the character 'ü'. When encoded in UTF-8 that single Unicode character becomes 2 bytes 0xC3 0xBC.
If you interpret those two bytes as Latin1 characters (which are always one byte-one char) you get, as you point out, Ã and ¼.
So, the file name is encoded in UTF8 on the device. It is read as a set of bytes that are then incorrectly treated as a Latin1 string.
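If that diagnosis is right, the damage can be undone in one generic pass instead of a per-character replace table: treat each misread character as the byte it came from and decode the bytes as UTF-8. The following is a minimal sketch in plain C++ (no Qt). The function name repairMisreadUtf8 is made up for this illustration, and it only handles two-byte sequences (U+0080 to U+07FF), which covers the umlauts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Undo a "UTF-8 bytes read as Latin-1" mix-up.
// Input: one 16-bit unit per misread byte (like the QChar values in the dump).
// Output: the intended code points. Hypothetical helper, not a Qt API.
std::u32string repairMisreadUtf8(const std::u16string &broken)
{
    std::u32string out;
    for (std::size_t i = 0; i < broken.size(); ++i) {
        const std::uint32_t b0 = broken[i];
        // Two-byte lead byte (0xC2..0xDF) followed by a continuation byte (0x80..0xBF)?
        if (b0 >= 0xC2 && b0 <= 0xDF && i + 1 < broken.size()
            && (broken[i + 1] & 0xFFC0) == 0x0080) {
            out += static_cast<char32_t>(((b0 & 0x1F) << 6) | (broken[i + 1] & 0x3F));
            ++i;  // the continuation byte has been consumed
        } else {
            out += static_cast<char32_t>(b0);  // ASCII or anything else: keep as is
        }
    }
    return out;
}
```

Fed the values from the dump (66 114 195 188 100 101 114 ...), this gives 66 114 252 100 101 114 ..., i.e. "Brüder...". In Qt the same round trip could probably be written as QString::fromUtf8(fileName.toLatin1()), but that is untested on the Android build discussed here. Either way, a name that really does contain "Ã¼" would be altered too, so this is a workaround and not a proper fix for the decoding step.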