I will try to explain what is happening. Start with this line:
QString test = QString::fromUtf8("\u041F\u0440\u0438\u0432\u0435\u0442 \u041C\u0438\u0440");
This is what happens on Linux with GCC.
The compiler sees \u041F and inserts the UTF-8 encoding of the U+041F character (П) into the string, because GCC's default execution character set is UTF-8. That is two bytes, 0xD0 and 0x9F. It does this for the whole string. The result is a C-style string of bytes (in hex) that is the UTF-8 encoded string:
D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82 20 D0 9C D0 B8 D1 80
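To make the GCC step concrete, here is a minimal sketch in plain C++ (no Qt) of the UTF-8 encoding the compiler applies to each \u escape. The helper names appendUtf8 and hexDump are hypothetical, chosen for illustration only.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to 'out'.
// Sketch only: surrogate code points are not rejected.
void appendUtf8(char32_t cp, std::string& out)
{
    assert(cp <= 0x10FFFF);                 // beyond the Unicode range
    if (cp < 0x80) {                        // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 4 bytes: 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Format bytes as space-separated upper-case hex pairs, like the dump above.
std::string hexDump(const std::string& bytes)
{
    std::string result;
    char buf[3];                            // two hex digits plus NUL
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "%02X", b);
        if (!result.empty()) result += ' ';
        result += buf;
    }
    return result;
}
```

For U+041F the arithmetic is 0x41F >> 6 = 0x10, giving 0xC0 | 0x10 = 0xD0, and 0x41F & 0x3F = 0x1F, giving 0x80 | 0x1F = 0x9F. Running appendUtf8 over every code point of the test string and passing the result to hexDump reproduces the 19-byte sequence shown above.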
We feed these bytes into fromUtf8() and get a valid QString with the correct characters. When the Linux program runs, qDebug() outputs the QString correctly encoded for my UTF-8 terminal, and I get the expected characters on screen. The same goes for QLabel.
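What fromUtf8() does with those bytes is the reverse step. The sketch below is a simplified stand-in, not Qt's implementation: the hypothetical decodeUtf8Bmp assumes well-formed input made only of 1- to 3-byte sequences (the Basic Multilingual Plane) and produces UTF-16 code units, which is what QString stores internally. Qt's real decoder also handles 4-byte sequences and malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified UTF-8 -> UTF-16 decoder (BMP only, no error recovery).
std::u16string decodeUtf8Bmp(const std::string& bytes)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char16_t cp;
        if (b < 0x80) {                           // 0xxxxxxx: plain ASCII
            cp = b;
            i += 1;
        } else if ((b & 0xE0) == 0xC0) {          // 110xxxxx 10xxxxxx
            assert(i + 1 < bytes.size());
            cp = static_cast<char16_t>(((b & 0x1F) << 6) |
                 (static_cast<unsigned char>(bytes[i + 1]) & 0x3F));
            i += 2;
        } else {                                  // 1110xxxx 10xxxxxx 10xxxxxx
            assert((b & 0xF0) == 0xE0 && i + 2 < bytes.size());
            cp = static_cast<char16_t>(((b & 0x0F) << 12) |
                 ((static_cast<unsigned char>(bytes[i + 1]) & 0x3F) << 6) |
                 (static_cast<unsigned char>(bytes[i + 2]) & 0x3F));
            i += 3;
        }
        out += cp;
    }
    return out;
}
```

Note that ASCII-only input, such as a run of 0x3F bytes, decodes without any complaint: a valid string is not necessarily the intended one.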
On Windows with MS VC++ (2010):
The compiler sees \u041F and tries to map the U+041F character (П) to the system's 8-bit Windows (ANSI) code page before putting it in the string. Unless your system code page is Windows-1251 (Cyrillic), there is probably no equivalent of П, and the compiler inserts ? as a placeholder for the character it could not convert. The compiler issues a warning:
warning C4566: character represented by universal-character-name '\u041F' cannot be represented in the current code page (1252)
It does this for the whole string. The result is a C-style string of bytes (in hex) that is not at all what you were expecting:
3F 3F 3F 3F 3F 3F 20 3F 3F 3F
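The lossy conversion can be modelled in a few lines of plain C++. This is a simplified sketch, not what the compiler literally runs: the hypothetical toCp1252Lossy treats only U+0000 to U+007F and U+00A0 to U+00FF as representable (those map to the same byte values in Windows-1252) and ignores the few other characters, such as the euro sign, that code page 1252 places in 0x80 to 0x9F.

```cpp
#include <cassert>
#include <string>

// Simplified model of narrowing a string to code page 1252:
// anything outside the assumed representable ranges becomes 'placeholder'.
std::string toCp1252Lossy(const std::u32string& text, char placeholder = '?')
{
    // The placeholder itself must be plain ASCII so it is representable.
    assert(static_cast<unsigned char>(placeholder) < 0x80);
    std::string out;
    for (char32_t cp : text) {
        bool representable = cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF);
        out += representable ? static_cast<char>(cp) : placeholder;
    }
    return out;
}
```

Applied to the nine Cyrillic letters and the space of the test string, it yields six 0x3F bytes, 0x20, and three more 0x3F bytes, matching the dump above; Latin-1 text such as U"caf\u00E9" survives unchanged.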
We feed these bytes into fromUtf8() and get a valid QString, but not the correct characters, so neither qDebug() nor QLabel can produce the expected output. This compiler behaviour seems to be the same regardless of the source file's encoding and of whether it has a UTF-8 byte-order mark.
If I change the line to:
QString test = QString::fromUtf8("\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82\x20\xD0\x9C\xD0\xB8\xD1\x80");
I have done the UTF-8 encoding myself, which avoids the compiler's attempt to map the characters to the Windows code page.
If I put that string on a QLabel I see the correct characters (font permitting): the data made it in.
The qDebug() output in the console is still wrong because the QString is being mapped (again) to the local 8-bit code page with the same "?" result.
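If you do go the hand-encoding route, the escapes need not be typed by hand. A minimal sketch, assuming the helper is compiled with GCC (whose default execution character set is UTF-8, so "\u041F..." already holds the UTF-8 bytes): the hypothetical toHexEscapedLiteral turns every byte into a \xHH escape, and its output can be pasted into the source that MSVC compiles.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Render a byte string as the body of a C/C++ string literal made only of
// \xHH escapes, so the compiler never has to interpret non-ASCII source text.
std::string toHexEscapedLiteral(const std::string& bytes)
{
    std::string out;
    char buf[5];                            // "\xHH" plus terminating NUL
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "\\x%02X", b);
        out += buf;
    }
    assert(out.size() == 4 * bytes.size()); // each byte became exactly 4 chars
    return out;
}
```

Escaping every byte, including the space (\x20), is deliberate: a \x escape consumes all the hex digits that follow it, so "\xD0\x9Fa" would be read as \xD0 followed by \x9Fa rather than the intended bytes, whereas a fully escaped literal has no such trap.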
I get Cyrillic output in a CMD console with:
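#include <QApplication>
#include <QLabel>
#include <QDebug>
#include <QTextCodec>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);

    // Make toLocal8Bit() produce UTF-8 instead of the Windows ANSI code page.
    QTextCodec::setCodecForLocale(QTextCodec::codecForName("utf8"));

    // Hand-encoded UTF-8 bytes of the Cyrillic text, so MSVC has nothing to remap.
    QString test = QString::fromUtf8("\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82\x20\xD0\x9C\xD0\xB8\xD1\x80");

    // Writes UTF-8 bytes; the console must be set up to display them.
    qDebug() << test.toLocal8Bit();

    // The label shows the characters directly (font permitting).
    QLabel l(test);
    l.show();
    return app.exec();
}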
This works only if:
- I run the program from a Windows CMD shell,
- I set the shell font to "Lucida Console", and
- I execute "chcp 65001" before running the program.
Encoding the strings to UTF-8 by hand is not a good solution, and I do not yet have a nice one.
There is a hotfix for VC 2010 (http://stackoverflow.com/questions/6...er-arrays-in-c), but it seems it did not make it into VC 2012 or 2013.