String encodings

The C++ language does not specify the encoding of strings. Thus, any char array and any std::string object can hold text in an arbitrary encoding. When you pass these types to native APIs and third-party libraries, you have to refer to their documentation to find out which encoding they expect. The encoding used by native APIs of the operating system usually depends on the current locale. Third-party libraries often use the same encoding as native APIs, but some libraries expect another encoding, for example, UTF-8.

A plain string literal (that is, any text you wrap in quotation marks without a prefix) uses an implementation-defined encoding. Since C++11, you can add a prefix to specify the encoding of the text:

  • u8"text" will produce a UTF-8 encoded const char[] array (const char8_t[] since C++20)
  • u"text" will produce a UTF-16 encoded const char16_t[] array
  • U"text" will produce a UTF-32 encoded const char32_t[] array
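The prefixes listed above can be checked at compile time. The following is a minimal sketch in standard C++11; the ElementOf alias is a hypothetical helper introduced only for this illustration. The u8 check compares only the size of the element, because its exact type depends on the language standard in use.

```cpp
#include <type_traits>

// Hypothetical helper: strips the reference and the array extent from the
// type of a literal, e.g. const char16_t(&)[5] becomes const char16_t.
template <typename T>
using ElementOf =
    typename std::remove_extent<typename std::remove_reference<T>::type>::type;

// u8 literals consist of one-byte code units.
static_assert(sizeof(ElementOf<decltype(u8"text")>) == 1,
              "u8 literals use one-byte code units");

// u and U literals have dedicated character types.
static_assert(std::is_same<ElementOf<decltype(u"text")>, const char16_t>::value,
              "u literals are arrays of const char16_t");
static_assert(std::is_same<ElementOf<decltype(U"text")>, const char32_t>::value,
              "U literals are arrays of const char32_t");

// Four characters plus the terminating null element.
static_assert(sizeof(u"text") == 5 * sizeof(char16_t),
              "the array includes a terminator");
```

If any of these assumptions did not hold for your compiler, the file would fail to compile with the corresponding message.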

Unfortunately, the encoding used for interpreting the source files is still implementation-defined, so it's not safe to put non-ASCII characters directly in string literals. Use escape sequences (such as \unnnn, or \Unnnnnnnn for code points above U+FFFF) to write such literals.
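As an illustration of the escape sequences, the sketch below writes two non-ASCII characters using only ASCII source text and checks how many code units each prefix produces. The length() helper is a hypothetical name used only in this example.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null element.
template <typename CharT, std::size_t N>
constexpr std::size_t length(const CharT (&)[N]) { return N - 1; }

// U+00E9 (e with acute accent) fits in one UTF-16 or UTF-32 unit,
// but needs two bytes (0xC3 0xA9) in UTF-8.
static_assert(length(u8"\u00e9") == 2, "two UTF-8 bytes");
static_assert(length(u"\u00e9") == 1, "one UTF-16 unit");
static_assert(length(U"\u00e9") == 1, "one UTF-32 unit");

// U+1F600 lies above U+FFFF, so it needs the eight-digit \U form.
static_assert(length(u8"\U0001F600") == 4, "four UTF-8 bytes");
static_assert(length(u"\U0001F600") == 2, "two UTF-16 units");
static_assert(length(U"\U0001F600") == 1, "one UTF-32 unit");
```

Because the literals contain only ASCII characters, the result does not depend on how the compiler interprets the encoding of the source file.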

Text in Qt is stored using the QString class, which uses Unicode internally. Unicode can represent the characters of almost all written languages in the world and is the de facto standard for native text encoding in most modern operating systems. There are multiple Unicode-based encodings. The in-memory representation of a QString resembles UTF-16: it is an array of 16-bit values in which each Unicode code point is represented by either one or two values.
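To make the "one or two values" rule concrete, here is a minimal sketch of how a single code point maps to UTF-16 in standard C++. It does not show how Qt implements this; encodeUtf16 is a hypothetical function that assumes its input is a valid code point (at most U+10FFFF and not a surrogate).

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// Assumes codePoint is valid; no error handling is performed.
std::u16string encodeUtf16(char32_t codePoint) {
    std::u16string units;
    if (codePoint < 0x10000) {
        // Code points up to U+FFFF are stored as a single 16-bit value.
        units.push_back(static_cast<char16_t>(codePoint));
    } else {
        // Higher code points are split into a surrogate pair:
        // the top 10 bits go into the first value, the low 10 bits
        // into the second one.
        const char32_t offset = codePoint - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}
```

One practical consequence is that QString::size() counts 16-bit values, so a string containing characters above U+FFFF reports a size larger than the number of characters the user sees.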

When constructing a QString from a char array or an std::string object, it's important to use the conversion method that matches the original encoding of the text. By default, QString assumes that the input text is encoded in UTF-8. UTF-8 is compatible with ASCII, so passing UTF-8 or ASCII-only text to QString(const char *str) is correct. For other encodings, QString provides a number of static methods, such as QString::fromLatin1() or QString::fromUtf16(). The QString::fromLocal8Bit() method assumes the encoding that corresponds to the system locale.

If you have to combine both QString and std::string in one program, QString offers you the toStdString() and fromStdString() methods to perform a conversion. These methods also assume UTF-8 encoding of std::string, so you can't use them if your strings are in another encoding.

The default representation of string literals (for example, "text") is not UTF-16, so each time you convert one to a QString, an allocation and an encoding conversion take place. This overhead can be avoided by using the QStringLiteral macro:

QString str = QStringLiteral("I'm writing my games using Qt"); 

QStringLiteral does two things:

  • It adds a u prefix to your string literal to ensure that it will be encoded in UTF-16 at compile time
  • It cheaply creates a QString and instructs it to use the literal without performing any allocation or encoding conversion

It's a good habit to wrap all your string literals (except the ones that need to be translated) in QStringLiteral, but it is not required, so don't worry if you forget to do that.