Bit by bit, the C++ standard is gaining more support for Unicode, and the charN_t family of types is gaining relevance. We should have some rules that guide the usage of the types.
A good starting point would be something like
Don't mix different character types in expressions
Mixing character types is often wrong, even when no narrowing occurs. Consider the following example:
    bool contains_oe(std::u8string_view str) {
        for (char8_t c : str)
            if (c == U'ö') // comparison always fails
                return true;
        return false;
    }
The comparison always fails for valid UTF-8 because ö (U+00F6) is encoded as the two code units 0xC3 0xB6, and no single char8_t in that sequence equals U'ö'. So even if str contains a u8"ö" somewhere, you wouldn't be able to find it this way.
There are certain instances where mixing character types is safe; for instance u8'x' == U'x' is true. However, safe use of this property requires the developer to memorize the set of ASCII characters.
Mixing char, wchar_t, and other character types in expressions is generally bug-prone because it's encoding-dependent. Treating char as char8_t may be safe if char is UTF-8 anyway, but that's far from universally true.
Editors call: We agree there could be room for such a rule, but as you point out C++ is still in the process of gaining Unicode support, and the Guidelines are for established best practices with known features. We should pursue this again once the standard finishes adding enough Unicode support including basics like printing and formatting of char8_t.