Rules for `char8_t` et al. #2275

Bit by bit, the C++ standard is gaining better Unicode support, and the `charN_t` family of types (`char8_t`, `char16_t`, `char32_t`) is becoming more relevant. We should have some rules that guide the use of these types.

A good starting point would be something like:
> **Don't mix different character types in expressions**

Mixing character types is often wrong, even when no narrowing occurs. Consider the following example:
```cpp
#include <string_view>

bool contains_oe(std::u8string_view str) {
    for (char8_t c : str)
        if (c == U'ö') // comparison always fails
            return true;
    return false;
}
```
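The comparison always fails because `U'ö'` is the single UTF-32 code unit `0xF6` (U+00F6), whereas `ö` is UTF-8-encoded as the two code units `0xC3 0xB6`, and `0xF6` never occurs as a code unit in valid UTF-8. So even if `str` contains `u8"ö"` somewhere, you can't find it this way.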

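There are certain cases where mixing character types is safe; for instance, `u8'x' == U'x'` is `true`, because every ASCII character (U+0000 to U+007F) is encoded as the same single code unit in UTF-8, UTF-16, and UTF-32. However, relying on this property safely requires the developer to know exactly which characters are ASCII.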

Mixing `char`, `wchar_t`, and other character types in expressions is generally bug-prone because the encodings of `char` and `wchar_t` are implementation-defined (and, for run-time data, often locale-dependent). Treating `char` as `char8_t` may be safe if the `char` data is UTF-8 anyway, but that's far from universally true.
