Rules for char8_t et al. #2275

Closed
Eisenwave opened this issue May 3, 2025 · 1 comment

@Eisenwave
Contributor

Bit by bit, the C++ standard is gaining more support for Unicode, and the charN_t family of types (char8_t, char16_t, char32_t) is becoming more relevant. We should have some rules that guide how these types are used.

A good starting point would be something like:

Don't mix different character types in expressions

Mixing character types is often wrong, even when no narrowing occurs. Consider the following example:

```cpp
#include <string_view>

bool contains_oe(std::u8string_view str) {
    for (char8_t c : str)
        if (c == U'ö') // comparison always fails for valid UTF-8 input
            return true;
    return false;
}
```

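The comparison always fails because U'ö' is the single code point U+00F6 (value 0xF6), whereas ö is UTF-8-encoded as the two code units 0xC3 0xB6, and the byte 0xF6 never occurs in valid UTF-8. So even if str contains u8"ö" somewhere, you can't find it by comparing individual char8_t values against U'ö'.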

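There are certain cases where mixing character types is safe; for instance, u8'x' == U'x' is true, because ASCII characters (U+0000 to U+007F) are a single code unit with the same value in UTF-8, UTF-16, and UTF-32. However, relying on this safely requires the developer to know exactly which characters are ASCII.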

Mixing char, wchar_t, and other character types in expressions is generally bug-prone because the encodings used for char and wchar_t are implementation-defined (and, for runtime data, often locale-dependent). Treating char as char8_t may be safe if the char data is UTF-8 anyway, but that's far from universally true.

@hsutter
Contributor

hsutter commented May 8, 2025

Editors call: We agree there could be room for such a rule, but as you point out, C++ is still in the process of gaining Unicode support, and the Guidelines are for established best practices with known features. We should pursue this again once the standard has added enough Unicode support, including basics like printing and formatting of char8_t.

hsutter closed this as completed May 8, 2025