typemax(Char) violates invariants of Char #59800

@jakobnissen

While Chars can be invalid Unicode, they can never contain multiple codepoints (source: #44765 (comment)):

julia> '\xff\xff'
ERROR: ParseError:
# Error @ REPL[7]:1:2
'\xff\xff'
#└──────┘ ── character literal contains multiple characters

Note that such Chars cannot be constructed, even from malformed strings:

julia> first("\xff\xff")
'\xff': Malformed UTF-8 (category Ma: Malformed, bad data)

However, typemax(Char) does not respect this, and returns a Char that is otherwise assumed to be impossible to construct:

julia> typemax(Char)
'\xff\xff\xff\xff': Malformed UTF-8 (category Ma: Malformed, bad data)
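To make the difference concrete, here is a minimal sketch that compares the raw representation of the two Chars above. It assumes that a Char is stored as a 32-bit value holding up to four UTF-8 bytes, left-aligned, which is how current Julia behaves but is an implementation detail. The names `bad` and `top` are only for illustration.

```julia
# Sketch: inspect the raw representation of the Chars discussed above.
# Assumption: Char is a 32-bit value holding up to four left-aligned UTF-8 bytes.

bad = first("\xff\xff")   # a single malformed byte taken from a malformed string
top = typemax(Char)       # the value this issue is about

@show reinterpret(UInt32, bad)   # raw bits of the one-byte malformed Char
@show reinterpret(UInt32, top)   # raw bits of typemax(Char)
@show ncodeunits(bad)            # number of bytes written when printing `bad`
@show ncodeunits(top)            # number of bytes written when printing `top`
@show isvalid(top)

# Printing `top` to a String and iterating over that String splits the bytes
# apart again, so `top` does not round-trip as a single character.
@show length(string(top))
```

If the assumption holds, `top` occupies all four bytes while `bad` occupies only one, and the final line shows that the String form of `top` iterates as more than one Char.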

I propose we document this once and for all. If it really is an invariant that a Char cannot contain more than one codepoint, we should also change the definition of typemax(Char).
