Skip to content

Implement Debug for EncodeWide #140153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thaliaarchi
Copy link
Contributor

Since std::os::windows::ffi::EncodeWide is reexported from std::sys_common::wtf8::EncodeWide, which has #![allow(missing_debug_implementations)] in the parent module, it does not implement Debug. Implement it akin to core::str::Chars.

The only question is how to format each WTF-16 code unit. We can't format it like char, because \u escape sequences for surrogate halves are invalid syntax in Rust. It would be nice to format them as hex integers, but I am not aware of other instances of that pattern for Debug in std. Settle for decimal.

This becomes insta-stable.

r? libs-api

Since `std::os::windows::ffi::EncodeWide` is reexported from
`std::sys_common::wtf8::EncodeWide`, which has
`#![allow(missing_debug_implementations)]` in the parent module, it does
not implement `Debug`. Implement it akin to `core::str::Chars`.

The only question is how to format each WTF-16 code unit. We can't
format it like `char`, because \u escape sequences for surrogate halves
are invalid syntax in Rust. It would be nice to format them as hex
integers, but I am not aware of other instances of that pattern for
`Debug` in `std`. Settle for decimal.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 22, 2025
@bjorn3
Copy link
Member

bjorn3 commented Apr 23, 2025

We can't format it like char, because \u escape sequences for surrogate halves are invalid syntax in Rust.

Even if rust doesn't accept \u for surrogate halves, we can still format them as \u. Debug isn't guaranteed to be valid rust source code anyway.

@thaliaarchi
Copy link
Contributor Author

thaliaarchi commented Apr 23, 2025

Here are the options, as I see them. I think formatting them as chars when not surrogates is the most readable for most texts. Rust style seems to lean towards uppercase hex.

  1. EncodeWide([97, 233, 32, 55357, 55357, 56489]): decimal
  2. EncodeWide([0x61, 0xE9, 0x20, 0xD83D, 0xD83D, 0xDCA9]): hex, zero-pad min 2, upper
  3. EncodeWide([0x61, 0xe9, 0x20, 0xd83d, 0xd83d, 0xdca9]): hex, zero-pad min 2, lower
  4. EncodeWide([0x0061, 0x00E9, 0x0020, 0xD83D, 0xD83D, 0xDCA9]): hex, zero-pad min 4, upper
  5. EncodeWide([0x0061, 0x00e9, 0x0020, 0xd83d, 0xd83d, 0xdca9]): hex, zero-pad min 4, lower
  6. EncodeWide(['a', 'é', ' ', '\u{D83D}', '\u{D83D}', '\u{DCA9}']): pseudo-char, escaped surrogates, upper
  7. EncodeWide(['a', 'é', ' ', '\u{d83d}', '\u{d83d}', '\u{dca9}']): pseudo-char, escaped surrogates, lower

@tgross35 tgross35 added needs-fcp This change is insta-stable, so needs a completed FCP to proceed. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants