fix: decode UTF-16LE shell output on Windows #1456

Merged
merged 4 commits into from
May 29, 2025

Conversation

tsmithsz
Contributor

@tsmithsz tsmithsz commented May 28, 2025

Problem

On Windows we launch shell commands through cmd.exe /c, and by default cmd.exe writes its output as bytes in a legacy code page.
Because our ExecuteBash tool always assumed UTF-8, any non-ASCII characters in the output were mis-decoded and displayed as ??. This affected customers.
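
The failure mode can be reproduced in isolation. The sketch below assumes, purely for illustration, that the shell emitted Windows-1252 bytes (the actual code page depends on the machine); it shows that a lone 0xE9 byte is not valid UTF-8 and is replaced on decode.

```typescript
import { Buffer } from "node:buffer";

// "café" as a Western European code page (e.g. Windows-1252) would encode it:
// 'é' is the single byte 0xE9, which is not a valid UTF-8 sequence on its own.
const codePageBytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD (the replacement character).
const misdecoded = codePageBytes.toString("utf8");

// Decoding with a matching single-byte encoding recovers the text.
// latin1 agrees with Windows-1252 for the byte 0xE9.
const recovered = codePageBytes.toString("latin1");

console.log(JSON.stringify(misdecoded)); // the last character is U+FFFD
console.log(recovered); // café
```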

Solution

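  • Invoke cmd with /u on Windows: makes cmd.exe write the output of its built-in commands to pipes as UTF-16 LE (Windows' native Unicode encoding), independent of the console code page. Output produced by external programs is not re-encoded by this switch, so the decoder still needs a fallback.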
  • Heuristic decoder:
    • Re‑creates the raw bytes from the received string (Buffer.from(text, 'binary')).
    • Detects UTF-16 LE by checking whether every odd-indexed byte (the high byte of each 16-bit code unit) in the first 32 bytes is 0x00.
    • Decodes with buffer.toString('utf16le') when the pattern matches, otherwise falls back to UTF‑8.
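
The three decoder steps above can be sketched roughly as follows. This is a minimal illustration based only on the description in this PR, not the merged code; the names `SAMPLE_BYTES`, `looksLikeUtf16Le`, and `decodeShellOutput` are hypothetical.

```typescript
import { Buffer } from "node:buffer";

// Number of leading bytes inspected by the heuristic (32, per the description above).
const SAMPLE_BYTES = 32;

// Hypothetical helper: true when every odd-indexed byte in the sample is 0x00,
// which is what UTF-16 LE looks like for characters in the U+0000..U+00FF range.
function looksLikeUtf16Le(bytes: Buffer): boolean {
  const limit = Math.min(bytes.length, SAMPLE_BYTES);
  if (limit < 2) return false; // too short to contain a full 16-bit code unit
  for (let i = 1; i < limit; i += 2) {
    if (bytes[i] !== 0x00) return false;
  }
  return true;
}

// Hypothetical entry point: `text` is assumed to be a "binary" (one char per byte) string.
function decodeShellOutput(text: string): string {
  // Re-create the raw bytes from the received string.
  const bytes = Buffer.from(text, "binary");
  return looksLikeUtf16Le(bytes)
    ? bytes.toString("utf16le")
    : bytes.toString("utf8"); // fallback
}

// Usage: simulate pipe output in both encodings.
const asUtf16 = Buffer.from("héllo", "utf16le").toString("binary");
const asUtf8 = Buffer.from("héllo", "utf8").toString("binary");
console.log(decodeShellOutput(asUtf16)); // héllo
console.log(decodeShellOutput(asUtf8)); // héllo
```

One consequence of the rule as described: UTF-16 LE output whose first 16 code units include a character above U+00FF (for example CJK text) has a non-zero high byte, so this check would not match and the text would take the UTF-8 fallback.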

Testing

Before Fix:

Screenshot 2025-05-28 at 2 17 34 AM

After Fix:

Screenshot 2025-05-28 at 2 15 53 AM

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tsmithsz tsmithsz requested a review from a team as a code owner May 28, 2025 09:08
@tsmithsz tsmithsz requested a review from rli May 29, 2025 00:09
@tsmithsz tsmithsz merged commit ae48442 into aws:main May 29, 2025
6 checks passed
5 participants