Uri - encoding of host should support domain punycoding #60664

Open
theatischbein opened this issue May 2, 2025 · 2 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core type-enhancement A request for a change that isn't a bug

Comments

@theatischbein

When parsing a URL with Uri.parse, Dart currently does not Punycode-encode the host. Non-ASCII characters in the domain are percent-encoded instead, which means certain domains are not reachable without importing third-party packages.

Example

  • given URL https://my-äöü-domain.com/endpoint/1234/
  • currently parsed as https://my-%C3%A4%C3%B6%C3%BC-domain.com/endpoint/1234/
  • expected https://xn--my--domain-s5a0tyc.com/endpoint/1234/
void main() {
  final url = "https://my-äöü-domain.com/endpoint/1234/";
  print(url);
  Uri uri = Uri.parse(url);
  print(uri);
}

Real world problem in flutter: https://gitlab.com/TheOneWithTheBraid/dart_pkpass/-/merge_requests/7

Punycode

  • is used to encode Internationalized Domain Names in Applications (IDNA)
  • is a special encoding used to convert Unicode characters to ASCII, which is a smaller, restricted character set.
  • the encoding itself is defined in RFC 3492; its use in the IDNA protocol is specified in RFC 5891
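
To make the encoding concrete, here is a minimal sketch of the RFC 3492 encoder in Dart, plus a helper that adds the `xn--` prefix to each non-ASCII label of a host. The names `punycodeEncode` and `hostToAscii` are made up for this illustration and are not part of `dart:core` or any package. The sketch assumes the input is already lowercased and normalized, and it skips the IDNA mapping and validation steps that a real implementation would need.

```dart
// Punycode parameters from RFC 3492, section 5.
const int base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700;
const int initialBias = 72, initialN = 0x80;

/// Maps a digit value 0..35 to 'a'..'z' followed by '0'..'9'.
String _digit(int d) => String.fromCharCode(d < 26 ? 0x61 + d : 0x30 + d - 26);

/// Bias adaptation function (RFC 3492, section 6.1).
int _adapt(int delta, int numPoints, bool firstTime) {
  delta = firstTime ? delta ~/ damp : delta ~/ 2;
  delta += delta ~/ numPoints;
  var k = 0;
  while (delta > ((base - tMin) * tMax) ~/ 2) {
    delta ~/= base - tMin;
    k += base;
  }
  return k + ((base - tMin + 1) * delta) ~/ (delta + skew);
}

/// Encodes a single label (hypothetical helper, no "xn--" prefix added).
String punycodeEncode(String input) {
  final codePoints = input.runes.toList();
  final out = StringBuffer();
  // Copy the basic (ASCII) code points first, in order.
  for (final c in codePoints) {
    if (c < 0x80) out.writeCharCode(c);
  }
  final basicCount = out.length;
  var handled = basicCount;
  if (basicCount > 0) out.write('-');

  var n = initialN, delta = 0, bias = initialBias;
  while (handled < codePoints.length) {
    // Smallest code point that has not been handled yet.
    final m = codePoints.where((c) => c >= n).reduce((a, b) => a < b ? a : b);
    delta += (m - n) * (handled + 1);
    n = m;
    for (final c in codePoints) {
      if (c < n) delta++;
      if (c == n) {
        // Emit delta as a generalized variable-length integer.
        var q = delta;
        for (var k = base;; k += base) {
          final t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias);
          if (q < t) break;
          out.write(_digit(t + (q - t) % (base - t)));
          q = (q - t) ~/ (base - t);
        }
        out.write(_digit(q));
        bias = _adapt(delta, handled + 1, handled == basicCount);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return out.toString();
}

/// Encodes only the labels that contain non-ASCII code points.
String hostToAscii(String host) => host
    .split('.')
    .map((label) => label.runes.any((r) => r >= 0x80)
        ? 'xn--${punycodeEncode(label)}'
        : label)
    .join('.');

void main() {
  print(punycodeEncode('münchen')); // mnchen-3ya
  print(hostToAscii('münchen.com')); // xn--mnchen-3ya.com
}
```

Encoding per label keeps plain ASCII labels such as `com` untouched, which is why only the first label of `münchen.com` gains the `xn--` prefix.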

Workarounds

Solution

  • this should be part of the core package / Uri class
  • merge a solution like punycode_converter into Uri -> host

My System

❯ dart info

#### General info

- Dart 3.5.4 (stable) (Wed Oct 16 16:18:51 2024 +0000) on "linux_x64"
- on linux / Linux 6.14.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 26 Apr 2025 00:06:37 +0000
- locale is en_GB.UTF-8

#### Process info

| Memory |  CPU | Elapsed time | Command line                        |
| -----: | ---: | -----------: | ----------------------------------- |
| 385 MB | 0.7% |        20:47 | dart language-server --protocol=lsp |
@lrhn
Member

lrhn commented May 3, 2025

Sadly a reasonable request. (It's a horrible encoding, but it's compatible with ASCII based DNS, so ... here we are.)

The WHATWG URL spec also supports it, so if we ever manage to migrate the Uri class to that, the feature won't go away.
It even UTF-8 decodes %-encoded input and then Punycode-encodes the result.

Should probably only apply to designated protocols (http, https, ws, wss). Not all hosts are DNS hosts.

It would be "normalization", which is one-way, so toString and host would give the Punycode-encoded string. Not great for displaying. But since the percent-encoded host won't actually work, it's probably not something people depend on today.
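
A rough sketch of what such a scheme-restricted, one-way normalization could look like on top of today's Uri class. `normalizeHost` and its `toAscii` parameter are hypothetical names used only for this illustration; the actual Punycode conversion is passed in as a function, so the snippet makes no assumption about which implementation is used. It relies on `Uri.parse` percent-encoding the non-ASCII host, as described in the original report.

```dart
// Schemes whose hosts are assumed to be DNS names.
const dnsSchemes = {'http', 'https', 'ws', 'wss'};

/// Returns [uri] with its host converted by [toAscii], but only for
/// DNS-based schemes. [toAscii] is a stand-in for a real IDNA/Punycode
/// converter (hypothetical parameter, not an SDK API).
Uri normalizeHost(Uri uri, String Function(String) toAscii) {
  if (!dnsSchemes.contains(uri.scheme) || uri.host.isEmpty) return uri;
  // Undo the percent-encoding (UTF-8) that Uri.parse applied to the host.
  final unicodeHost = Uri.decodeComponent(uri.host);
  return uri.replace(host: toAscii(unicodeHost));
}

void main() {
  // Stub converter that only knows one host, for demonstration purposes.
  String stub(String host) =>
      host == 'münchen.com' ? 'xn--mnchen-3ya.com' : host;

  final normalized = normalizeHost(Uri.parse('https://münchen.com/a'), stub);
  print(normalized.host);

  // A non-DNS scheme is returned unchanged.
  final other = Uri.parse('foo://münchen.com/a');
  print(identical(normalizeHost(other, stub), other));
}
```

After normalization the Unicode form is no longer recoverable from the Uri itself, which is the display drawback mentioned above.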

lrhn self-assigned this May 3, 2025
lrhn added the area-core-library, library-core, and type-enhancement labels May 3, 2025
@mark-dropbear

mark-dropbear commented May 4, 2025

For whatever it's worth, I JUST got done implementing Punycode and released it 2 days ago as a package here: https://pub.dev/packages/punycoder where I tried to follow the dart:convert Codec interface so it should feel fairly familiar.

For example:

// Designed to be used with domains and emails which have special rules
const domainCodec = PunycodeCodec();
// Designed to work with simple strings
const simpleCodec = PunycodeCodec.simple();

final encodedString = simpleCodec.encode('münchen');
final encodedDomain = domainCodec.encode('münchen.com');
final encodedEmail = domainCodec.encode('münchen@münchen.com');

print(encodedString); // Output: mnchen-3ya
print(encodedDomain); // Output: xn--mnchen-3ya.com
// Only the domain should be encoded
print(encodedEmail); // Output: münchen@xn--mnchen-3ya.com

final decodedString = simpleCodec.decode('mnchen-3ya');
final decodedDomain = domainCodec.decode('xn--mnchen-3ya.com');
final decodedEmail = domainCodec.decode('münchen@xn--mnchen-3ya.com');

print(decodedString); // Output: münchen
print(decodedDomain); // Output: münchen.com
print(decodedEmail); // Output: münchen@münchen.com

Perhaps also of interest: I am in the middle of implementing the IRI (Internationalized Resource Identifiers, RFC 3987) specification too. That is a lot messier, and it is something I would DEARLY love to see supported officially, but it may be a good fit for what you are after as well. I used the Punycode codec to convert between the two forms, like so:

// NOTE: Not actually released, hopefully this week. Waiting on the pub.dev team to approve the name
import 'package:iri/iri.dart';

void main() {
  // 1. Create an IRI from a string containing non-ASCII characters.
  //  The path contains 'ȧ' (U+0227 LATIN SMALL LETTER A WITH DOT ABOVE).
  final iri = IRI('https://例子.com/pȧth?q=1');

  // 2. Print the original IRI string representation.
  print('Original IRI: $iri');
  // Output: Original IRI: https://例子.com/pȧth?q=1

  // 3. Convert the IRI to its standard URI representation.
  //    - The host (例子.com) is converted to Punycode (xn--fsqu00a.com).
  //    - The non-ASCII path character 'ȧ' (UTF-8 bytes C8 A7) is percent-encoded (%C8%A7).
  final uri = iri.toUri();
  print('Converted URI: $uri');
  // Output: Converted URI: https://xn--fsqu00a.com/p%C8%A7th?q=1

  // 4. Access components (values are decoded for IRI representation).
  print('Scheme: ${iri.scheme}');       // Output: Scheme: https
  print('Host: ${iri.host}');           // Output: Host: 例子.com
  print('Path: ${iri.path}');           // Output: Path: /pȧth
  print('Query: ${iri.query}');         // Output: Query: q=1

  // 5. Compare IRIs
  final iri2 = IRI('https://例子.com/pȧth?q=1');
  print('IRIs equal: ${iri == iri2}'); // Output: IRIs equal: true

  final iri3 = IRI('https://example.com/');
  print('IRIs equal: ${iri == iri3}'); // Output: IRIs equal: false
}

You can find that here in the meantime: https://github.com/dropbear-software/iri
