Languages where strings are primarily UTF-8 #62

I'm trying to figure out how languages that primarily use UTF-8 for their strings would use this proposal.

The first example that comes to mind is Rust. However, a Rust `String` (which lives in linear memory) can be coerced to `&str`, so neither type can transparently be a `stringref`. So you'd need to either (1) rewrite code to use a `WasmString` type or (2) copy on the boundary into linear memory. As far as I can tell, (2) isn't really different from what we have today.
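To make the coercion point concrete, here is a minimal sketch in plain Rust (no stringref involved). The helper names `first_byte` and `shares_buffer` are made up for illustration. It shows that a `&String` becomes a `&str` without any copy, so existing code that takes `&str` expects a pointer and length into linear memory:

```rust
// Any function taking `&str` borrows bytes that already live in linear memory.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

// Deref coercion: `&String` becomes `&str` with no copy, so both views
// share the same pointer and length.
fn shares_buffer(owned: &String) -> bool {
    let borrowed: &str = owned; // implicit `Deref<Target = str>`
    borrowed.as_ptr() == owned.as_ptr() && borrowed.len() == owned.len()
}
```

An opaque, engine-owned reference has no such pointer to hand out, which is why neither type could be swapped for a `stringref` behind the scenes.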

Thinking about (1), I'm skeptical that people are going to rewrite their code to use it, but assuming they do, I'm not sure how it would make use of this proposal.

My best guess is they'd:
  1. Represent `WasmString` as a `stringref` so that you can use `string.eq`/`string.concat`
  2. Get a `stringview_wtf8` whenever an accessor is called (like indexing)

The concern I have with this is that SpiderMonkey wouldn't be able to store WTF-8 contents inside our stringref for the medium-term future. So every single accessor call, like indexing, would force a transcode from the stringref to the view.
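The two steps and the concern above can be simulated in plain Rust. This is only a model under stated assumptions: `HostString` is a hypothetical stand-in for an engine-owned `stringref` that stores UTF-16 code units, `as_wtf8_view` stands in for obtaining a `stringview_wtf8`, and the transcode counter exists purely for illustration. None of these names come from the proposal or from any real toolchain.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for an engine-owned `stringref` whose contents are
/// stored as UTF-16 code units (an assumption modelling a JS engine).
pub struct HostString {
    units: Vec<u16>,
    transcodes: Cell<usize>, // counts simulated transcodes, for illustration only
}

impl HostString {
    pub fn new(s: &str) -> Self {
        HostString {
            units: s.encode_utf16().collect(),
            transcodes: Cell::new(0),
        }
    }

    /// Simulates obtaining a `stringview_wtf8`: a full O(n) transcode per call.
    /// This uses lossy UTF-8; real WTF-8 would also preserve lone surrogates.
    fn as_wtf8_view(&self) -> Vec<u8> {
        self.transcodes.set(self.transcodes.get() + 1);
        String::from_utf16_lossy(&self.units).into_bytes()
    }

    pub fn transcode_count(&self) -> usize {
        self.transcodes.get()
    }
}

/// Hypothetical `WasmString` wrapper: step 1 holds the stringref,
/// step 2 requests a view inside every accessor.
pub struct WasmString {
    inner: HostString,
}

impl WasmString {
    pub fn new(s: &str) -> Self {
        WasmString { inner: HostString::new(s) }
    }

    /// Byte indexing: each call pays for a fresh transcode in this model.
    pub fn byte_at(&self, index: usize) -> Option<u8> {
        self.inner.as_wtf8_view().get(index).copied()
    }

    pub fn transcode_count(&self) -> usize {
        self.inner.transcode_count()
    }
}
```

In this model, three calls to `byte_at` trigger three full transcodes, which is the per-accessor cost described above.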

Maybe you could make `WasmString` cache the `stringview_wtf8` lazily so it can reuse the view from a previous accessor call? But then strings would have twice the memory overhead.
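A rough sketch of the lazy caching idea, again as a plain-Rust simulation rather than real stringref code. `CachedWasmString` and its fields are hypothetical: `units` stands in for the engine's storage (assumed here to be UTF-16), and `view` stands in for a cached `stringview_wtf8`.

```rust
use std::cell::OnceCell;
use std::mem::size_of;

/// Hypothetical wrapper that caches the transcoded view on first use.
pub struct CachedWasmString {
    units: Vec<u16>,         // stands in for the engine's stringref storage (assumed UTF-16)
    view: OnceCell<Vec<u8>>, // stands in for a lazily cached `stringview_wtf8`
}

impl CachedWasmString {
    pub fn new(s: &str) -> Self {
        CachedWasmString {
            units: s.encode_utf16().collect(),
            view: OnceCell::new(),
        }
    }

    /// Transcodes once, then reuses the cached bytes on later calls.
    fn view(&self) -> &[u8] {
        self.view
            .get_or_init(|| String::from_utf16_lossy(&self.units).into_bytes())
    }

    pub fn byte_at(&self, index: usize) -> Option<u8> {
        self.view().get(index).copied()
    }

    /// Payload bytes currently held: original storage plus the cached copy, if any.
    pub fn payload_bytes(&self) -> usize {
        let original = self.units.len() * size_of::<u16>();
        let cached = self.view.get().map_or(0, |v| v.len());
        original + cached
    }
}
```

After the first accessor call the string's contents are held twice, once in each encoding. The exact ratio depends on how the engine stores the original, so the figures from this model are illustrative only.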

Am I missing something? I'd also be interested in other languages, but I'm coming up blank.
