Description
I'm trying to figure out how languages that primarily use UTF-8 for their strings would use this proposal.
The first example that comes to mind is Rust. However, a Rust `String` (which exists in linear memory) can be coerced to `&str`, so neither type can transparently be a `stringref`. So you'd need to either (1) rewrite code to use a `WasmString` type, or (2) copy on the boundary into linear memory. (2) isn't really different from what we have today, from what I can tell.
Thinking about (1), I'm skeptical that code is going to be rewritten to use it, but assuming that it is, I'm not sure how it would utilize this proposal.
My best guess is they'd:
- Represent `WasmString` as `stringref` so that you can use `string.eq`/`concat`
- Get a `stringview_wtf8` whenever an accessor is called (like indexing)
The concern I have with this is that SpiderMonkey wouldn't be able to store WTF-8 contents inside our `stringref` for the medium-term future. So every single accessor call, like indexing, would force a transcode from the `stringref` to the view.
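To make the cost concrete, here is a rough Rust sketch of what I mean. Nothing here is a real API: `HostString` is a hypothetical stand-in for an engine-owned `stringref` that I'm assuming is stored as UTF-16 code units, `as_wtf8_view` models obtaining a `stringview_wtf8`, and `byte_at` is a made-up accessor. The counter only exists to show how often the transcode happens.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a host-owned stringref.
/// Assumption: the engine stores the contents as UTF-16 code units.
struct HostString {
    units: Vec<u16>,
    transcodes: Cell<usize>, // counts how many views were materialized
}

impl HostString {
    fn from_str(s: &str) -> Self {
        HostString {
            units: s.encode_utf16().collect(),
            transcodes: Cell::new(0),
        }
    }

    /// Models getting a stringview_wtf8: a full O(n) transcode.
    /// Simplification: lone surrogates are replaced here, whereas
    /// real WTF-8 would preserve them.
    fn as_wtf8_view(&self) -> Vec<u8> {
        self.transcodes.set(self.transcodes.get() + 1);
        String::from_utf16_lossy(&self.units).into_bytes()
    }
}

/// Hypothetical WasmString wrapper that keeps no view around.
struct WasmString {
    inner: HostString,
}

impl WasmString {
    fn new(s: &str) -> Self {
        WasmString { inner: HostString::from_str(s) }
    }

    /// Byte accessor: every call transcodes the whole string first.
    fn byte_at(&self, index: usize) -> Option<u8> {
        self.inner.as_wtf8_view().get(index).copied()
    }
}
```

Under these assumptions, each `byte_at` call pays for a transcode of the entire string, so a loop over the bytes of an n-byte string would do O(n^2) work.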
Maybe you could make `WasmString` cache the `stringview_wtf8` lazily so it can reuse the view from a previous accessor call? But then strings would have twice the memory overhead.
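A sketch of the caching variant, under the same assumptions: `HostString` is again a hypothetical UTF-16-backed stand-in for `stringref`, and `CachedWasmString`, `byte_at`, and `retained_bytes` are names I made up for illustration. It uses `std::cell::OnceCell` to fill the view on first access.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in for a host-owned stringref (assumed UTF-16).
struct HostString {
    units: Vec<u16>,
    transcodes: Cell<usize>, // counts how many views were materialized
}

impl HostString {
    fn from_str(s: &str) -> Self {
        HostString {
            units: s.encode_utf16().collect(),
            transcodes: Cell::new(0),
        }
    }

    /// Models getting a stringview_wtf8: a full O(n) transcode.
    /// Simplification: lone surrogates are replaced, not preserved.
    fn as_wtf8_view(&self) -> Vec<u8> {
        self.transcodes.set(self.transcodes.get() + 1);
        String::from_utf16_lossy(&self.units).into_bytes()
    }
}

/// Hypothetical WasmString that lazily caches its view.
struct CachedWasmString {
    inner: HostString,
    view: OnceCell<Vec<u8>>, // filled on the first accessor call
}

impl CachedWasmString {
    fn new(s: &str) -> Self {
        CachedWasmString {
            inner: HostString::from_str(s),
            view: OnceCell::new(),
        }
    }

    /// Byte accessor: transcodes once, then reuses the cached view.
    fn byte_at(&self, index: usize) -> Option<u8> {
        let view = self.view.get_or_init(|| self.inner.as_wtf8_view());
        view.get(index).copied()
    }

    /// Returns (host bytes, cached view bytes) currently held.
    fn retained_bytes(&self) -> (usize, usize) {
        let host = self.inner.units.len() * std::mem::size_of::<u16>();
        let cached = self.view.get().map_or(0, |v| v.len());
        (host, cached)
    }
}
```

This gets the transcode down to once per string, but after the first access both the host copy and the transcoded copy stay alive for as long as the `CachedWasmString` does, which is the memory overhead I'm worried about.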
Am I missing something? I'd also be interested in how other languages would use this, but my mind is coming up blank.