Description
This is the bug @david-crespo was running into, where image uploads fail on Firefox/macOS.
I've looked into it some more and, from what I can tell, at some point the browser stalls while uploading some of the chunks.
On the console side we split the file into 384 KiB chunks, which we try to upload 6 at a time (so as not to hit browser concurrency limits). It doesn't happen every time, but every so often there will be a chunk or two where it seems like the browser made the request but there's no response (at least according to the Network tab in the browser's Web Developer Tools).
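For reference, a rough sketch of the chunking and concurrency limit described above. This is not the actual console code: every name here is made up for illustration, and only the 384 KiB chunk size and the limit of 6 come from this issue.

```typescript
// Hypothetical names throughout; only the 384 KiB chunk size and the
// limit of 6 concurrent uploads come from the description in this issue.
const CHUNK_SIZE = 384 * 1024;
const MAX_CONCURRENT = 6;

interface ChunkRange {
  index: number;
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

// Split a file of `fileSize` bytes into fixed-size ranges; the last one may be short.
function chunkRanges(fileSize: number, chunkSize: number = CHUNK_SIZE): ChunkRange[] {
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 0; start < fileSize; start += chunkSize, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}

// Run `upload` over every range with at most `limit` calls in flight at once.
// Each worker pulls the next range off a shared queue when its upload settles.
async function uploadAll(
  ranges: ChunkRange[],
  upload: (range: ChunkRange) => Promise<void>,
  limit: number = MAX_CONCURRENT,
): Promise<void> {
  const queue = [...ranges];
  const worker = async (): Promise<void> => {
    let range = queue.shift();
    while (range !== undefined) {
      await upload(range);
      range = queue.shift();
    }
  };
  const workers = Array.from({ length: Math.min(limit, ranges.length) }, () => worker());
  await Promise.all(workers);
}
```

In a scheme like this, a request that never gets a response holds its slot forever, so one or two stalled chunks are enough to keep the whole upload from ever finishing.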
I changed the console side to add a query parameter to each individual chunk upload, and for the stalled chunks I saw no mention of any such request in the Nexus logs. Next I tried a packet capture with Wireshark, and after setting up SSLKEYLOGFILE (because this will only repro with https; we'll get back to that) I did see the browser making those requests. In fact, I could see it sending data until at some point it stops, still with no response.
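A minimal sketch of the kind of per-chunk tagging described above, so that each request can be matched against the server logs. The helper and the parameter name `debug_chunk` are assumptions for illustration; the actual parameter used in the experiment isn't recorded here.

```typescript
// Hypothetical helper: append a debug-only query parameter identifying the chunk.
// The name "debug_chunk" is an assumption, not what the console actually sent.
function tagChunkUrl(uploadUrl: string, chunkIndex: number): string {
  const url = new URL(uploadUrl);
  // `set` replaces any existing value, so retries of the same chunk keep one tag.
  url.searchParams.set("debug_chunk", String(chunkIndex));
  return url.toString();
}
```

Searching the server logs for the tag then shows which chunks reached the request handler and which never did.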
After not being able to repro without TLS and then seeing the packet capture, I realized we're using HTTP/2 for compatible clients. The browser maintains a single connection and uses multiple HTTP/2 streams to make the different requests. OK, so is something else blocking our image upload somehow? Cue some more reading about HTTP/2, which definitely has a concept of flow control that each peer maintains separately.
Basically, for each side, there's a connection-level flow control window as well as a per-stream window. Every byte sent decrements the available bytes in both. If the sender has exhausted a window, it must not send any more until a WINDOW_UPDATE is received from the peer telling it there's more space.
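To make the bookkeeping concrete, here is a toy model of the sender's side of flow control. It is a simplified sketch, not how Firefox or hyper implement it: the class and method names are invented, and it assumes both windows start at the spec default of 65,535 bytes, which real peers commonly change via SETTINGS and WINDOW_UPDATE.

```typescript
// Spec default for both the connection window and each stream window.
// Assumption: neither peer has adjusted it; real deployments usually do.
const DEFAULT_WINDOW = 65535;

// Toy model of a sender's view of HTTP/2 flow control (hypothetical names).
class SenderFlowControl {
  private connectionWindow = DEFAULT_WINDOW;
  private streamWindows = new Map<number, number>();

  openStream(streamId: number): void {
    this.streamWindows.set(streamId, DEFAULT_WINDOW);
  }

  // Bytes that may be sent on this stream right now: limited by BOTH windows.
  available(streamId: number): number {
    const streamWindow = this.streamWindows.get(streamId) ?? 0;
    return Math.max(0, Math.min(this.connectionWindow, streamWindow));
  }

  // Try to send `wanted` bytes; returns how many the windows actually allowed.
  send(streamId: number, wanted: number): number {
    const sent = Math.min(wanted, this.available(streamId));
    this.connectionWindow -= sent;
    this.streamWindows.set(streamId, (this.streamWindows.get(streamId) ?? 0) - sent);
    return sent;
  }

  // WINDOW_UPDATE from the peer; stream id 0 refers to the connection window.
  windowUpdate(streamId: number, increment: number): void {
    if (streamId === 0) {
      this.connectionWindow += increment;
    } else {
      this.streamWindows.set(streamId, (this.streamWindows.get(streamId) ?? 0) + increment);
    }
  }
}
```

With these defaults, a single 384 KiB chunk exhausts the connection window after 65,535 bytes, and every other stream on that connection is then blocked too. A stream-level WINDOW_UPDATE alone does not unblock anything while the connection window is still at zero.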
I need to look into it some more, but it seems like the browser might think it has exhausted the window while no window update ever arrives from the Nexus side. I hacked in a tracing subscriber to see whether hyper reports anything useful, and I do see mentions of the stalled streams, but I'm not familiar enough with hyper to decode them yet. (hyperium/hyper#2899 seems relevant, maybe?)