Skip to content

Define Indexed DB as a storage endpoint, use hooks #334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

inexorabletash
Copy link
Member

@inexorabletash inexorabletash commented May 15, 2020

WORK IN PROGRESS - not ready to merge

For whatwg/storage#90

Bikeshed hasn't picked up the new terms from Storage yet.

There should be no behavior changes here.


Preview | Diff


Preview | Diff

@annevk
Copy link
Member

annevk commented May 17, 2020

Looking at this made me notice the "connection queue", which should probably use the storage key as well rather than the origin? Is this a primitive that should move to the Storage Standard?

Also, part of the idea was that you'd no longer need "If origin is an opaque origin" as the Storage Standard would take care of that (and return failure as appropriate).

@inexorabletash
Copy link
Member Author

Looking at this made me notice the "connection queue", which should probably use the storage key as well rather than the origin? Is this a primitive that should move to the Storage Standard?

Agreed it should be decoupled from origin (that's a good mental filter to use in general, thanks). Is it a generic enough concept to move to Storage? Other thoughts are to monkey-patch it on to storage bottle, or make it part of the bottle's contents (e.g. the bottle could have a "connection queue" key and a "databases" key, where the latter's value a map of the actual databases).

Also, part of the idea was that you'd no longer need "If origin is an opaque origin" as the Storage Standard would take care of that (and return failure as appropriate).

Agreed, I'll roll that in. Since that error is synchronous in IDB (well, in 2/3 cases), I'll have to rework the algorithms a bit but I think it's fine.

@inexorabletash
Copy link
Member Author

Hmmm, actually there is a connection queue per name. So the bottle's map values can be a pair of (queue, database)....? Still thinking this through.

@inexorabletash
Copy link
Member Author

And also note that we have no idea what should happen to pending open/delete requests if storage was swapped out. Are they associated with the previous storage? (so the queue is part of storage itself) Or do they apply to whatever the current storage is when they run? Brain hurts...

@asutherland
Copy link
Collaborator

And also note that we have no idea what should happen to pending open/delete requests if storage was swapped out. Are they associated with the previous storage? (so the queue is part of storage itself) Or do they apply to whatever the current storage is when they run? Brain hurts...

I think it makes sense for them to be associated with the previous storage.

Step 2 of the proposed replace algorithm at whatwg/storage#18 (comment) is a task that runs on the given agent. It makes sense that the execution of this task would constitute the start of the new storage epoch, if you will. Requests made in the before times would be irrelevant.

@inexorabletash
Copy link
Member Author

Cool. A few thoughts on how to structure the bottle map...

  • name → (queue, database)
  • namedatabase, and make queue a property of database, and make databases never be deleted, only cleared/reset.
  • ("queue", name) → queue and ("database", name) → database

Pros and cons for each. Preferences? Other ideas?

@asutherland
Copy link
Collaborator

* _name_ → (_queue_, _database_)

I like this one because:

  • Having the key be the name seems like a big win over the compound key approach of ("queue", name).
  • Separating the queue from the database seems like a nice separation of concerns, especially since the database is something that definitely involves touching disk and is conceptually subject to corruption. Whereas the connection queue is strictly a runtime concept.

Blobs / Files

A related question is how IndexedDB-minted Blobs and Files will handle the replace operation and whether this impacts the map. Gecko definitely invalidates IndexedDB-minted Blobs and Files when Clear-Site-Data and privacy data-clearing operations occur. From other discussions in the past I have the impression this is also the case in Blink.

The File API Spec doesn't really get into this in the section on deserialization and the get stream algorithm which implies a simplified model where no effort is made to de-duplicate Blob contents or store them to disk, but does leave implementation a broad latitude to throw errors when get stream is invoked to compensate for the underlying realities.

It seems like we might want to formalize the realities of Blobs/Files now since Clear-Site-Data makes this previous edge-case something content can explicitly trigger instead of a user-initiated edge-case, plus multiple storage buckets presumably would also want to be able to dispose of the underlying blobs and their quota usage in a deterministic fashion.

Doing this might involve a hook where get stream could end up needing to involve some part of the storage hierarchy, in which case it's possible the map might need to store additional data to support this.

@mkruisselbrink
Copy link

Blobs / Files

FWIW, I'm working on clarifying this part of the FileAPI spec, with my current thinking being to let others (i.e. IndexedDB, other APIs that produce blobs) define a get stream hook, and then defining all the operations on blobs in terms of that. Unfortunately haven't had as much time to work on that as I would have liked, but it is among the higher priority of spec things I'm working on, also to better define how things work for the Native File System API.

So yes, in that model it would be totally up to IndexedDB to define when/how these blobs get invalidated.

annevk added a commit to whatwg/storage that referenced this pull request May 20, 2020
"legacy-clone a browsing session storage shed" can be used by HTML to define creation of auxiliary browsing contexts, as part of whatwg/html#5560.

"obtain a storage key" can be used by APIs that share keying logic with storage, such as BroadcastChannel and shared workers. See whatwg/html#3054. It's potentially also useful for Indexed DB as discussed in w3c/IndexedDB#334.

Closes #92.
annevk added a commit to whatwg/storage that referenced this pull request Jun 5, 2020
"legacy-clone a browsing session storage shed" can be used by HTML to define creation of auxiliary browsing contexts, as part of whatwg/html#5560.

"obtain a storage key" can be used by APIs that share keying logic with storage, such as BroadcastChannel and shared workers. See whatwg/html#3054. It's potentially also useful for Indexed DB as discussed in w3c/IndexedDB#334.

Also helps a bit with #95 by reorganizing and adding some more detail to how a user agent is supposed to manage storage.

Closes #92.
@inexorabletash inexorabletash changed the base branch from master to main July 23, 2020 16:18
@inexorabletash inexorabletash force-pushed the endpoint branch 2 times, most recently from e62ed35 to c16bb29 Compare June 7, 2021 17:15
@inexorabletash
Copy link
Member Author

Partial update. I needed a name for the (queue,database) struct that exists in the map. I literally called it pumpkin here as a placeholder name because I wasn't feeling inspired. So bikeshed away!

This drops the need for most of the imports from Storage, although these are retained:

  • storage bucket - which exists in the published version, so not new in this PR; this is used to reference durability
  • storage identifier - this is informative, used in the phrase Indexed DB is a storage endpoint, with the storage identifier "indexedDB". which isn't really necessary but looks nice? We can remove, since it's a registered end point and we don't need to define it. Thoughts?

Most of the remaining references to "origin" end up being fairly illustrative rather than normative definitions. We could probably scrub most of them e.g. "if the origin’s storage is cleared" → "if the storage is cleared".

inexorabletash added a commit that referenced this pull request Jun 8, 2022
Mostly a "find and replace" of "origin" with "storage key" right now. 

More detailed integration will is being worked on in #334

Co-authored-by: Joshua Bell <[email protected]>
@inexorabletash inexorabletash force-pushed the endpoint branch 4 times, most recently from 8c7cc83 to 1042cec Compare June 8, 2022 22:07
@inexorabletash
Copy link
Member Author

See also: whatwg/storage#153

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants