Multi-phase preparer workflow #234


Closed · 16 tasks done
smashwilson opened this issue Mar 25, 2016 · 10 comments
smashwilson (Member) commented Mar 25, 2016

I've been talking about doing this for a while, but I haven't actually documented the full idea anywhere yet. This is what I want to do with the way that preparers work:

  1. A preparer container (preparer-jekyll, preparer-sphinx) mounts the workspace into a volume. It's responsible for processing its input directory (CONTENT_ROOT or /usr/content-repo), writing each envelope to a url-encoded-content-id.json file in an ENVELOPE_DIR, and copying each asset to an ASSET_DIR.
  2. A submitter container mounts the volume. It's also provided with the CONTENT_STORE_URL and CONTENT_STORE_APIKEY. It submits all of the assets contained in ASSET_DIR to the content store, then submits all of the envelopes from ENVELOPE_DIR. For bonus points and mad performance, it should do this in two HTTP transactions.
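
For illustration, here's roughly what phase 1 looks like from inside a preparer. Only the ENVELOPE_DIR / ASSET_DIR variables and the url-encoded-content-id.json naming come from the plan above; the Python and the helper names are made up for this sketch.

```python
# Sketch of the phase-1 disk contract. A real preparer (preparer-sphinx,
# preparer-jekyll) would derive envelopes from CONTENT_ROOT; this only shows
# how results land on the shared volume for the submitter to find.
import json
import os
import shutil
from urllib.parse import quote

ENVELOPE_DIR = os.environ["ENVELOPE_DIR"]
ASSET_DIR = os.environ["ASSET_DIR"]

def write_envelope(content_id, envelope):
    """Serialize one metadata envelope as <url-encoded-content-id>.json."""
    filename = quote(content_id, safe="") + ".json"
    with open(os.path.join(ENVELOPE_DIR, filename), "w") as fp:
        json.dump(envelope, fp)

def copy_asset(source_path):
    """Copy a single referenced asset into ASSET_DIR, keeping its basename."""
    shutil.copy(source_path, os.path.join(ASSET_DIR, os.path.basename(source_path)))
```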

This two-phase split lets us:

  • Avoid duplicating and maintaining the content store submission protocol across N preparers; instead, each preparer just writes to disk, which is closer to what the native engines do anyway.
  • Build small, reusable submitter containers that can (mostly) be used independently of the preparer.
  • Further isolate the content store API key: the submitter is trusted deconst code that never loads user-submitted content, and the preparer (which often does things like exec the conf.py file) no longer needs an API key at all.
  • Have much faster builds (I hope) by consolidating network transport into two HTTP transactions rather than a linear sequence of one per envelope and asset.

Here's my rough checklist:

  • content-service | Add API endpoints to the content store to accept batch envelope uploads as a tarball. (Submit content atomically #133)
  • content-service | Bulk asset uploads as a tarball. (Bulk asset uploads content-service#95)
  • content-service | Fingerprint and query fingerprints for uploaded envelopes and assets. (Support diff uploads content-service#92)
  • preparer-sphinx | Modify the Sphinx preparer to follow the new preparer protocol (in such a way that won't break existing builds!).
  • preparer-jekyll | Modify the Jekyll preparer to follow the new preparer protocol (in such a way that won't break existing builds!).
  • submitter | Create the submitter and package it as a container. Use the new environment variable naming conventions from Rename "CONTENT_STORE_" variables #101.
  • strider-deconst-content | Rework the preparation workflow to run the new sequence of containers rather than a single preparer container.
  • deconst-docs | 📝 the new build workflow and preparer container protocol. In the writer's section, cover the sequence of steps to rename content and put redirects in place.
  • :shipit: to deconst.horse
  • :shipit: to Nexus
  • integrated | Update the preparer scripts to run the new preparer flow.
  • preparer-sphinx | 🔪 cruft left over from before.
  • preparer-jekyll | 🔪 cruft left over from before.

Follow-on issues:

smashwilson self-assigned this Mar 25, 2016
kenperkins (Contributor) commented:

It sounds like you're not planning on addressing (at least through this issue) any kind of differential asset upload. Is that correct?

smashwilson (Member, Author) commented:

Not initially, but this'll get us closer. Once we have bulk upload for envelopes and assets, we can add a handshaking request, where the submitter offers a set of checksums to see what it can leave out of the uploads.
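
A rough sketch of the checksum side of that handshake, assuming SHA-256 as the fingerprint (the algorithm and the shape of the request are exactly the parts that aren't designed yet):

```python
# Illustrative only: one way a submitter could fingerprint what it has on disk
# before asking the content store which uploads it can skip.
import hashlib
import os

def fingerprint_directory(directory):
    """Map each filename in `directory` to a SHA-256 digest of its contents."""
    digests = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as fp:
            digests[name] = hashlib.sha256(fp.read()).hexdigest()
    return digests

# The submitter would offer these digests to the content store and omit any
# asset or envelope the store reports it already has.
```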

smashwilson (Member, Author) commented:

I'm trying to keep this issue from becoming even more sprawling.

smashwilson (Member, Author) commented:

So, uh, now I am doing asset and envelope fingerprinting as part of this after all. Sprawl++

etoews (Contributor) commented Apr 7, 2016

Is there a use case for ETags here?

smashwilson (Member, Author) commented:

> Is there a use case for ETags here?

I don't think so, because our upload requests are performed in bulk. Part of my agenda is to accomplish a content repository publish with as few transactions as possible:

  1. The submitter has a full set of assets and envelopes on disk. It fingerprints them and queries the content store API to ask what's new all at once.
  2. The content store compares the fingerprints to the fingerprints of the latest resources. It returns asset URLs for assets that are already present and yes-or-no responses for envelopes.
  3. The submitter prepares a tarball containing all new assets and uploads it to /bulkassets. The response contains the new asset URLs for those assets.
  4. The submitter injects those URLs into the metadata envelopes.
  5. The submitter prepares a tarball containing all new envelopes and uploads it to /bulkcontent.

ETags are more of a request-by-request sort of thing, and even if you could attach more than one to a single request, they wouldn't prevent the submitter from needing to prepare and POST this giant tarball anyway. Unless I'm misremembering how they work, of course.
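
For concreteness, a rough Python sketch of those five steps from the submitter's side. Only /bulkassets and /bulkcontent are endpoint names from this thread; the fingerprint-check endpoint, the auth header format, and the response shapes are placeholders rather than the real content-service API, and the envelope side is simplified to a pre-computed list of new envelope filenames.

```python
# Sketch of the bulk publish sequence: one fingerprint check, one asset
# tarball, one envelope tarball. Endpoint and response details are assumptions.
import io
import os
import tarfile

import requests

def make_tarball(directory, names):
    """Bundle the named files from `directory` into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            tar.add(os.path.join(directory, name), arcname=name)
    buf.seek(0)
    return buf

def publish(asset_dir, envelope_dir, asset_digests, new_envelope_names):
    store = os.environ["CONTENT_STORE_URL"]
    # The exact Authorization header format is an assumption here.
    headers = {"Authorization": "deconst apikey=" + os.environ["CONTENT_STORE_APIKEY"]}

    # Steps 1-2: offer fingerprints; the store answers with existing asset URLs
    # (or None for assets it doesn't have). "/checkassets" is hypothetical.
    known = requests.post(store + "/checkassets",
                          json=asset_digests, headers=headers).json()

    # Step 3: upload only the missing assets, all in one transaction.
    missing = [name for name, url in known.items() if url is None]
    uploaded = requests.post(store + "/bulkassets",
                             data=make_tarball(asset_dir, missing),
                             headers=headers).json()
    asset_urls = {name: url for name, url in known.items() if url}
    asset_urls.update(uploaded)

    # Step 4: rewrite each new envelope on disk with the final asset URLs
    # (omitted here), then step 5: upload them all as a second tarball.
    requests.post(store + "/bulkcontent",
                  data=make_tarball(envelope_dir, new_envelope_names),
                  headers=headers)
    return asset_urls
```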

etoews (Contributor) commented Apr 7, 2016

I still had my head in the individual content envelope model, so I was thinking request-by-request. nm

smashwilson (Member, Author) commented:

I've split the doctest work into its own issue at deconst/strider-deconst-content#25, because I'm close to shipping bulk differential uploads without it. It'll still be a pretty natural extension.

smashwilson (Member, Author) commented:

Here's a full build of the how-to repository, even with buggy duplicate asset and envelope detection:

[screenshot: "screen shot 2016-04-27 at 8 18 03 am", showing the build output]

🐎 🐎 🐎

@smashwilson
Copy link
Member Author

🤘 This is now live and working.
