Multi-phase preparer workflow #234


Closed · 16 tasks done
smashwilson opened this issue Mar 25, 2016 · 10 comments
smashwilson (Member) commented Mar 25, 2016

I've been talking about doing this for a while, but I haven't actually documented the full idea anywhere yet. This is what I want to do with the way that preparers work:

  1. A preparer container (preparer-jekyll, preparer-sphinx) mounts the workspace into a volume. It's responsible for processing its input directory (CONTENT_ROOT or /usr/content-repo), writing each envelope to a url-encoded-content-id.json file in an ENVELOPE_DIR, and copying each asset to an ASSET_DIR.
  2. A submitter container mounts the volume. It's also provided with the CONTENT_STORE_URL and CONTENT_STORE_APIKEY. It submits all of the assets contained in ASSET_DIR to the content store, then submits all of the envelopes from ENVELOPE_DIR. For bonus points and mad performance, it should do this in two HTTP transactions.
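
For illustration, here's roughly what phase 1 looks like from inside a preparer. Only the ENVELOPE_DIR / ASSET_DIR variables and the url-encoded-content-id.json naming come from the plan above; the Python and the helper names are made up for this sketch.

```python
# Sketch of the phase-1 disk contract. A real preparer (preparer-sphinx,
# preparer-jekyll) would derive envelopes from CONTENT_ROOT; this only shows
# how results land on the shared volume for the submitter to find.
import json
import os
import shutil
from urllib.parse import quote

ENVELOPE_DIR = os.environ["ENVELOPE_DIR"]
ASSET_DIR = os.environ["ASSET_DIR"]

def write_envelope(content_id, envelope):
    """Serialize one metadata envelope as <url-encoded-content-id>.json."""
    filename = quote(content_id, safe="") + ".json"
    with open(os.path.join(ENVELOPE_DIR, filename), "w") as fp:
        json.dump(envelope, fp)

def copy_asset(source_path):
    """Copy a single referenced asset into ASSET_DIR, keeping its basename."""
    shutil.copy(source_path, os.path.join(ASSET_DIR, os.path.basename(source_path)))
```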

This two-phase split lets us:

  • Avoid duplicating and maintaining the content store submission protocol across N preparers; instead, each preparer just writes to disk, which is closer to what the native engines do anyway.
  • Build small, reusable submitter containers that can (mostly) be used independently of the preparer.
  • Further isolate the content store API key: the submitter is trusted deconst code that never loads user-submitted content, and the preparer (which often does things like exec the conf.py file) no longer needs an API key at all.
  • Have much faster builds (I hope) by consolidating network transport into two HTTP transactions rather than a linear sequence of one per envelope and asset.

Here's my rough checklist:

  • content-service | Add API endpoints to the content store to accept batch envelope uploads as a tarball. (Submit content atomically #133)
  • content-service | Bulk asset uploads as a tarball. (Bulk asset uploads content-service#95)
  • content-service | Fingerprint and query fingerprints for uploaded envelopes and assets. (Support diff uploads content-service#92)
  • preparer-sphinx | Modify the Sphinx preparer to follow the new preparer protocol (in such a way that won't break existing builds!).
  • preparer-jekyll | Modify the Jekyll preparer to follow the new preparer protocol (in such a way that won't break existing builds!).
  • submitter | Create the submitter and package it as a container. Use the new environment variable naming conventions from Rename "CONTENT_STORE_" variables #101.
  • strider-deconst-content | Rework the preparation workflow to run the new sequence of containers rather than a single preparer container.
  • deconst-docs | 📝 the new build workflow and preparer container protocol. In the writer's section, cover the sequence of steps to rename content and put redirects in place.
  • :shipit: to deconst.horse
  • :shipit: to Nexus
  • integrated | Update the preparer scripts to run the new preparer flow.
  • preparer-sphinx | 🔪 cruft left over from before.
  • preparer-jekyll | 🔪 cruft left over from before.

Follow-on issues:

smashwilson self-assigned this Mar 25, 2016
kenperkins (Contributor) commented:

It sounds like you're not planning on addressing (at least through this issue) any kind of differential asset upload. Is that correct?

smashwilson (Member, Author) commented:

Not initially, but this'll get us closer. Once we have bulk upload for envelopes and assets, we can add a handshaking request, where the submitter offers a set of checksums to see what it can leave out of the uploads.
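
A rough sketch of the checksum side of that handshake, assuming SHA-256 as the fingerprint (the algorithm and the shape of the request are exactly the parts that aren't designed yet):

```python
# Illustrative only: one way a submitter could fingerprint what it has on disk
# before asking the content store which uploads it can skip.
import hashlib
import os

def fingerprint_directory(directory):
    """Map each filename in `directory` to a SHA-256 digest of its contents."""
    digests = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as fp:
            digests[name] = hashlib.sha256(fp.read()).hexdigest()
    return digests

# The submitter would offer these digests to the content store and omit any
# asset or envelope the store reports it already has.
```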

smashwilson (Member, Author) commented:

I'm trying to keep this issue from becoming even more sprawling.

smashwilson (Member, Author) commented:

So, uh, now I am doing asset and envelope fingerprinting as part of this after all. Sprawl++

etoews (Contributor) commented Apr 7, 2016

Is there a use case for ETags here?

smashwilson (Member, Author) commented:

> Is there a use case for ETags here?

I don't think so, because our upload requests are performed in bulk. Part of my agenda is to accomplish a content repository publish with as few transactions as possible:

  1. The submitter has a full set of assets and envelopes on disk. It fingerprints them and queries the content store API to ask what's new all at once.
  2. The content store compares the fingerprints to the fingerprints of the latest resources. It returns asset URLs for assets that are already present and yes-or-no responses for envelopes.
  3. The submitter prepares a tarball containing all new assets and uploads it to /bulkassets. The response contains the new asset URLs for those assets.
  4. The submitter injects those URLs into the metadata envelopes.
  5. The submitter prepares a tarball containing all new envelopes and uploads it to /bulkcontent.

ETags are more of a request-by-request sort of thing, and even if you could attach more than one to a single request, they wouldn't prevent the submitter from needing to prepare and POST this giant tarball anyway. Unless I'm misremembering how they work, of course.
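
For concreteness, a rough Python sketch of those five steps from the submitter's side. Only /bulkassets and /bulkcontent are endpoint names from this thread; the fingerprint-check endpoint, the auth header format, and the response shapes are placeholders rather than the real content-service API, and the envelope side is simplified to a pre-computed list of new envelope filenames.

```python
# Sketch of the bulk publish sequence: one fingerprint check, one asset
# tarball, one envelope tarball. Endpoint and response details are assumptions.
import io
import os
import tarfile

import requests

def make_tarball(directory, names):
    """Bundle the named files from `directory` into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            tar.add(os.path.join(directory, name), arcname=name)
    buf.seek(0)
    return buf

def publish(asset_dir, envelope_dir, asset_digests, new_envelope_names):
    store = os.environ["CONTENT_STORE_URL"]
    # The exact Authorization header format is an assumption here.
    headers = {"Authorization": "deconst apikey=" + os.environ["CONTENT_STORE_APIKEY"]}

    # Steps 1-2: offer fingerprints; the store answers with existing asset URLs
    # (or None for assets it doesn't have). "/checkassets" is hypothetical.
    known = requests.post(store + "/checkassets",
                          json=asset_digests, headers=headers).json()

    # Step 3: upload only the missing assets, all in one transaction.
    missing = [name for name, url in known.items() if url is None]
    uploaded = requests.post(store + "/bulkassets",
                             data=make_tarball(asset_dir, missing),
                             headers=headers).json()
    asset_urls = {name: url for name, url in known.items() if url}
    asset_urls.update(uploaded)

    # Step 4: rewrite each new envelope on disk with the final asset URLs
    # (omitted here), then step 5: upload them all as a second tarball.
    requests.post(store + "/bulkcontent",
                  data=make_tarball(envelope_dir, new_envelope_names),
                  headers=headers)
    return asset_urls
```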

etoews (Contributor) commented Apr 7, 2016

I still had my head in the individual content envelope model, so I was thinking request-by-request. nm

smashwilson (Member, Author) commented:

I've split the doctest work into its own issue at deconst/strider-deconst-content#25, because I'm close to shipping bulk differential uploads without it. It'll still be a pretty natural extension.

smashwilson (Member, Author) commented:

Here's a full build of the how-to repository, even with buggy duplicate asset and envelope detection:

[screenshot: "screen shot 2016-04-27 at 8 18 03 am", showing the build output]

🐎 🐎 🐎

@smashwilson
Copy link
Member Author

🤘 This is now live and working.
