
Conversation

@jchunkins
Contributor

This is an alternate proposal for catalog switching that uses annotations.


The mechanism for declaring an image using templates involves an annotation called
`olm.catalogImageTemplate` whose value is the image reference with one or more template variables included.
A controller (which can exist in either the OLM catalog operator or within a standalone operator) is responsible for detecting this annotation, resolving the template variables, and updating the CatalogSource's image reference accordingly.
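To make the template idea concrete, here is a minimal sketch (not from the proposal; the function name and hard-coded values are illustrative) of how the kube version placeholders could be substituted into the annotation value:

```go
// Illustrative sketch only: substituting the kube version placeholders used by
// olm.catalogImageTemplate. Names and values here are hypothetical.
package main

import (
	"fmt"
	"strings"
)

func resolveKubeTemplate(template, major, minor string) string {
	r := strings.NewReplacer(
		"{kube_major_version}", major,
		"{kube_minor_version}", minor,
	)
	return r.Replace(template)
}

func main() {
	tmpl := "quay.io/kube-release-v{kube_major_version}/catalog:v{kube_major_version}.{kube_minor_version}"
	fmt.Println(resolveKubeTemplate(tmpl, "1", "18"))
	// quay.io/kube-release-v1/catalog:v1.18
}
```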


I am struggling to understand the interaction of the two-controller approach here. When I initially apply a CatalogSource YAML file to the cluster, what happens? I assume the existing catalog operator controller "sees" the new resource and starts to spin up the associated pod using the original image in the spec, while the new controller also sees the new resource and modifies it with the new image. Is there a coordination problem here?

I am also trying to understand the switching part: if we are looking at checking the kube version, when is this actually checked? Is it when the operator actually restarts and sees a new kube version, or is there some sort of polling/cron job kinda thing in play here? I guess if we are updating the kube version of the cluster, it is assumed that the operator gets restarted during the upgrade?


Just re-read Evan's comment from the other PR; he did mention polling. So in addition to watching the CatalogSources, there is another thread that does the polling, and if it finds changes, it then pulls in all the existing CatalogSources and applies the new polling results if different?

Also, on the topic of where this controller lives, my vote would be for it to be part of the OLM catalog operator. It may be harder to integrate, but if it's in OLM then customers don't have to take additional actions to install it.

Member


I guess if we are updating the kube version of the cluster, it is assumed that the operator gets restarted during the upgrade?

In general, I don't think this is a safe assumption to make. OLM itself cannot make any guarantees that some other external system will upgrade it when the cluster apiserver upgrades.

It seems like we'll need to poll the apiserver for its version information to handle the case where the catalog operator pod survives a cluster upgrade.

One possible non-polling solution would be to trigger a new version info request whenever one of the operator's connections to the apiserver is dropped. Not totally sure this would work though (e.g. is it possible for there to be a proxy between the operator and the apiserver pod that might keep a connection open even across apiserver restarts?)
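For what it's worth, here is a rough sketch of the polling idea using client-go's discovery client; the interval, callback, and wiring are all illustrative choices, not anything OLM actually does today:

```go
// Illustrative only: poll the apiserver's version via the discovery client so
// a catalog operator pod that survives a cluster upgrade still notices the new
// version. The interval and onChange callback are hypothetical.
package main

import (
	"log"
	"time"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

func pollKubeVersion(interval time.Duration, onChange func(major, minor string)) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return err
	}

	var lastMajor, lastMinor string
	for range time.Tick(interval) {
		info, err := dc.ServerVersion()
		if err != nil {
			log.Printf("version poll failed: %v", err)
			continue
		}
		if info.Major != lastMajor || info.Minor != lastMinor {
			lastMajor, lastMinor = info.Major, info.Minor
			onChange(info.Major, info.Minor) // re-evaluate catalog templates here
		}
	}
	return nil
}

func main() {
	err := pollKubeVersion(5*time.Minute, func(major, minor string) {
		log.Printf("kube version is now %s.%s; re-resolve annotated CatalogSources", major, minor)
	})
	if err != nil {
		log.Fatal(err)
	}
}
```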


We probably want to store the resolved template as well, so we can quickly determine if we need to rerun the templater.

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: dynamic-catalog
  namespace: olm
  annotations:
    olm.catalogImageTemplate: "quay.io/kube-release-v{kube_major_version}/catalog:v{kube_major_version}.{kube_minor_version}"
...
status:
  olm.catalogImageTemplate:
    messages:
    resolvedImage: "quay.io/kube-release-v1/catalog:v1.18"
    resolvedCatalogImageTemplate: "quay.io/kube-release-v{kube_major_version}/catalog:v{kube_major_version}.{kube_minor_version}"  

So the CatalogSource Switcher Controller has these flows (a rough reconcile sketch follows the flows below):

New Annotated CatalogSource

  • CatalogSource Event
    • Has metadata.annotations["olm.catalogImageTemplate"]? Yes
      • Is valid? Yes
      • Any kube* variables? Yes
        • Start poller to watch the API Server for version changes (implementation TBD)
      • Any GVK variables? Yes
        • Start watch for these specific resources.
      • Has changed? (Resolved template <> Annotation template?) Yes (it's new)
      • Update image reference
      • Update status
        - Store reconciled status

Updated Annotated CatalogSource

  • CatalogSource Event
    • Has metadata.annotations["olm.catalogImageTemplate"]? Yes
      • Is valid? Yes
      • Has changed? (Resolved template <> Annotation template?) Yes.
      • Update image reference
      • Update status
        - Store reconciled status

Deleted Annotated CatalogSource

  • CatalogSource Event
    • Has metadata.annotations["olm.catalogImageTemplate"]? No.
    • Has status["olm.catalogImageTemplate"].resolvedCatalogImageTemplate? Yes, so the annotation was deleted.
      • Remove resolvedCatalogImageTemplate from status
      • Update status message to indicate that the template is no longer considered.
      • Stop poller and watcher (store state in a configmap)

Updated Kubernetes Version
Opportunity for caching and optimization here

  • Notification of version change: 1.18.1 to 1.18.2
    • Iterate through each CatalogSource and re-process each template.

Updated Template Target Resource
Opportunity for caching and optimization here

  • Notification of resource update. (Watch Event)
    • Iterate through each CatalogSource and re-process each template.

Deleted Template Target Resource
Opportunity for caching and optimization here

  • Notification of resource deletion. (Watch Event)
    • Iterate through each CatalogSource and re-process each template.
      • Target resource is now gone, so the template can no longer be updated.

Start controller

  • Check state configmap (pollers, watchers)
  • Reestablish polling and watchers.
  • Iterate through all CatalogSources to verify no lost updates/deletions
  • Verify the kube version
  • Verify all template target resource state.
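To tie the annotation-driven flows together, here is a minimal reconcile sketch. The catalogSource struct and its fields are hypothetical stand-ins for the real CatalogSource API and the proposed status block; the kube version and GVK watch flows would re-enter through their own paths and force re-resolution:

```go
// Sketch of the "has annotation? -> is valid? -> has changed?" decision tree
// described in the flows above. Types and field names are hypothetical.
package main

import "fmt"

const templateAnnotation = "olm.catalogImageTemplate"

// catalogSource is a simplified, hypothetical stand-in for the real resource.
type catalogSource struct {
	Annotations             map[string]string
	SpecImage               string
	ResolvedCatalogTemplate string // mirrors the proposed resolvedCatalogImageTemplate status field
}

func reconcile(cs *catalogSource, resolve func(string) (string, error)) error {
	tmpl, ok := cs.Annotations[templateAnnotation]
	if !ok {
		// Annotation removed: drop the stored template, note it in the status
		// messages, and stop any pollers/watchers for this CatalogSource.
		cs.ResolvedCatalogTemplate = ""
		return nil
	}

	// Quick "has changed?" check enabled by storing the resolved template.
	if tmpl == cs.ResolvedCatalogTemplate {
		return nil
	}

	image, err := resolve(tmpl) // substitutes kube_* and GVK/jsonpath variables
	if err != nil {
		return fmt.Errorf("invalid or unresolvable template %q: %w", tmpl, err)
	}

	// Update the image reference and record the template that produced it.
	cs.SpecImage = image
	cs.ResolvedCatalogTemplate = tmpl
	return nil
}

func main() {
	cs := &catalogSource{Annotations: map[string]string{
		templateAnnotation: "quay.io/kube-release-v{kube_major_version}/catalog:v{kube_major_version}.{kube_minor_version}",
	}}
	_ = reconcile(cs, func(t string) (string, error) { return "quay.io/kube-release-v1/catalog:v1.18", nil })
	fmt.Println(cs.SpecImage)
}
```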

Contributor Author


There are two conditions where we get an opportunity to evaluate the templates:

  1. During sync

    My assumption is that we would be creating a new controller that runs in the catalog operator. When that controller is set up, we'd pass in the sync duration (using whatever value we decide on), and then every time the sync process kicks in we can evaluate the templates. See the catalog operator CRD controller setup for a conceptual example. Note that all of the controllers use the same sync period (which is configurable), but that does not mean we have to use the exact same period (or even allow configuration of this period). (See the sketch after this list.)

  2. Event processing:

    We'd also be able to evaluate the templates whenever a watched resource changes.
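Here is a rough sketch of both evaluation paths using a dynamic shared informer; the resync period, GVR, and handler bodies are illustrative, not the catalog operator's actual setup:

```go
// Illustrative sketch: the resync period provides the periodic "during sync"
// evaluation, and the event handlers cover changes to watched resources.
package main

import (
	"log"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// 1. During sync: the resync period causes UpdateFunc to fire periodically
	//    even when nothing changed, giving a chance to re-evaluate templates.
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 15*time.Minute)

	catalogSourceGVR := schema.GroupVersionResource{
		Group: "operators.coreos.com", Version: "v1alpha1", Resource: "catalogsources",
	}

	// 2. Event processing: add/update/delete events on CatalogSources (and, in
	//    the real design, on any resources referenced by GVK template
	//    variables) also trigger template evaluation.
	factory.ForResource(catalogSourceGVR).Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { /* evaluate templates */ },
		UpdateFunc: func(oldObj, newObj interface{}) { /* evaluate templates */ },
		DeleteFunc: func(obj interface{}) { /* clean up stored status */ },
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	// Block forever in this sketch; a real controller would wire this to its
	// shutdown signal.
	select {}
}
```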

name: dynamic-catalog
namespace: olm
annotations:
  olm.catalogImageTemplate: "quay.io/sample/catalog:{group:foo.example.com,version:v1,kind:Sample,name:MySample,namespace:ns,jsonpath:spec.foo.bar}"
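For illustration only, here is a sketch of how a GVK/jsonpath variable like the one above could be resolved with a dynamic client. The kind-to-resource mapping ("samples") is assumed here; a real implementation would presumably use a RESTMapper and full JSONPath support rather than the simple dotted path below:

```go
// Illustrative sketch: fetch the object referenced by the template variable
// and read the field its jsonpath points at.
package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Coordinates taken from the template variable:
	// {group:foo.example.com,version:v1,kind:Sample,name:MySample,namespace:ns,jsonpath:spec.foo.bar}
	gvr := schema.GroupVersionResource{Group: "foo.example.com", Version: "v1", Resource: "samples"}
	obj, err := client.Resource(gvr).Namespace("ns").Get(context.TODO(), "MySample", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// spec.foo.bar -> ["spec", "foo", "bar"]
	value, found, err := unstructured.NestedString(obj.Object, strings.Split("spec.foo.bar", ".")...)
	if err != nil || !found {
		log.Fatalf("field not found: %v", err)
	}

	// Substitute the resolved value back into the image reference.
	fmt.Println("quay.io/sample/catalog:" + value)
}
```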


Do we see any reason to optimize for referencing variables from THIS CatalogSource? Like the downward API?

If we wanted to support that, then you could dispense with group, version, kind, name, and namespace and only have jsonpath; i.e., jsonpath by itself references THIS resource.

Contributor Author


I like the idea, but I don't think there's anything in the catalog source spec to use as a template value that would be of interest. I suppose you could reference information under metadata or status. This could be something we add later if we find a need.

@huizengaJoe

I am new to this process, but I would like to know how we decide whether this controller lives within the catalog operator or standalone. As I mentioned above, I think consumers would get the best experience if it is packaged with OLM versus being just another operator in an OLM catalog. I think it also drives a different development cycle and process. Thoughts?

@cdjohnson

We discussed these options last Thursday at the olm-dev call:

  1. Enclose it in the existing OLM Catalog Controller
  2. Create a sister controller as part of the same OLM Catalog Operator
  3. Create a new independent Operator

I think folks were interested in option 2, as it allows decoupling yet reduces the packaging overhead of creating another operator, which third parties would need to know about in order to consume and enable it.

- make the status section use conditions and provide examples of this
@kevinrizza
Member

/approve

@kevinrizza kevinrizza merged commit d7b3110 into operator-framework:master Jul 22, 2021