-
Notifications
You must be signed in to change notification settings - Fork 35
Introducing Module-NMC controller #503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing Module-NMC controller #503
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yevgeny-shnaidman The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for kubernetes-sigs-kmm ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/assign @ybettan |
/assign @qbarrand |
cmd/manager/main.go
Outdated
mnc := controllers.NewModuleNMCReconciler( | ||
client, | ||
kernelAPI, | ||
nmcHandler, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this one be renamed to nmcAPI
to be consistent with the rest of the API passed to it (or to the Module reconciler)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a controller, we are not creating/returning an Interface ( API), but the struct itself, as the rest of the reconcilers/controllers, that's why there is no API in the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nmcHandler
is a public interface. Why don't we call it nmcAPI
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the NMC-Handler API into the controller, since it implements controller logic
controllers/module_nmc_reconciler.go
Outdated
"sigs.k8s.io/controller-runtime/pkg/log" | ||
) | ||
|
||
//+kubebuilder:rbac:groups="core",resources=nodes,verbs=get;watch;patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to patch
nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need, removing
controllers/module_nmc_reconciler.go
Outdated
) | ||
|
||
//+kubebuilder:rbac:groups="core",resources=nodes,verbs=get;watch;patch | ||
//+kubebuilder:rbac:groups=kmm.sigs.x-k8s.io,resources=nodemodulesconfigs,verbs=get;list;watch;update;patch;create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between update
and patch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that depends on the action that you execute on the object. I prefer to mention both of them, so that we can execute them in the code without a need to update the rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the problem with updating the rules? A new version of the code will come with a new bundle anyway. I would restrict the verbs here to the bare minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the "update"
// get the reconciled module | ||
logger := log.FromContext(ctx) | ||
|
||
logger.Info("Starting Module-NMS reconcilation", "module name and namespace", req.NamespacedName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who says that the reconciled object is a Module
? Can't it be a Node
? a Job
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because this is how this reconciler is defined. It is defined For Module, and for every "watch" it finds the appropriate Module to reconcile
controllers/module_nmc_reconciler.go
Outdated
logger.Info("Module deleted") | ||
return ctrl.Result{}, nil | ||
} | ||
return ctrl.Result{}, fmt.Errorf("failed to get the requested %s KMMO CR: %w", req.NamespacedName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ctrl.Result{}, fmt.Errorf("failed to get the requested %s KMMO CR: %w", req.NamespacedName, err) | |
return ctrl.Result{}, fmt.Errorf("failed to get the requested %s KMMO CR: %v", req.NamespacedName, err) |
Are we using the wrapped error anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. We do use wrapped errors , when needed. For example in the KernelMapper, to get the appropriate MLD, if exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, "do we use the wrapped error in this case". If you fixed it I assume we were not using it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in this case it was a mistake
errs := make([]error, 0, len(nodes)) | ||
for _, node := range nodes { | ||
kernelVersion := strings.TrimSuffix(node.Status.NodeInfo.KernelVersion, "+") | ||
mld, err := mnr.kernelAPI.GetModuleLoaderDataForKernel(mod, kernelVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do that for every node? Can we extract it outside of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the kernel version of the nodes may differ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't see node
being passed to this function or the implementation depending on the nodes in any way. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the node is used to determine the kernelVersion , and to determine kernel mapping (using kernel version). Then both kernel mapping(mld) and node are passed to the function that determines whether the module needs to run on the node or not
} | ||
|
||
err = mnr.nmcHandler.HandleModuleNMC(ctx, mod, mld, node) | ||
errs = append(errs, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't you appending nil
errors here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but errors.Join knows how to handle them and ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't it be clearer to only append real errors instead of delegating it to Join
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why to add another "if", if there is already code in the Join that takes care of it? This is the purpose of Join function: decide of there are errors in the array of errors , and create the appropriate error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why to add another "if", if there is already code in the Join that takes care of it?
Because it is not trivial for someone who didn't read the Join
doc to understand those are ignored. Everywhere else in the code we are checking if an error is != nil
before doing something with it except here.
If you think it is better this way, let's keep it, I just though it is a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am keeping it: if we have a Golang library function in the code, i would expect that developer to read about it and understand
getRequestedModule(ctx context.Context, namespacedName types.NamespacedName) (*kmmv1beta1.Module, error) | ||
getNodesList(ctx context.Context) ([]v1.Node, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering why getRequestedModule
isn't a method in the module
package and why we don't have a node
package that will hold getNodeList
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Slack, i am leaving it here and we can decide on it and the rest of functions in packages in v2.1
} | ||
} | ||
|
||
func (mnrh *moduleNMCReconcilerHelper) getRequestedModule(ctx context.Context, namespacedName types.NamespacedName) (*kmmv1beta1.Module, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this function a dup of getRequestedModule
in the module_reconciler.go
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Slack, i am leaving it here and we can decide on it and the rest of functions in packages in v2.1
return &mod, nil | ||
} | ||
|
||
func (mnrh *moduleNMCReconcilerHelper) getNodesList(ctx context.Context) ([]v1.Node, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this function exist (almost) in module_reconciler.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Slack, i am leaving it here and we can decide on it and the rest of functions in packages in v2.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we need this function at all? In tests, we replace one EXPECT()
with another - not sure what is the benefit. I guess that remark is also cvalid for getRequestedModule
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to set such function in a helper interface, makes it easier to write the unit-tests for the main reconciliation loop, and to check all the corner case in a dedicated unit-test for that specific function.
d65402f
to
b1c21c6
Compare
controllers/module_nmc_reconciler.go
Outdated
) | ||
|
||
//+kubebuilder:rbac:groups="core",resources=nodes,verbs=get;watch;patch | ||
//+kubebuilder:rbac:groups=kmm.sigs.x-k8s.io,resources=nodemodulesconfigs,verbs=get;list;watch;update;patch;create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the problem with updating the rules? A new version of the code will come with a new bundle anyway. I would restrict the verbs here to the bare minimum.
return &mod, nil | ||
} | ||
|
||
func (mnrh *moduleNMCReconcilerHelper) getNodesList(ctx context.Context) ([]v1.Node, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we need this function at all? In tests, we replace one EXPECT()
with another - not sure what is the benefit. I guess that remark is also cvalid for getRequestedModule
above.
b1c21c6
to
2b37582
Compare
/test pull-kernel-module-management-unit-tests |
/test pull-kernel-module-management-unit-tests |
c6e518a
to
09f9ade
Compare
controllers/module_nmc_reconciler.go
Outdated
Watches( | ||
&batchv1.Job{}, | ||
handler.EnqueueRequestForOwner(mgr.GetScheme(), mgr.GetRESTMapper(), &kmmv1beta1.Module{}, handler.OnlyControllerOwner()), | ||
builder.WithPredicates( | ||
filter.ModuleNMCReconcileJobPredicate(), | ||
), | ||
). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use Owns
instead?
Watches( | |
&batchv1.Job{}, | |
handler.EnqueueRequestForOwner(mgr.GetScheme(), mgr.GetRESTMapper(), &kmmv1beta1.Module{}, handler.OnlyControllerOwner()), | |
builder.WithPredicates( | |
filter.ModuleNMCReconcileJobPredicate(), | |
), | |
). | |
Owns( | |
&batchv1.Job{}, | |
builder.WithPredicates( | |
filter.ModuleNMCReconcileJobPredicate(), | |
), | |
). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Owns defines it own filter and handler. Since i would like to keep the filter, i am still using Watches
ebc5e47
to
fcb00c9
Compare
/lgtm |
This PR introduces a new controller, that is responsible for reconciliation of NMC based on the data in Module and Node The following are the main points: 1) It watches for changes in Modules, Nodes (labels, becoming schedulable) and Jobs belonging owned by Modules (in order to reschedule once Build/Sign job are finished) 2) Per reconciled Module, go over all the nodes in the cluster and reconcile their NMCs based on the Module/Node state Changes in the PR: 1) adding functions to filter package that are resposible for filtering the events (nodes. modules, jobs) and finding out which modules should be reconciled 2) changing the NMC handler logic a little, adding verification for container image presence in case module should be enable. 3) Module-NMC Reconciliation is being done both in case Kernel Mapping exists and in case it does not exists ( solves the garbage collection flow) This PR does not include Module status updates based on NMC updates
fcb00c9
to
70b755e
Compare
/lgtm |
Build using Go 1.18. Upstream-Commit: 4c76f4d
…igs#694) This PR introduces a new controller, that is responsible for reconciliation of NMC based on the data in Module and Node The following are the main points: 1) It watches for changes in Modules, Nodes (labels, becoming schedulable) and Jobs belonging owned by Modules (in order to reschedule once Build/Sign job are finished) 2) Per reconciled Module, go over all the nodes in the cluster and reconcile their NMCs based on the Module/Node state Changes in the PR: 1) adding functions to filter package that are resposible for filtering the events (nodes. modules, jobs) and finding out which modules should be reconciled 2) changing the NMC handler logic a little, adding verification for container image presence in case module should be enable. 3) Module-NMC Reconciliation is being done both in case Kernel Mapping exists and in case it does not exists ( solves the garbage collection flow) This PR does not include Module status updates based on NMC updates
This PR introduces a new controller, that is responsible for reconciliation of NMC based on the data in Module and Node The following are the main points:
schedulable) and Jobs belonging owned by Modules (in order to
reschedule once Build/Sign job are finished)
and reconcile their NMCs based on the Module/Node state
Changes in the PR:
the events (nodes. modules, jobs) and finding out which modules
should be reconciled
container image presence in case module should be enable.
exists and in case it does not exists ( solves the garbage
collection flow)
This PR does not include Module status updates based on NMC updates