Skip to content

Conversation

@ngopalak-redhat
Copy link
Contributor

@ngopalak-redhat ngopalak-redhat commented Nov 3, 2025

Fixes: #OCPNODE-3722

- What I did
This patch introduces the 01-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (OCPNODE-3722: Set minimum version of 4.20 required to upgrade to 4.21 cincinnati-graph-data#8277) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.

- Description for the changelog

Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2025
@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/release-4.20-patch-autoconfig branch from 1229ca6 to 69555b0 Compare November 3, 2025 14:41
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ngopalak-redhat
Once this PR has been reviewed and has the lgtm label, please assign cheesesashimi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/release-4.20-patch-autoconfig branch from 2507d91 to e8e4d53 Compare November 7, 2025 04:08
@ngopalak-redhat ngopalak-redhat changed the title WIP: Patch 4.20 to ensure that KubeletConfig with AutoSizingReserved set t… Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default Nov 7, 2025
@ngopalak-redhat
Copy link
Contributor Author

/test all

@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/release-4.20-patch-autoconfig branch from e8e4d53 to c9f9e79 Compare November 7, 2025 05:11
@ngopalak-redhat ngopalak-redhat changed the title Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default OCPNODE-3718: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default Nov 7, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 7, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3718 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.z" version, but no target version was set.

In response to this:

- What I did
This patch introduces the 01-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (TODO) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ngopalak-redhat
Copy link
Contributor Author

/test unit

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3718 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.z" version, but no target version was set.

In response to this:

- What I did
This patch introduces the 01-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (TODO) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.

- Description for the changelog

Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ngopalak-redhat ngopalak-redhat changed the title OCPNODE-3718: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default Nov 7, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue.

In response to this:

- What I did
This patch introduces the 01-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (TODO) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.

- Description for the changelog

Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue.

In response to this:

Fixes: #OCPNODE-3722

- What I did
This patch introduces the 01-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (TODO) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.

- Description for the changelog

Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ngopalak-redhat ngopalak-redhat changed the title OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default [release-4.20] OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default Nov 7, 2025
@ngopalak-redhat
Copy link
Contributor Author

/test bootstrap-unit

@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/release-4.20-patch-autoconfig branch from a958166 to b12addd Compare November 7, 2025 10:47
@ngopalak-redhat
Copy link
Contributor Author

/test all

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue.

In response to this:

Fixes: #OCPNODE-3722

- What I did
This patch introduces the 01-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (OCPNODE-3722: Set minimum version of 4.20 required to upgrade to 4.21 cincinnati-graph-data#8277) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.

- Description for the changelog

Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ngopalak-redhat ngopalak-redhat marked this pull request as ready for review November 7, 2025 12:46
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 7, 2025
@ngopalak-redhat
Copy link
Contributor Author

@haircommander @sairameshv Can you please review?


// configureAutoSizingMachineConfig creates an auto-sizing MachineConfig for a given pool if it doesn't exist
func (ctrl *Controller) configureAutoSizingMachineConfig(pool *mcfgv1.MachineConfigPool) error {
autoSizingKey := fmt.Sprintf("01-%s-auto-sizing-disabled", pool.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want the prio to be higher. maybe 50?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also: can we make a function of this? it's used twice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion about increasing the priority to 50! I'm curious to understand the reasoning behind this - could you help me understand why priority 50 would be more appropriate than the current "01-" prefix? I want to make sure I'm following the right convention for this use case. Is there a specific ordering requirement or interaction with other MachineConfigs that I should be aware of?

Also I've refactored the repeated code by extracting the Ignition config creation logic into a helper function createAutoSizingIgnConfig().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run Levels 0 and 1 are usually reserved.

We also need to test the bootstrap phase - installing 4.20 with a kubelet config at install - then upgrading to 4.21.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rphillips . I can choose the middle ground as "50-". I tried searching for some docs that gives recommendations on these lines, but couldn't find any.
For testing the bootstrap phase, I have included the test results in the 4.21 PR: #5390

}

// newAutoSizingMachineConfig creates an empty auto-sizing MachineConfig for a given pool
func newAutoSizingMachineConfig(pool *mcfgv1.MachineConfigPool) (*mcfgv1.MachineConfig, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing it: I'm not seeing where we check if the user already set auto node sizing and skip making this MC if they did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added comprehensive E2E tests in openshift/origin#30467 (Will submit for review)

Manual Testing on 4.20 Cluster:

  1. Verified MachineConfigs were created:
 $ oc get mc -o jsonpath='{.items[?(@.metadata.annotations.openshift-patch-reference == "added auto-sizing MachineConfig to disable node
 sizing")].metadata.name}'
 01-master-auto-sizing-disabled 01-worker-auto-sizing-disabled
  1. Verified default state (auto-sizing disabled):
 $ oc debug node/ip-10-0-45-86.ec2.internal
 sh-5.1# chroot /host
 sh-5.1# cat /etc/node-sizing-enabled.env
 NODE_SIZING_ENABLED=false
 SYSTEM_RESERVED_MEMORY=1Gi
 SYSTEM_RESERVED_CPU=500m
 SYSTEM_RESERVED_ES=1Gi
  1. Applied KubeletConfig to enable auto-sizing:
  apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    name: 80-worker-auto
  spec:
    autoSizingReserved: true
    machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
  1. Verified user configuration overrides the default (after node restart):
$ oc debug node/ip-10-0-45-86.ec2.internal
sh-5.1# chroot /host
sh-5.1# cat /etc/node-sizing-enabled.env
NODE_SIZING_ENABLED=true
SYSTEM_RESERVED_MEMORY=1Gi
SYSTEM_RESERVED_CPU=500m
SYSTEM_RESERVED_ES=1Gi

This confirms that:

  • The default MachineConfig correctly disables auto-sizing for 4.20 clusters
  • User-defined KubeletConfigs successfully override the default setting
  • The precedence order works as expected

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 9, 2025

@ngopalak-redhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 4d00657 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Member

@sairameshv sairameshv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the pattern that you followed to add a default behavior via the creation of a MachineConfig object but I would strongly recommend to move this functionality to the KubeletConfigController instead of the KubeletNodeConfigController.
Although KubeletNodeConfigController deals with a few of the upstream kubelet config updates, it is not an ideal place to deal with the functionality related to the downstream KubeletConig API i.e. the AutoSizingReserved field


// createAutoSizingIgnConfig creates the Ignition config for auto-sizing disabled
func createAutoSizingIgnConfig() ([]byte, error) {
autoSizingFile := ctrlcommon.NewIgnFileBytes("/etc/node-sizing-enabled.env", []byte(DefaultAutoSizingEnvContent))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define the file "/etc/node-sizing-enabled.env" as a constant?

ctrl.nodeConfigQueue.AddAfter(key, 1*time.Minute)
}

// createAutoSizingIgnConfig creates the Ignition config for auto-sizing disabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// createAutoSizingIgnConfig creates the Ignition config for auto-sizing disabled
// createAutoSizingIgnConfig creates the Ignition config that would be written to a file containing the environment variables to enable/disable the `AutoSizingReserved` field of the KubeletConfig


// newAutoSizingMachineConfig creates an empty auto-sizing MachineConfig for a given pool
func newAutoSizingMachineConfig(pool *mcfgv1.MachineConfigPool) (*mcfgv1.MachineConfig, error) {
autoSizingDisabledName := fmt.Sprintf("01-%s-auto-sizing-disabled", pool.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just to improve the code readability

Suggested change
autoSizingDisabledName := fmt.Sprintf("01-%s-auto-sizing-disabled", pool.Name)
autoSizingDisabledMCName := fmt.Sprintf("01-%s-auto-sizing-disabled", pool.Name)

return rawAutoSizingIgn, nil
}

// newAutoSizingMachineConfig creates an empty auto-sizing MachineConfig for a given pool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't create an empty MachineConfig, does it?

// newAutoSizingMachineConfig creates an empty auto-sizing MachineConfig for a given pool
func newAutoSizingMachineConfig(pool *mcfgv1.MachineConfigPool) (*mcfgv1.MachineConfig, error) {
autoSizingDisabledName := fmt.Sprintf("01-%s-auto-sizing-disabled", pool.Name)
ignConfig := ctrlcommon.NewIgnConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required as we any way create an ignitionconfig using createAutoSizingIgnConfig() and populate the machineconfig with it.
Ref: https://github.com/openshift/machine-config-operator/pull/5387/files#diff-f37515e1ebe0812972d628b0f81dd29fc25fd8289fbef268b1391a8a3273e0beR89

}

// configureAutoSizingMachineConfig creates an auto-sizing MachineConfig for a given pool if it doesn't exist
func (ctrl *Controller) configureAutoSizingMachineConfig(pool *mcfgv1.MachineConfigPool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't configuring any machineconfig here. AFAIU, this function helps in actual creation of the MC object. And newAutoSizingMachineConfig is a constructor to create a machine config object.
Could you update the method names/comments accordingly?


// configureAutoSizingMachineConfig creates an auto-sizing MachineConfig for a given pool if it doesn't exist
func (ctrl *Controller) configureAutoSizingMachineConfig(pool *mcfgv1.MachineConfigPool) error {
autoSizingKey := fmt.Sprintf("01-%s-auto-sizing-disabled", pool.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to define the name/prefix of the MC object as a constant

for _, pool := range mcpPools {
// First, create the auto-sizing MachineConfig for this pool if it doesn't exist
if err := ctrl.configureAutoSizingMachineConfig(pool); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this function syncNodeConfigHandler would always get hit as we have a nodes.config object in the cluster always
IMHO, the creation of this MC is not related to the nodes.config.openshift.io object in anyway and hence this is not the correct place to have this operation.
For ex, encountering an issue in creating an ignition file or a machineconfig should not interrupt the functionality of nodes.config api i.e. updating the workerLatencyProfiles or the minKubeletVersion etc.
The only place we should deal with is the kubeletConfigController. We can have an extra sync/work item which can be hit always without depending on the KubeletConfig object's presence.
Could you explore in that direction?


// Create auto-sizing MachineConfigs for each pool
for _, pool := range mcpPools {
autoSizingMC, err := newAutoSizingMachineConfig(pool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the place per this

require.Len(t, mcList.Items, 1)
require.NotEqual(t, nodeKeyCustom, mcList.Items[0].Name)
// Now expecting 2 MachineConfigs:
// 1. Auto-sizing MC for worker (01-worker-auto-sizing-disabled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a nodes.config feature and we shouldn't be expecting an additional MC object here
Ref: https://github.com/openshift/machine-config-operator/pull/5387/files#r2517233985

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants