PowerFlex/ScaleIO - MDM and host SDC connection enhancements #11047

Draft: wants to merge 19 commits into base: main

@sureshanaparti (Contributor) commented Jun 17, 2025

Description

This PR enhances the PowerFlex/ScaleIO MDM and host SDC connection handling. It includes the following changes, along with some code improvements.

  • Introduced the zone-scoped wait time setting 'powerflex.mdm.change.apply.wait' (default: 1000 ms), applied after MDM addition and before and after MDM removal on a host with the ScaleIO SDC. The wait is applied after MDM changes in both the prepare and unprepare logic.

  • Introduced the zone-scoped flag 'powerflex.block.sdc.unprepare' (default: false) to enable or disable blocking the unprepare of a ScaleIO SDC connection when an SDC client restart is required (that is, on PowerFlex MDM removal when the drv_cfg cmd does not support --remove_mdm) and there are volumes attached to the host. Added validation to fail the host disconnect from a storage pool if there are volumes attached and MDM removal on the SDC client requires the scini service to be restarted.

  • Introduced the zone-scoped flag 'powerflex.mdm.validate.on.connect' (default: false) to enable or disable validating, during storage pool registration in the agent, that the MDM addresses in the host's configuration file match those in the CLI cmd (drv_cfg --query_mdms) output.

  • Added detection of MDM removal support via the CLI. If the CLI supports MDM removal, it is used; otherwise the agent falls back to editing drv_cfg.txt and restarting scini, as before. Tested with /opt/emc/scaleio/sdc/bin/drv_cfg --version: DellEMC PowerFlex Version: R3_6.4000.124, with cmd: /opt/emc/scaleio/sdc/bin/drv_cfg --remove_mdm.

  • Added the agent property 'powerflex.sdc.service.wait' for the time (in secs) to wait after SDC service start/restart/stop, and added retries to fetch the SDC id/guid.

  • Updated to allow unpreparing the SDC when there are no volumes mapped on the host for other connected pools (with the same SDC id, i.e. pools of the same PowerFlex storage cluster).
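
The MDM removal rules in the list above can be read as a small decision table. The sketch below is only an illustration of that reading, not the code in this PR: the class `SdcUnprepareSketch`, the enum `Action`, and the method `decide` are hypothetical names, and the three booleans stand in for values that the agent would obtain from drv_cfg, the zone setting, and the host's attached volumes.

```java
public class SdcUnprepareSketch {
    /** Hypothetical outcomes when a host is asked to drop its MDM addresses. */
    enum Action { REMOVE_VIA_CLI, EDIT_FILE_AND_RESTART, BLOCKED }

    /**
     * Picks a removal path based on the rules in the PR description.
     *
     * @param cliRemoveSupported assumed result of detecting --remove_mdm support in drv_cfg
     * @param blockUnprepare     assumed value of 'powerflex.block.sdc.unprepare'
     * @param volumesAttached    whether volumes are currently attached on the host
     */
    static Action decide(boolean cliRemoveSupported, boolean blockUnprepare, boolean volumesAttached) {
        if (cliRemoveSupported) {
            // No scini restart is needed, so attached volumes are not disturbed.
            return Action.REMOVE_VIA_CLI;
        }
        if (blockUnprepare && volumesAttached) {
            // A restart would be required while volumes are in use: refuse.
            return Action.BLOCKED;
        }
        // Earlier behavior: edit drv_cfg.txt and restart scini.
        return Action.EDIT_FILE_AND_RESTART;
    }
}
```

Under these assumptions, the flag only matters on hosts whose drv_cfg lacks --remove_mdm; where the CLI path is available, the flag has no effect on the outcome.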

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tested the new settings, and PowerFlex SDC connections (MDM add/remove) with VM & Volume operations.

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 26.13982% with 243 lines in your changes missing coverage. Please review.

Project coverage is 16.58%. Comparing base (be22bfe) to head (ef3c3f1).
Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...cloudstack/storage/datastore/util/ScaleIOUtil.java | 28.85% | 99 Missing and 7 partials ⚠️ |
| .../hypervisor/kvm/storage/ScaleIOStorageAdaptor.java | 25.00% | 57 Missing and 12 partials ⚠️ |
| ...re/lifecycle/ScaleIOPrimaryDataStoreLifeCycle.java | 14.81% | 21 Missing and 2 partials ⚠️ |
| ...orage/datastore/manager/ScaleIOSDCManagerImpl.java | 0.00% | 21 Missing ⚠️ |
| ...torage/datastore/provider/ScaleIOHostListener.java | 0.00% | 12 Missing ⚠️ |
| ...s/src/main/java/com/cloud/utils/script/Script.java | 0.00% | 10 Missing ⚠️ |
| ...rapper/LibvirtModifyStoragePoolCommandWrapper.java | 77.77% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##               main   #11047    +/-   ##
==========================================
  Coverage     16.57%   16.58%            
- Complexity    13971    13989    +18     
==========================================
  Files          5743     5745     +2     
  Lines        510648   511093   +445     
  Branches      62105    62170    +65     
==========================================
+ Hits          84641    84753   +112     
- Misses       416534   416854   +320     
- Partials       9473     9486    +13     
Flag Coverage Δ
uitests 3.91% <ø> (+<0.01%) ⬆️
unittests 17.47% <26.13%> (+<0.01%) ⬆️

@DaanHoogland added this to the 4.21.0 milestone Jun 17, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the handling of ScaleIO MDM and host SDC connections by introducing new configuration keys for timeouts, validation, and blocking behavior along with updating the underlying command execution and logging mechanisms.

  • Introduced new configuration keys, including MdmsChangeApplyTimeout, ValidateMdmsOnConnect, and BlockSdcUnprepareIfRestartNeededAndVolumesAreAttached.
  • Updated MDM add/remove logic to use varargs and improved command execution via Script.executeCommand.
  • Adjusted test cases and adapter logic to account for the new ScaleIO configurations.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| utils/src/main/java/com/cloud/utils/script/Script.java | Added a new executeCommand(String) method for command execution including stdout/stderr handling. |
| plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/util/ScaleIOUtil.java | Refactored MDM add/remove methods and updated command templates, patterns, and file read operations. |
| plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/manager/ScaleIOSDCManagerImpl.java | Updated configuration details sent to hosts by including new timeout settings. |
| plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/manager/ScaleIOSDCManager.java | Introduced new config key definitions and updated the getConfigKeys() return values. |
| plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/lifecycle/ScaleIOPrimaryDataStoreLifeCycle.java | Injected the new configuration details during maintain and cancellation procedures. |
| plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/client/ScaleIOGatewayClient.java | Added a short documentation comment for STORAGE_POOL_MDMS. |
| plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/storage/ScaleIOStorageAdaptorTest.java | Updated test mocks to reflect changes in command executions for MDM removal. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/ScaleIOStorageAdaptor.java | Introduced validation of MDM state and timeout application after MDM changes. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtModifyStoragePoolCommandWrapper.java | Wrapped storage pool creation in a try/catch block to better handle CloudRuntimeException. |
Comments suppressed due to low confidence (1)

plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/util/ScaleIOUtil.java:315

  • [nitpick] The ordering of stdout and stderr here is reversed compared to other usages of Script.executeCommand; consider using a consistent ordering (stdout as first, stderr as second) to avoid confusion.
String stdErr = result.first(); String stdOut = result.second();
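
One way to sidestep the positional confusion raised in this nitpick is to return a holder with named fields instead of a generic pair. The following is a minimal sketch under that assumption; `CommandOutputSketch`, `CommandOutput`, and `run` are hypothetical names and do not reflect the actual `Script.executeCommand` signature or return type.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommandOutputSketch {
    /** Named holder so callers never depend on first()/second() ordering. */
    static final class CommandOutput {
        final String stdOut;
        final String stdErr;

        CommandOutput(String stdOut, String stdErr) {
            this.stdOut = stdOut;
            this.stdErr = stdErr;
        }
    }

    /** Runs a command and captures both streams via temp files to avoid pipe deadlocks. */
    static CommandOutput run(String... command) throws IOException, InterruptedException {
        Path out = Files.createTempFile("cmd-out", ".txt");
        Path err = Files.createTempFile("cmd-err", ".txt");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectOutput(out.toFile())
                    .redirectError(err.toFile())
                    .start();
            process.waitFor();
            return new CommandOutput(
                    new String(Files.readAllBytes(out), StandardCharsets.UTF_8),
                    new String(Files.readAllBytes(err), StandardCharsets.UTF_8));
        } finally {
            // Clean up the capture files whether or not the command succeeded.
            Files.deleteIfExists(out);
            Files.deleteIfExists(err);
        }
    }
}
```

With named fields, a call site reads `result.stdOut` and `result.stdErr`, so a swapped assignment like the one flagged above becomes visible at a glance.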

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 13811

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13812

Contributor

@harikrishna-patnala harikrishna-patnala left a comment

code LGTM

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-13562)

@harikrishna-patnala
Contributor

@blueorangutan test

@blueorangutan

@harikrishna-patnala a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13586)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 69826 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11047-t13586-kvm-ol8.zip
Smoke tests completed. 136 look OK, 5 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestAccounts>:setup | Error | 0.00 | test_accounts.py |
| ContextSuite context=TestAddVmToSubDomain>:setup | Error | 0.00 | test_accounts.py |
| test_DeleteDomain | Error | 15.92 | test_accounts.py |
| test_forceDeleteDomain | Failure | 15.66 | test_accounts.py |
| ContextSuite context=TestRemoveUserFromAccount>:setup | Error | 16.38 | test_accounts.py |
| ContextSuite context=TestTemplateHierarchy>:setup | Error | 1537.55 | test_accounts.py |
| ContextSuite context=TestDeployVmWithAffinityGroup>:setup | Error | 0.00 | test_affinity_groups_projects.py |
| ContextSuite context=TestAnnotations>:setup | Error | 0.00 | test_annotations.py |
| ContextSuite context=TestAsyncJob>:setup | Error | 0.00 | test_async_job.py |
| ContextSuite context=TestClusterDRS>:setup | Error | 0.00 | test_cluster_drs.py |

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13907

@rohityadavcloud
Member

rohityadavcloud commented Jun 26, 2025

@sureshanaparti as these are general bugfix/improvements for powerflex storage, should these be also applicable for 4.20 branch (esp these changes are scaleio/powerflex storage plugin related with no DB changes)?

@sureshanaparti changed the title from "PowerFlex/ScaleIO MDM and host SDC connection enhancements" to "PowerFlex/ScaleIO - MDM and host SDC connection enhancements" Jun 27, 2025
@sureshanaparti
Contributor Author

> @sureshanaparti as these are general bugfix/improvements for powerflex storage, should these be also applicable for 4.20 branch (esp these changes are scaleio/powerflex storage plugin related with no DB changes)?

@rohityadavcloud There are no DB changes, but host SDC connection behavior was changed earlier in PR #9903 (already part of main), where the SDC connection is controlled by adding/removing MDMs instead of starting/stopping the SDC service (scini). The changes in this PR build on top of that and introduce configurations to validate the MDMs and to apply a wait after adding/removing MDMs. I think it's better to keep this new behavior in main only.
A minor improvement (a wait after SDC service start/restart/stop, and retries to fetch the SDC id/guid) applies to the old behavior as well, so it can go into the 4.20 branch. I've raised a separate PR for it: #11099
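
The "wait, then retry fetching the SDC id/guid" idea mentioned here can be sketched generically. This is an assumption-laden illustration rather than the code from either PR: `SdcIdRetrySketch` and `fetchWithRetries` are hypothetical names, and the `Supplier` stands in for whatever call actually queries the SDC id on the host.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class SdcIdRetrySketch {
    /**
     * Calls the fetcher up to maxAttempts times, sleeping waitMs between attempts,
     * and returns the first non-blank value (trimmed), or empty if none was obtained.
     */
    static Optional<String> fetchWithRetries(Supplier<String> fetcher, int maxAttempts, long waitMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String value = fetcher.get();
            if (value != null && !value.trim().isEmpty()) {
                return Optional.of(value.trim());
            }
            if (attempt < maxAttempts) {
                try {
                    // Give the SDC service time to settle before asking again.
                    Thread.sleep(waitMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and stop retrying.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

In this sketch the wait would correspond to a value derived from 'powerflex.sdc.service.wait', though how the real agent maps that property to attempts and delays is not stated in this conversation.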

@sureshanaparti
Contributor Author

@blueorangutan test

mprokopchuk and others added 17 commits July 3, 2025 15:48
…/unprepare, validate Storage Pool can be created in Agent.

- Implemented validation to fail Host disconnect from Storage Pool if there are Volumes attached and SDC client MDM removal requires the scini service to be restarted
- Implemented Storage Pool validation by checking whether the MDM addresses from the configuration file and from memory (using CLI) match; otherwise fail the ModifyStoragePool command.
- Introduced configuration key to apply timeout after making MDM changes for ScaleIO: powerflex.mdm.change.apply.timeout.ms (default 1000ms)
- Implemented logic to apply timeout after making MDM changes for ScaleIO in prepare and unprepare logic
- Added detection of MDM removal support via CLI
- If MDM removal via CLI is supported, use the CLI; otherwise fall back to editing drv_cfg.txt and restarting scini
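
The address comparison described in the commit message above could look roughly like the sketch below. It assumes both sources have already been reduced to comma-separated address strings; the real drv_cfg.txt layout and the `drv_cfg --query_mdms` output format are not reproduced here, and `MdmValidationSketch`, `parseAddresses`, and `mdmsMatch` are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

public class MdmValidationSketch {
    /** Splits a comma-separated address list into a set, ignoring blanks and whitespace. */
    static Set<String> parseAddresses(String commaSeparated) {
        Set<String> addresses = new TreeSet<>();
        if (commaSeparated == null) {
            return addresses;
        }
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                addresses.add(trimmed);
            }
        }
        return addresses;
    }

    /** True when the file and the CLI report the same addresses, regardless of order. */
    static boolean mdmsMatch(String fromConfigFile, String fromCli) {
        return parseAddresses(fromConfigFile).equals(parseAddresses(fromCli));
    }
}
```

Comparing as sets makes the check insensitive to ordering and spacing, which seems a reasonable choice when two tools report the same addresses in different layouts.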
… pool when SDC client restart required and there are volumes attached to the Host
…tack/storage/datastore/manager/ScaleIOSDCManager.java

Co-authored-by: Suresh Kumar Anaparti <[email protected]>
…tack/storage/datastore/manager/ScaleIOSDCManager.java

Co-authored-by: Suresh Kumar Anaparti <[email protected]>
…tack/storage/datastore/util/ScaleIOUtil.java

Co-authored-by: Suresh Kumar Anaparti <[email protected]>
…y_guid, --rescan in drv_cfg cmd as it is not supported. --file parameter is supported with --add_mdm, --mod_mdm_ip, --remove_mdm, --set_guid, --set_mdm_password, --reset_mdm_password
…cs) to wait after SDC service start/restart/stop
…other connected pools (with same SDC Id, i.e pools of same PowerFlex storage cluster)
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 14008

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14013

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14072

Projects
Status: In Progress
6 participants