Skip to content

Support external Slurm DBD #2690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

hanwen-pcluste
Copy link
Contributor

Description of changes

  • Describe what you're changing and why you're doing these changes.

Tests

  • Describe the automated and/or manual tests executed to validate the patch.
  • Describe the added/modified tests.

References

  • Link to impacted open issues.
  • Link to related PRs in other packages (i.e. cookbook, node).
  • Link to documentation useful to understand the changes.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.48%. Comparing base (7d414fe) to head (a026897).
Report is 5 commits behind head on develop.

❗ Current head a026897 differs from pull request most recent head ba81080. Consider uploading reports for the commit ba81080 to get more accurate results

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2690   +/-   ##
========================================
  Coverage    76.48%   76.48%           
========================================
  Files           22       22           
  Lines         2220     2220           
========================================
  Hits          1698     1698           
  Misses         522      522           
Flag Coverage Δ
unittests 76.48% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hanwen-pcluste hanwen-pcluste marked this pull request as ready for review April 1, 2024 13:55
@hanwen-pcluste hanwen-pcluste requested review from a team as code owners April 1, 2024 13:55
@hanwen-pcluste hanwen-pcluste force-pushed the wip/feature/external_slurmdbd branch 2 times, most recently from f9dac1b to a026897 Compare April 3, 2024 19:25
# This was tested on Ubuntu 20.04, Alinux2 and Rocky8
# Others OS should be checked and added

serviceList = %w()
Copy link
Contributor

@himani2411 himani2411 Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this to a resource rather than a recipe would be better in the longer run.

With a stop and Disable actions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this can be an improvement so that it doesnt block the current PR

@himani2411
Copy link
Contributor

LGTM

@@ -24,6 +32,15 @@
action :create_if_missing
end

template "#{node['cluster']['slurm']['install_dir']}/etc/slurm_external_slurmdbd.conf" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt this require a change in Kitchen tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can we add more Rspec Tests for the recipies we added for external Surmdbd

hehe7318 and others added 16 commits April 16, 2024 10:08
…te update_munge_key script, add config_munge_key recipe. Use dbms_password_secret_arn and munge_key_secret_arn instead of digging from config when external slurmdbd. Add TODO list
…rb to use new conf when configing external_slurmdbd.
Fix Chef::Exceptions::FileNotFound error.

Add logic of create some necessary directories.
Add new Chef template to generate an include file for the
slurmdbd.conf only in case of external slurmdbd configuration.

Minor reformatting of slurmdbd.conf.erb.

Signed-off-by: Jacopo De Amicis <[email protected]>
The external slurmdbd instance cannot use Slurm client commands
since it does not have a "full configuration source" (it's
outside a Slurm cluster).

This must be investigated with SchedMD.

Signed-off-by: Jacopo De Amicis <[email protected]>
Use ExternalSlurmdbd to render AccountingStorageHost when defined.

Add unit test case to cover external slurmdbd scenario.

Adapt other unit tests for Slurm configuration rendering when Slurm
Accounting is used.

---

Signed-off-by: Jacopo De Amicis <[email protected]>
Remove `NoAddrCache` parameter (this should be useless after
a rebase).

Signed-off-by: Jacopo De Amicis <[email protected]>
Export the IP into the instance dna.json.

Add minor comments to the code.

Signed-off-by: Jacopo De Amicis <[email protected]>
Add a dedicated NodeType `ExternalSlurmDbd` that is required
by the recipe logic to trigger the right behavior.

Signed-off-by: Nicola Sirena <[email protected]>
Install mountpoint-s3 and mount the S3 bucket of the external
slurmdbd stack on `/opt/slurm/etc`.

The slurmdbd configuration files are already generated from
a template with a `:create_if_missing` action, meaning that
if the files are already present in the bucket they will not
be overwritten.

Signed-off-by: Jacopo De Amicis <[email protected]>
Remove previously implemented logic to configur s3 mountpoint
for the configuration of the external slurmdbd

Signed-off-by: Jacopo De Amicis <[email protected]>
ParallelCluster AMI are configured for Desktop use and so have some
active service that is not required and are certainly unused
like { apache2, cups, ...,  wpa_supplicant }

Signed-off-by: Nicola Sirena <[email protected]>
Do not configure DbdPort in slurmdbd.conf in case of external slurmdbd.

Move configuration of slurmdbd.service into config_slurm_accounting.rb (this is
also called by the external slurmdbd recipe).

Skip some steps in config_slurm_accounting.rb in kitchen tests.

Signed-off-by: Jacopo De Amicis <[email protected]>
This allows to have a single action to download an item from S3
and assign it an owner, a group or a permission mode, without the
need to specify a separate `file` action, which could generate an
empty file in case the intended remote file does not exist.

Signed-off-by: Jacopo De Amicis <[email protected]>
jdeamicis and others added 6 commits April 16, 2024 10:08
Adapt External Slurmdbd configuration on the PC cluster side
to the Host and Port subparameter of the ExternalSlurmdbd
schema.

Adapt unit tests accordingly.

Signed-off-by: Jacopo De Amicis <[email protected]>
Add ExternalSlurmdbd subsection in unit tests for the
generation of the Slurm configuration.

Signed-off-by: Jacopo De Amicis <[email protected]>
1. Directly referencing node attributes
2. Do not add key to "shared folders" on SlurmDBD instances

Signed-off-by: Hanwen <[email protected]>
The node attributes can be used in cookbook template directly.

Signed-off-by: Hanwen <[email protected]>
@hanwen-pcluste hanwen-pcluste force-pushed the wip/feature/external_slurmdbd branch 3 times, most recently from 231185c to f412cb9 Compare April 16, 2024 19:51
@hanwen-pcluste hanwen-pcluste force-pushed the wip/feature/external_slurmdbd branch from f412cb9 to 7f824af Compare April 16, 2024 19:56
@hanwen-pcluste hanwen-pcluste force-pushed the wip/feature/external_slurmdbd branch from 7f824af to ba81080 Compare April 16, 2024 20:29
@hanwen-pcluste hanwen-pcluste enabled auto-merge (rebase) April 16, 2024 20:56
@hanwen-pcluste hanwen-pcluste merged commit 5b46d05 into aws:develop Apr 16, 2024
38 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants