-
Notifications
You must be signed in to change notification settings - Fork 105
Support external Slurm DBD #2690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support external Slurm DBD #2690
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #2690 +/- ##
========================================
Coverage 76.48% 76.48%
========================================
Files 22 22
Lines 2220 2220
========================================
Hits 1698 1698
Misses 522 522
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
cookbooks/aws-parallelcluster-entrypoints/recipes/external_slurmdbd_config.rb
Outdated
Show resolved
Hide resolved
cookbooks/aws-parallelcluster-entrypoints/recipes/external_slurmdbd_config.rb
Outdated
Show resolved
Hide resolved
...llelcluster-slurm/files/default/head_node_slurm/slurm/templates/slurm_external_slurmdbd.conf
Outdated
Show resolved
Hide resolved
cookbooks/aws-parallelcluster-slurm/recipes/config/config_slurm_accounting.rb
Outdated
Show resolved
Hide resolved
cookbooks/aws-parallelcluster-slurm/recipes/config/config_external_slurmdbd_s3_mountpoint.rb
Outdated
Show resolved
Hide resolved
f9dac1b
to
a026897
Compare
# This was tested on Ubuntu 20.04, Alinux2 and Rocky8 | ||
# Others OS should be checked and added | ||
|
||
serviceList = %w() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this to a resource rather than a recipe would be better in the longer run.
With a stop and Disable actions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though this can be an improvement so that it doesnt block the current PR
LGTM |
@@ -24,6 +32,15 @@ | |||
action :create_if_missing | |||
end | |||
|
|||
template "#{node['cluster']['slurm']['install_dir']}/etc/slurm_external_slurmdbd.conf" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesnt this require a change in Kitchen tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can we add more Rspec Tests for the recipies we added for external Surmdbd
…te update_munge_key script, add config_munge_key recipe. Use dbms_password_secret_arn and munge_key_secret_arn instead of digging from config when external slurmdbd. Add TODO list
…rb to use new conf when configing external_slurmdbd.
Fix Chef::Exceptions::FileNotFound error. Add logic of create some necessary directories.
Add new Chef template to generate an include file for the slurmdbd.conf only in case of external slurmdbd configuration. Minor reformatting of slurmdbd.conf.erb. Signed-off-by: Jacopo De Amicis <[email protected]>
The external slurmdbd instance cannot use Slurm client commands since it does not have a "full configuration source" (it's outside a Slurm cluster). This must be investigated with SchedMD. Signed-off-by: Jacopo De Amicis <[email protected]>
Use ExternalSlurmdbd to render AccountingStorageHost when defined. Add unit test case to cover external slurmdbd scenario. Adapt other unit tests for Slurm configuration rendering when Slurm Accounting is used. --- Signed-off-by: Jacopo De Amicis <[email protected]>
Remove `NoAddrCache` parameter (this should be useless after a rebase). Signed-off-by: Jacopo De Amicis <[email protected]>
Export the IP into the instance dna.json. Add minor comments to the code. Signed-off-by: Jacopo De Amicis <[email protected]>
Add a dedicated NodeType `ExternalSlurmDbd` that is required by the recipe logic to trigger the right behavior. Signed-off-by: Nicola Sirena <[email protected]>
Install mountpoint-s3 and mount the S3 bucket of the external slurmdbd stack on `/opt/slurm/etc`. The slurmdbd configuration files are already generated from a template with a `:create_if_missing` action, meaning that if the files are already present in the bucket they will not be overwritten. Signed-off-by: Jacopo De Amicis <[email protected]>
Remove previously implemented logic to configur s3 mountpoint for the configuration of the external slurmdbd Signed-off-by: Jacopo De Amicis <[email protected]>
ParallelCluster AMI are configured for Desktop use and so have some active service that is not required and are certainly unused like { apache2, cups, ..., wpa_supplicant } Signed-off-by: Nicola Sirena <[email protected]>
Do not configure DbdPort in slurmdbd.conf in case of external slurmdbd. Move configuration of slurmdbd.service into config_slurm_accounting.rb (this is also called by the external slurmdbd recipe). Skip some steps in config_slurm_accounting.rb in kitchen tests. Signed-off-by: Jacopo De Amicis <[email protected]>
This allows to have a single action to download an item from S3 and assign it an owner, a group or a permission mode, without the need to specify a separate `file` action, which could generate an empty file in case the intended remote file does not exist. Signed-off-by: Jacopo De Amicis <[email protected]>
Adapt External Slurmdbd configuration on the PC cluster side to the Host and Port subparameter of the ExternalSlurmdbd schema. Adapt unit tests accordingly. Signed-off-by: Jacopo De Amicis <[email protected]>
Add ExternalSlurmdbd subsection in unit tests for the generation of the Slurm configuration. Signed-off-by: Jacopo De Amicis <[email protected]>
1. Directly referencing node attributes 2. Do not add key to "shared folders" on SlurmDBD instances Signed-off-by: Hanwen <[email protected]>
Signed-off-by: Hanwen <[email protected]>
The node attributes can be used in cookbook template directly. Signed-off-by: Hanwen <[email protected]>
Signed-off-by: Hanwen <[email protected]>
231185c
to
f412cb9
Compare
Signed-off-by: Hanwen <[email protected]>
f412cb9
to
7f824af
Compare
Signed-off-by: Hanwen <[email protected]>
7f824af
to
ba81080
Compare
Description of changes
Tests
References
Checklist
develop
add the branch name as prefix in the PR title (e.g.[release-3.6]
).Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.