Slurm Commander using slurm 23.02.0-0rc1 #22
Comments
Hey, |
A slight correction from my side on the "some polishing" estimate I gave yesterday is in order. First impression: Conclusion: support for Slurm 23.x might take a while, if it ever happens. I will keep this issue open and report any progress here. If anyone reading this issue is interested in helping out, with code or advice, please do not hesitate to do so 😃 |
Took a quick look at the way As for porting to the new 23.02 formatting. I noticed that I suggest against grabbing the openapi.json from inside of the Slurm source directly but instead use Hopefully, this helps with your porting efforts as it would be a shame to have this project get abandoned. I also suggest taking a look at the new outputs now provided in Slurm-23.02 as more information tabs may be possible. |
Gentlemen, Thank you very much for your support and this effort to help.
The only rationale behind using commands instead of slurmrestd was the reach of scom. Does this make sense from your perspective? I'd love to hear your opinion on this.
You're right about
Could you elaborate a bit on the relation between rest endpoints and commands like scontrol example above? Is there a 1:1 mapping between them?
Tried this now with 23, will continue to do so.
Also, the rest of your technical clarifications and suggestions were brilliant; I'll try to align the development accordingly.
Summary: I've started work on porting. Below is a more technical description of some ideas and of the problems I'm struggling with. I will probably hit some roadblocks here and there, and in that case I'll take the liberty of asking for more assistance, if that is OK with you.
Problems and ideas:
Problem 1: fetching data: cmd vs rest Try:
Problem 2: differences between versions (e.g. 21 vs 23)
Still unclear how to best handle: Try?:
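One way to approach the version split is to detect the Slurm release once and pick a decoder from it. The following is a minimal sketch, not scom's actual code: `SlurmVersion`, `ParseSlurmVersion` and `AtLeast` are hypothetical names, and the input is assumed to look like the `slurm 23.02.0-0rc1` string from this issue's title (for instance as printed by a command such as `sinfo --version`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SlurmVersion holds the numeric major and minor parts of a release,
// e.g. 23 and 2 for "23.02.0-0rc1". Hypothetical type for illustration.
type SlurmVersion struct {
	Major, Minor int
}

// ParseSlurmVersion extracts major and minor from text such as
// "slurm 23.02.0-0rc1". The version is assumed to be the last field.
func ParseSlurmVersion(out string) (SlurmVersion, error) {
	fields := strings.Fields(out)
	if len(fields) == 0 {
		return SlurmVersion{}, fmt.Errorf("empty version output")
	}
	parts := strings.SplitN(fields[len(fields)-1], ".", 3)
	if len(parts) < 2 {
		return SlurmVersion{}, fmt.Errorf("unexpected version %q", out)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return SlurmVersion{}, fmt.Errorf("bad major in %q: %w", out, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return SlurmVersion{}, fmt.Errorf("bad minor in %q: %w", out, err)
	}
	return SlurmVersion{Major: major, Minor: minor}, nil
}

// AtLeast reports whether v is the given release or a newer one.
func (v SlurmVersion) AtLeast(major, minor int) bool {
	return v.Major > major || (v.Major == major && v.Minor >= minor)
}

func main() {
	v, err := ParseSlurmVersion("slurm 23.02.0-0rc1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if v.AtLeast(23, 2) {
		fmt.Println("use the 23.02 decoder")
	} else {
		fmt.Println("use the legacy decoder")
	}
}
```

Keeping the check in one place means the rest of the code only asks "is this 23.02 or newer?" and never compares version strings directly.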
|
Some additional information regarding REST, because we are currently implementing this.
1.) SLURM Rest can also be started in so-called inet mode by regular users (https://slurm.schedmd.com/rest.html#inet). So maybe we can start a slurmrestd process (which is simply an HTTP wrapper/webserver) when we start scom. Additionally, it can work with local authentication if it is in the same munge perimeter (see slide 9 here: https://slurm.schedmd.com/PEARC20/REST_API.pdf). SLURM Rest (AFAIK) is supported from 20.x on. The only caveat is that SLURM Rest has to be explicitly enabled at build/compile time, so some sites might not have it :-(
2.) One additional benefit of SLURM Rest is that SchedMD recommends setting up a caching proxy, and I guess most sites will have such a setup. We have set up an nginx caching proxy that does TLS + caching of the REST calls (5 minutes cache time; URL + SLURM user HTTP header is used as the caching key). This helps avoid overloading slurmctld/slurmdbd and might be good enough to avoid having to develop a scom caching service/process |
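To illustrate the caching key described above (URL plus the Slurm user header, 5 minutes cache time), here is a small client-side sketch in Go. It is not the nginx setup from the comment, only the same idea in miniature. `TTLCache`, `CacheKey` and `DefaultTTL` are hypothetical names, the header name `X-SLURM-USER-NAME` is an assumption about which header carries the user, and the URL is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// Assumption: the header that carries the Slurm user name.
const userHeader = "X-SLURM-USER-NAME"

// DefaultTTL mirrors the 5 minute cache time mentioned in the comment.
const DefaultTTL = 5 * time.Minute

// CacheKey combines user and URL so users never share cached responses.
func CacheKey(user, url string) string {
	return user + "\n" + url
}

type entry struct {
	body    []byte
	expires time.Time
}

// TTLCache is a tiny in-memory cache with a fixed time to live.
type TTLCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]entry
}

// NewTTLCache returns an empty cache whose entries live for ttl.
func NewTTLCache(ttl time.Duration) *TTLCache {
	return &TTLCache{ttl: ttl, items: make(map[string]entry)}
}

// Get returns the cached body, or false if it is missing or expired.
func (c *TTLCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || !time.Now().Before(e.expires) {
		delete(c.items, key) // drop stale entries lazily
		return nil, false
	}
	return e.body, true
}

// Put stores body under key until the TTL elapses.
func (c *TTLCache) Put(key string, body []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{body: body, expires: time.Now().Add(c.ttl)}
}

func main() {
	cache := NewTTLCache(DefaultTTL)
	// Placeholder URL, not a real slurmrestd endpoint.
	req, err := http.NewRequest(http.MethodGet, "http://rest.example.org/jobs", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	req.Header.Set(userHeader, "alice")
	key := CacheKey(req.Header.Get(userHeader), req.URL.String())
	if _, ok := cache.Get(key); !ok {
		// A real client would perform the HTTP call here and store its body.
		cache.Put(key, []byte(`{"jobs": []}`))
	}
	body, ok := cache.Get(key)
	fmt.Println(ok, string(body))
}
```

Including the user in the key matters because two users may be allowed to see different data at the same URL.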
Each site has the choice as to which binaries they provide. It is up to each project to decide prerequisites and what levels of compatibility they will provide.
In Slurm-22.05, there was a one-to-one mapping for the commands that got implemented. They actually called the same plugin code. In Slurm-23.02, we did a major redesign and split out the code that converts Slurm's internal data structures to JSON/YAML. This allows us to add JSON conversion all over the place without touching any of the openapi/slurmrestd code directly, and it also brought several performance improvements. The resultant JSON for each internal structure will be exactly the same, but the wrapping around it has changed. Right now, the only changes visible would be in the
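If the per-structure JSON stays the same and only the wrapping changes, a client can decode the wrapper loosely and the payload strictly. The sketch below shows that idea with Go's `encoding/json`. The key name `jobs`, the `Job` fields and both sample documents are invented for illustration and are not taken from real Slurm output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Job keeps only the fields a client wants to show.
// Hypothetical field names, not a real Slurm schema.
type Job struct {
	JobID int    `json:"job_id"`
	Name  string `json:"name"`
}

// ExtractPayload decodes only the element stored under key and
// ignores whatever other wrapper keys surround it.
func ExtractPayload(data []byte, key string, out interface{}) error {
	var envelope map[string]json.RawMessage
	if err := json.Unmarshal(data, &envelope); err != nil {
		return fmt.Errorf("decode envelope: %w", err)
	}
	raw, ok := envelope[key]
	if !ok {
		return fmt.Errorf("key %q not found in envelope", key)
	}
	if err := json.Unmarshal(raw, out); err != nil {
		return fmt.Errorf("decode %q: %w", key, err)
	}
	return nil
}

func main() {
	// Two invented documents: same payload, different wrapping.
	older := []byte(`{"meta": {"v": "A"}, "jobs": [{"job_id": 1, "name": "demo"}]}`)
	newer := []byte(`{"meta": {"layout": "B"}, "warnings": [], "jobs": [{"job_id": 1, "name": "demo"}]}`)

	for _, doc := range [][]byte{older, newer} {
		var jobs []Job
		if err := ExtractPayload(doc, "jobs", &jobs); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(len(jobs), jobs[0].JobID, jobs[0].Name)
	}
}
```

Because unknown wrapper keys are never mapped onto struct fields, a change in the surrounding metadata does not break parsing of the payload itself.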
I think this is a design choice for your project of how much compatibility is important vs the complexity of the implementation.
It might be easier to use some kind of templated code to handle the conversion of these items, but I'm not familiar enough with Go to give suggestions. The provided examples of state flags will not be as easy to convert because there is actually a set of new flags available in 23.02.
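On the Go side, one common way to absorb a field whose shape differs between versions is a custom `UnmarshalJSON` method. The sketch below accepts either a single string or a list of strings. The exact shapes of the state flags are not shown in this thread, so the string-versus-list difference, the `state` key and the names `Flags`, `jobRecord` and `ParseState` are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Flags accepts either "PENDING" or ["RUNNING", "COMPLETING"].
// Hypothetical type: the real field shapes may differ.
type Flags []string

// UnmarshalJSON tries the list form first, then falls back to a string.
func (f *Flags) UnmarshalJSON(data []byte) error {
	var list []string
	if err := json.Unmarshal(data, &list); err == nil {
		*f = list
		return nil
	}
	var single string
	if err := json.Unmarshal(data, &single); err != nil {
		return fmt.Errorf("flags: want a string or a list of strings: %w", err)
	}
	*f = Flags{single}
	return nil
}

// jobRecord is a hypothetical record with a single state field.
type jobRecord struct {
	State Flags `json:"state"`
}

// ParseState decodes one record and returns its state flags.
func ParseState(data []byte) (Flags, error) {
	var rec jobRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return nil, err
	}
	return rec.State, nil
}

func main() {
	for _, doc := range []string{
		`{"state": "PENDING"}`,
		`{"state": ["RUNNING", "COMPLETING"]}`,
	} {
		flags, err := ParseState([]byte(doc))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(len(flags), flags)
	}
}
```

New flag values need no code change with this approach, since the flags stay plain strings; only code that interprets specific values has to learn about them.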
I would advise against scom starting a slurmrestd daemon, as that could inadvertently cause a security issue unless extra steps are taken to ensure the sockets are visible to scom only. Using inet mode naturally avoids any of those issues:
Caching proxies are really effective at reducing the load on controllers and at adding TLS when clients may be outside of the tightly controlled cluster network. The newly added rate limiting in the controllers can also help with this: https://slurm.schedmd.com/slurm.conf.html#OPT_rl_enable |
Using the latest RC build of Slurm 23.02
scom not working
Thanks!
ERROR: sacct JSON failed to parse, note your slurm version and open an issue with us here: https://github.com/CLIP-HPC/SlurmCommander/issues/new/choose