
Release 1.6.0 #70


Merged
merged 36 commits into from
Oct 26, 2018



sean-smith

Signed-off-by: Sean Smith [email protected]

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

fnubalaj and others added 30 commits August 14, 2018 12:24
Get the idle time threshold parameter from nodewatcher.cfg
Every minute, check the node for running jobs; if it has been idle for longer than the threshold, terminate it
Store the idle time in a JSON file so the count survives if the process terminates

Signed-off-by: Balaji Sridharan <[email protected]>
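The idle-tracking loop described above can be sketched as follows. This is a minimal illustration, not the nodewatcher code: the function names, the JSON key `idle_time` and the state-file name are hypothetical, and "idle for more than the threshold" is read as a strict comparison on whole minutes.

```python
import json
import os
import tempfile


def load_idle_time(path):
    """Return the persisted idle-minute counter, or 0 if nothing usable is stored."""
    try:
        with open(path) as f:
            return int(json.load(f)["idle_time"])
    except (IOError, OSError, ValueError, KeyError, TypeError):
        # Missing or corrupt state file: start counting from zero.
        return 0


def save_idle_time(path, idle_time):
    """Write the counter to disk every minute (an atexit hook would miss SIGKILL)."""
    with open(path, "w") as f:
        json.dump({"idle_time": idle_time}, f)


def update_idle_time(path, has_jobs):
    """Run once per minute: reset on activity, otherwise add one idle minute."""
    idle_time = 0 if has_jobs else load_idle_time(path) + 1
    save_idle_time(path, idle_time)
    return idle_time


def should_terminate(idle_time, threshold):
    """True once the node has been idle for more than `threshold` minutes."""
    return idle_time > threshold


if __name__ == "__main__":
    # Hypothetical state-file name in a scratch directory, for demonstration only.
    state = os.path.join(tempfile.mkdtemp(), "idle_time.json")
    for busy in (True, False, False):
        minutes = update_idle_time(state, has_jobs=busy)
    print(minutes, should_terminate(minutes, threshold=1))
```

Because the counter is re-read from disk on every tick, a restarted watcher resumes from the last saved minute instead of starting over.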
Deleted GetHourPercentile function
Added instance_id to logging
Changed the filename location and name

Signed-off-by: Balaji Sridharan <[email protected]>
Signed-off-by: Balaji Sridharan <[email protected]>
Persist idle_time every minute instead of "at exit", because atexit handlers do not run on SIGKILL
Use the full path for qstat commands on SGE and Torque
Cast the config parameter scaledown_idletime from string to int

Signed-off-by: Balaji Sridharan <[email protected]>
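The string-to-int cast matters because ConfigParser always returns strings. A minimal sketch, assuming the setting lives in a `[nodewatcher]` section (the section name is an assumption); the text is read from a string so the example runs anywhere, whereas a node would call `config.read(...)` on its nodewatcher.cfg.

```python
import configparser


def read_idle_threshold(cfg_text, section="nodewatcher"):
    """Return scaledown_idletime in minutes as an int (ConfigParser yields strings)."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    raw = config.get(section, "scaledown_idletime")
    try:
        return int(raw)
    except ValueError:
        raise ValueError(
            "scaledown_idletime must be an integer number of minutes, got %r" % raw
        )
```

Without the cast, comparing an integer idle counter against the string raises `TypeError` on Python 3, and on Python 2 it silently compares by type (numbers always sort below strings), so the threshold would never be reached.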
Changes the scaling logic from a CloudWatch metric published
every two minutes to a cron job on the node that sets the
ASG values directly.

Signed-off-by: Sean Smith <[email protected]>
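A cron-driven scale-up step that writes the ASG values directly might look like the sketch below. The boto3 Auto Scaling calls `describe_auto_scaling_groups` and `update_auto_scaling_group` are real, but the function name and the target formula (busy plus pending nodes, capped at `MaxSize`, never shrinking) are assumptions made for illustration. An in-memory stub stands in for `boto3.client("autoscaling")` so the logic can run without AWS.

```python
def scale_up(asg_client, asg_name, busy_nodes, pending_nodes):
    """Raise DesiredCapacity to cover busy + pending nodes, capped at MaxSize.

    `asg_client` is expected to behave like boto3.client("autoscaling").
    The target formula is an assumption made for this sketch.
    """
    response = asg_client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    group = response["AutoScalingGroups"][0]
    current = group["DesiredCapacity"]
    # Never shrink here; scale-down is left to each node's own idle check.
    target = min(max(current, busy_nodes + pending_nodes), group["MaxSize"])
    if target > current:
        asg_client.update_auto_scaling_group(
            AutoScalingGroupName=asg_name, DesiredCapacity=target
        )
    return target


class InMemoryASGClient(object):
    """Local stand-in for boto3.client("autoscaling") exposing only the calls used above."""

    def __init__(self, min_size, max_size, desired):
        self.group = {"MinSize": min_size, "MaxSize": max_size, "DesiredCapacity": desired}

    def describe_auto_scaling_groups(self, AutoScalingGroupNames):
        return {"AutoScalingGroups": [dict(self.group)]}

    def update_auto_scaling_group(self, AutoScalingGroupName, DesiredCapacity):
        self.group["DesiredCapacity"] = DesiredCapacity
```

Compared with a metric sampled every two minutes, a per-minute job that calls the API itself removes the alarm-evaluation delay from the scale-up path.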
This primarily protects against a condition where an instance is deemed fit to terminate, but the ASG prevents that instance from being terminated.

Signed-off-by: Balaji Sridharan <[email protected]>
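One plausible reading of this fix, sketched under assumptions: `terminate_instance_in_auto_scaling_group` with `ShouldDecrementDesiredCapacity=True` is rejected when it would take the group below `MinSize`, so the node checks the group first and stays up instead of issuing a call the ASG would refuse. The helper names and the stub client are hypothetical; the two boto3 calls are real.

```python
def can_self_terminate(group):
    """True if removing one instance keeps the group at or above MinSize."""
    return group["DesiredCapacity"] > group["MinSize"]


def self_terminate(asg_client, asg_name, instance_id):
    """Terminate this node via the ASG only when the ASG will accept the request."""
    response = asg_client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    group = response["AutoScalingGroups"][0]
    if not can_self_terminate(group):
        # Keep running (and keep counting idle time) rather than fail the call.
        return False
    asg_client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id, ShouldDecrementDesiredCapacity=True
    )
    return True


class RecordingASGClient(object):
    """Local stand-in for boto3.client("autoscaling") that records terminations."""

    def __init__(self, min_size, desired):
        self.group = {"MinSize": min_size, "DesiredCapacity": desired}
        self.terminated = []

    def describe_auto_scaling_groups(self, AutoScalingGroupNames):
        return {"AutoScalingGroups": [dict(self.group)]}

    def terminate_instance_in_auto_scaling_group(self, InstanceId, ShouldDecrementDesiredCapacity):
        self.terminated.append(InstanceId)
        if ShouldDecrementDesiredCapacity:
            self.group["DesiredCapacity"] -= 1
```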
Signed-off-by: Sean Smith <[email protected]>
Signed-off-by: Sean Smith <[email protected]>
Signed-off-by: Balaji Sridharan <[email protected]>
Parse the cfnconfig file in /opt/cfncluster to get the cfn_scheduler_slots config parameter
Based on that parameter, set the slot count using ceil
Exit gracefully with an error message when the configuration is invalid
Remove the print line from the SGE plugin as part of code cleanup

Signed-off-by: Balaji Sridharan <[email protected]>
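A sketch of the slot computation, under stated assumptions: cfnconfig is treated as shell-style `KEY=value` lines, `cfn_scheduler_slots` is assumed to accept `vcpus`, `cores`, or a positive integer, and `cores` is estimated as the ceiling of vCPUs over a hypothetical `threads_per_core=2`. Where exactly the original applies `ceil` is not stated, so treat this as illustrative only.

```python
import math
import sys


def parse_cfnconfig(text):
    """Parse shell-style KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def compute_slots(setting, vcpus, threads_per_core=2):
    """Map a cfn_scheduler_slots value to a slot count for this node."""
    if setting == "vcpus":
        return vcpus
    if setting == "cores":
        # ceil keeps at least one slot, e.g. on a 1-vCPU instance.
        return int(math.ceil(vcpus / float(threads_per_core)))
    try:
        slots = int(setting)
    except ValueError:
        raise ValueError(
            "cfn_scheduler_slots must be 'vcpus', 'cores' or an integer, got %r" % setting
        )
    if slots < 1:
        raise ValueError("cfn_scheduler_slots must be positive, got %d" % slots)
    return slots


def slots_from_cfnconfig(text, vcpus):
    """Exit with a readable message instead of a traceback on bad configuration."""
    try:
        setting = parse_cfnconfig(text)["cfn_scheduler_slots"]
        return compute_slots(setting, vcpus)
    except KeyError:
        sys.exit("cfn_scheduler_slots is missing from cfnconfig")
    except ValueError as e:
        sys.exit(str(e))
```

`sys.exit` with a string prints the message to stderr and exits with status 1, which is the "exit gracefully" behaviour the commit describes.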
The root cause of the issue is that a new release of pycparser
(https://pypi.org/project/pycparser/#history), which doesn't work
with Python 2.6, was published.
Our dependency chain is:

pycparser -> cffi -> pynacl -> paramiko -> cfncluster-node

Signed-off-by: Luca Carrogu <[email protected]>
enrico-usai and others added 6 commits September 20, 2018 15:10
+ Add the proxy configuration that was missing from one boto client initialization.
+ Add some log messages

Signed-off-by: Enrico Usai <[email protected]>
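A minimal sketch of passing a proxy to every boto client, assuming the cluster stores either a proxy URL or the sentinel string `NONE` (the sentinel and helper names are assumptions). `botocore.config.Config(proxies=...)` and the `config=` argument of `boto3.client` are real; boto3 is imported inside the factory so the pure helper runs without it.

```python
import logging

log = logging.getLogger(__name__)


def proxy_settings(proxy):
    """Translate a proxy URL, or the sentinel "NONE", into botocore's proxies dict."""
    if not proxy or proxy.upper() == "NONE":
        return None
    # AWS API endpoints are HTTPS, so only the https scheme needs routing.
    return {"https": proxy}


def make_client(service, region, proxy):
    """Create a boto3 client that honours the cluster proxy, logging what it does."""
    # Imported here so proxy_settings() can be used and tested without boto3 installed.
    import boto3
    from botocore.config import Config

    proxies = proxy_settings(proxy)
    config = Config(proxies=proxies) if proxies else Config()
    log.info("Creating %s client in %s (proxy: %s)", service, region, proxies or "none")
    return boto3.client(service, region_name=region, config=config)
```

Routing every client through one factory like this makes it harder for a single initialization to miss the proxy, which is the bug this commit fixes.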
Scale-up through the jobwatcher in 1.6.0rc1 took into account only the number of nodes requested, which caused over-scaling.
This fix takes into account both nodes and vCPUs, and packs the vCPUs so that all the constraints specified at job submission are satisfied.
Also fixes a bug in get_busy_nodes where the busy-node count was based on jobs instead of on the nodes themselves.
Added unit tests for get_optimal_nodes
Update version to 1.6.0rc3

Signed-off-by: Balaji Sridharan <[email protected]>
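The two fixes above can be illustrated with a simplified sketch. The function names come from the commit message, but the signatures and the first-fit packing strategy are assumptions, not the jobwatcher's actual algorithm: each pending job is a `(nodes, vcpus_per_node)` pair whose per-node chunks must land on distinct instances, and busy nodes are counted as distinct hosts rather than as jobs.

```python
def get_optimal_nodes(jobs, vcpus_per_instance):
    """Return how many instances fit all pending jobs (first-fit sketch).

    jobs: list of (nodes, vcpus_per_node) tuples; each of a job's `nodes`
    chunks must be placed on a different instance.
    """
    free = []  # free vCPUs on each instance opened so far
    for nodes, per_node in jobs:
        if per_node > vcpus_per_instance:
            raise ValueError("job needs %d vCPUs per node, instances have %d"
                             % (per_node, vcpus_per_instance))
        placed = 0
        # Reuse existing instances first; each index is used at most once per job.
        for i in range(len(free)):
            if placed == nodes:
                break
            if free[i] >= per_node:
                free[i] -= per_node
                placed += 1
        # Open new instances for the chunks that did not fit anywhere.
        for _ in range(nodes - placed):
            free.append(vcpus_per_instance - per_node)
    return len(free)


def get_busy_nodes(running_jobs):
    """Count distinct hosts, not jobs: running_jobs is one host list per job."""
    busy = set()
    for hosts in running_jobs:
        busy.update(hosts)
    return len(busy)
```

For two single-node jobs asking for 2 vCPUs each on 4-vCPU instances, `get_optimal_nodes([(1, 2), (1, 2)], 4)` packs both onto one instance, whereas counting requested nodes alone would launch two.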
Signed-off-by: Balaji Sridharan <[email protected]>
Signed-off-by: Luca Carrogu <[email protected]>
sean-smith merged commit 7c4cee5 into master on Oct 26, 2018

4 participants