-
Notifications
You must be signed in to change notification settings - Fork 83
(BOLT-459) Create reboot plan #178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Still needs some testing. |
fe00117
to
ffd6370
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main concerns detailed in comments, but also if a task fails, but the plan handles that failure, will it still show as a failure in the PE GUI? If so this will generate a lot of failures and be very ugly...
a0dc39c
to
a26a2ce
Compare
} | ||
|
||
# Reboot; catch errors here because the connection may get cut out from underneath | ||
$reboot_result = run_task('reboot', $nodes, timeout => $reboot_delay, message => $message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this be a single plan means that if most nodes successfully reboot but one fails, it's hard to recover. May need to split waiting for the reboot into a separate plan. Should we catch errors, wait for reboot on the successful nodes, then fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reidmv any input on this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catch errors -> wait for all nodes to finish -> fail seems like a logical eventflow to me, but I don't have an exact use case in mind.
e35d86c
to
7021a80
Compare
92d3f69
to
95146e2
Compare
I think I've added the minimum required testing. Probably won't have time to expand on it for a few weeks. |
2 approves: is this ready for merge? |
Still working on getting acceptance tests to pass in CI. |
spec/acceptance/zplans/init_spec.rb
Outdated
# Ensure Bolt logger is initialized so 'notice' works when running the reboot plan. | ||
# Needs to be moved to BoltSpec. | ||
require 'bolt/logger' | ||
Bolt::Logger.initialize_logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stuff is really messy, and doesn't seem to work consistently. @adreyer any idea what's going on with logging in BoltSpec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked this somewhat. There's still a lot of awkward setup we should probably roll into a BoltSpec helper.
011c59c
to
07d7208
Compare
Looks like I got a clean CI run yesterday evening. |
Oh, finally figured out what's going on with tests. Some VMs don't like to be rebooted. The reboot acceptance tests themselves catch the reboot command and kill it, so no actual reboot happens. |
a4850d6
to
b923ca4
Compare
Add bash and powershell implementations of the reboot task so it can be run on systems without Ruby and Puppet installed.
Adds a plan that reboots targets, then waits until they're available again.
Adds rspec unit tests for reboot plan based on BoltSpec::Plans.
Tests are constrained by needing to kill the shutdown command so we don't actually restart VMs.
Update Bolt test dependency to 1.3 to eliminate cludges to work around issues in BoltSpec.
Use Bolt 1.3`s new `wait_until_available` function to reduce the number of task runs we need to do to determine whether targets have rebooted.
b923ca4
to
c9bf10d
Compare
Creates a reboot plan that queries last boot time, reboots targets, then waits until all have rebooted or we timeout. Reverts previous work to add a wait function.