-
Notifications
You must be signed in to change notification settings - Fork 395
Improve kt_renewer logging for Kerberos Ticket Renewal Failures #4145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
|
||
LOG.error("Couldn't reinit from keytab! `kinit' exited with %s.\n%s\n%s" % ( | ||
subp.returncode, "\n".join(subp_stdout), "\n".join(subp_stderr))) | ||
if retries >= max_retries: | ||
LOG.error("FATAL: max_retries of %s reached. Exiting..." % max_retries) | ||
sys.exit(subp.returncode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will looping forever after removal. Without eventually let the KT RENEWER process exit, I think it won't bob up the issue in CM.
@amitsrivastava please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I too thought the same, if there is an actual failure and not an intermittent issue, this will loop forever. I'll use the previous "sys.exit(subp.returncode)" itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand where the infinite loop is. Won't it just run for max_retries
times and then got for sys.exit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request @quadoss to review since he fixed KT issue before.
4248099
to
d50eb85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request improves the logging mechanism for Kerberos ticket renewal failures by ensuring the subprocess output streams are only read once and properly decoded, and by increasing the retry interval to improve recovery chances during short KDC outages.
- Updated subprocess output reading and decoding to prevent empty error logs.
- Increased the delay in retry attempts from 3 to 10 seconds.
|
||
LOG.error("Couldn't reinit from keytab! `kinit' exited with %s.\n%s\n%s" % ( | ||
subp.returncode, "\n".join(subp_stdout), "\n".join(subp_stderr))) | ||
if retries >= max_retries: | ||
LOG.error("FATAL: max_retries of %s reached. Exiting..." % max_retries) | ||
sys.exit(subp.returncode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand where the infinite loop is. Won't it just run for max_retries
times and then got for sys.exit
?
What changes were proposed in this pull request?
The previous implementation attempted to read the subprocess output streams (stdout and stderr) twice, resulting in empty error messages being logged when kinit failed. The updated code reads the output streams once and decodes them.
Also increased the retry interval from 3 to 10 seconds to provide a better chance of recovery during short KDC outages
How was this patch tested?
Manual tests, tested on PvC cluster.
Please review Hue Contributing Guide before opening a pull request.