Skip to content

Improve kt_renewer logging for Kerberos Ticket Renewal Failures #4145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

athithyaaselvam
Copy link
Collaborator

@athithyaaselvam athithyaaselvam commented May 21, 2025

What changes were proposed in this pull request?

The previous implementation attempted to read the subprocess output streams (stdout and stderr) twice, resulting in empty error messages being logged when kinit failed. The updated code reads the output streams once and decodes them.
Also increased the retry interval from 3 to 10 seconds to provide a better chance of recovery during short KDC outages

How was this patch tested?

Manual tests, tested on PvC cluster.

Please review Hue Contributing Guide before opening a pull request.

Copy link

github-actions bot commented May 21, 2025

⚠️ No test files modified. Please ensure that changes are properly tested. ⚠️

@athithyaaselvam athithyaaselvam requested a review from agl29 May 21, 2025 19:58

LOG.error("Couldn't reinit from keytab! `kinit' exited with %s.\n%s\n%s" % (
subp.returncode, "\n".join(subp_stdout), "\n".join(subp_stderr)))
if retries >= max_retries:
LOG.error("FATAL: max_retries of %s reached. Exiting..." % max_retries)
sys.exit(subp.returncode)
Copy link
Collaborator

@wing2fly wing2fly May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will looping forever after removal. Without eventually let the KT RENEWER process exit, I think it won't bob up the issue in CM.
@amitsrivastava please correct me if I am wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I too thought the same, if there is an actual failure and not an intermittent issue, this will loop forever. I'll use the previous "sys.exit(subp.returncode)" itself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand where the infinite loop is. Won't it just run for max_retries times and then got for sys.exit?

@wing2fly wing2fly requested a review from quadoss May 21, 2025 20:04
Copy link
Collaborator

@wing2fly wing2fly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request @quadoss to review since he fixed KT issue before.

Copy link

github-actions bot commented May 21, 2025

Python Code Coverage

Python Coverage Report •
FileStmtsMissCoverMissing
TOTAL537342701349% 
report-only-changed-files is enabled. No files were changed during this commit :)

Pytest Report

Tests Skipped Failures Errors Time
1090 106 💤 0 ❌ 0 🔥 6m 2s ⏱️

@Harshg999 Harshg999 requested a review from Copilot May 22, 2025 09:19
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request improves the logging mechanism for Kerberos ticket renewal failures by ensuring the subprocess output streams are only read once and properly decoded, and by increasing the retry interval to improve recovery chances during short KDC outages.

  • Updated subprocess output reading and decoding to prevent empty error logs.
  • Increased the delay in retry attempts from 3 to 10 seconds.


LOG.error("Couldn't reinit from keytab! `kinit' exited with %s.\n%s\n%s" % (
subp.returncode, "\n".join(subp_stdout), "\n".join(subp_stderr)))
if retries >= max_retries:
LOG.error("FATAL: max_retries of %s reached. Exiting..." % max_retries)
sys.exit(subp.returncode)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand where the infinite loop is. Won't it just run for max_retries times and then got for sys.exit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants