Skip to content

RUST-1894 Retry KMS requests on transient errors #1281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 14, 2025

Conversation

isabelatkinson
Copy link
Contributor

No description provided.

@@ -1,6 +1,6 @@
[profile.default]
test-threads = 1
default-filter = 'not test(test::happy_eyeballs)'
default-filter = 'not test(test::happy_eyeballs) and not test(kms_retry)'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POC for skipping tests locally that need additional setup (in this case, a mock server running). We don't need to include this in the CI filter because the tests in csfle are only run by the CSFLE tasks, in which case the mock server needed here has been started by start-servers.sh.

Cargo.toml Outdated
@@ -91,7 +91,7 @@ hmac = "0.12.1"
once_cell = "1.19.0"
log = { version = "0.4.17", optional = true }
md-5 = "0.10.1"
mongocrypt = { git = "https://github.com/mongodb/libmongocrypt-rust.git", branch = "main", optional = true, version = "0.2.0" }
mongocrypt = { git = "https://github.com/isabelatkinson/libmongocrypt-rust", branch = "retry-kms", optional = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will revert once mongodb/libmongocrypt-rust#37 is merged, this is causing the cargo-deny failure

@@ -3493,6 +3493,167 @@ async fn range_explicit_encryption_defaults() -> Result<()> {
Ok(())
}

// Prose Test 24. KMS Retry Tests
#[tokio::test]
// using openssl causes errors after configuring a network failpoint
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting opaque SSL errors from the Python server after configuring network errors with openssl-tls enabled - we're not gaining any useful coverage by running this test on rustls and openssl, so it didn't seem worth it to try to make it work

let certificate_file = std::fs::read(&certificate_file_path).unwrap();

let set_failpoint = |kind: &str, count: u8| {
// create a fresh client for each request to avoid hangs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more odd behavior: when I reused a single HTTP client for setting failpoints, this line would hang for 90 seconds and then connect successfully. I saw a few issues filed in reqwest's repo describing similar behavior that fixed it by re-creating the client for each request, so I just did the same here.

@isabelatkinson isabelatkinson marked this pull request as ready for review January 10, 2025 22:10
@isabelatkinson isabelatkinson merged commit 2dadbed into mongodb:main Jan 14, 2025
8 of 10 checks passed
@isabelatkinson isabelatkinson deleted the retry-kms-requests branch January 30, 2025 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants