Skip to content

Refactor gcp-fixture multipart parser #125828

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Apr 15, 2025

Conversation

mhl-b
Copy link
Contributor

@mhl-b mhl-b commented Mar 28, 2025

fixture.gcs.GoogleCloudStorageHttpHandler#parseMultipartRequestBody is giving hard times to understand how parsing works. This PR tries to follow https://www.rfc-editor.org/rfc/rfc2046.html#page-19 as a guidance for multipart content parsing.

@mhl-b mhl-b added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0 labels Mar 28, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much nicer 👍

Might be worth adding some tests given that it's non-trivial?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you add some unit tests for this? I.e. something test/fixtures/gcs-fixture/src/test/....

);
writeBlobVersionAsJson(exchange, newBlobVersion);
} else {
throw new AssertionError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the behaviour where an invalid body causes a test failure rather than just an IllegalStateException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw AssertionError errors from the MultipartUpload.parseBody now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was sort of hoping to see tests that ensured that this kind of malformed body would throw an IAE from the parser. Throwing AssertionError from the parser itself basically means you can only test the happy path (or else catch AssertionError in tests which is best avoided if possible).

mhl-b and others added 2 commits March 28, 2025 00:57
@mhl-b
Copy link
Contributor Author

mhl-b commented Mar 28, 2025

It's covered by all tests that use fixture. Metrics, retries, rest tests. There are some failures that I will address.

@DaveCTurner
Copy link
Contributor

It's covered by all tests that uses fixture. Metrics, retries, rest tests. There are some failures that I will address.

I don't think that's true, these tests only cover the specific outputs that the GCS client sends. I suspect very few of them send multiple parts, and none of them send anything invalid.

@mhl-b
Copy link
Contributor Author

mhl-b commented Mar 30, 2025

There are tests that validate multipart upload parsing, directly and indirectly:

fixture.gcs.GoogleCloudStorageHttpHandlerTests#testIfGenerationMatch_MultipartUpload
org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainerRetriesTests#testWriteBlobWithRetries
org.elasticsearch.repositories.blobstore.testkit.analyze.RepositoryAnalysisSuccessIT#testRepositoryAnalysis

Also added new tests for the MultipartUpload#readUntilDelimiter and MultipartUpload#skipUntilDelimiter.

assertThat(content.get().v1(), equalTo(blobContainer.path().buildAsString() + "write_blob_max_retries"));
if (Objects.deepEquals(bytes, BytesReference.toBytes(content.get().v2()))) {
MultipartUpload multipartUpload = MultipartUpload.parseBody(exchange, exchange.getRequestBody());
assertTrue(multipartUpload.content().length() > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's legitimate to upload an empty blob comprising a single empty part isn't it? Certainly is in S3. We should be covering this case in these tests.

Copy link
Contributor Author

@mhl-b mhl-b Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test explicitly set content length to be at least 1 byte long. Empty content is not expected.

    protected static byte[] randomBlobContent() {
        return randomBlobContent(1);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's legitimate to send 0 byte blob. It will produce two body parts, json metadata and empty part with headers.

Copy link
Contributor

@nicktindall nicktindall Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do away with that assertion (it might trip in the event someone changes randomBlobContent() down the track). It's not important for the test. I think the previous assertion (assertThat(content, isPresent());) was only there because the old parseMultipartRequestBody returned Optional.empty() when the parse failed.

I don't have strong feelings about it but given we check that the content equals the uploaded bytes further down it seems redundant?

Copy link
Contributor Author

@mhl-b mhl-b Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed test with min-size 0 and removed content length assertion b4869f5. Manually tested with 0 sized blob, it works.

);
writeBlobVersionAsJson(exchange, newBlobVersion);
} else {
throw new AssertionError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was sort of hoping to see tests that ensured that this kind of malformed body would throw an IAE from the parser. Throwing AssertionError from the parser itself basically means you can only test the happy path (or else catch AssertionError in tests which is best avoided if possible).

@@ -758,6 +762,8 @@ private static Headers rangeHeader(long start, long end) {
return Headers.of("Range", Strings.format("bytes=%d-%d", start, end));
}

private static final String MULTIPART_BOUNDARY = "__END_OF_PART__a607a67c-6df7-4b87-b8a1-81f639a75a97__";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we randomize this? I know it was fixed before, within the scope of createGzipCompressedMultipartUploadBody, but even that was a bit odd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throwing AssertionError from the parser itself basically means you can only test the happy path (or else catch AssertionError in tests which is best avoided if possible).

True, at the same time the input we have depends on GCP SDK, not something we can easily change to produce unhappy path. Even if we can hack GCP client, then we would test hacked client against our own fixture, further away from the real world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're generating this boundary, we can set it to whatever we want. Let's use randomness to reflect that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry you're replying to a different thread 🤦

Yeah I don't want to hack the GCP client to produce invalid things, I just want some unit tests to check we're not being unduly lenient with our parsing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, for the boundary, yes. For some reason I cannot comment directly on the handling assertion comment. Dropped it here. Will do the boundary part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixed boundary string is fine in this test. With random boundary and random content we need to sanitize content, it should not contain boundary within. New parser is agnostic to boundary string, a new test added for a generic multipart request. So I propose to leave this test as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, sorry, using the literal UUID a607a67c-6df7-4b87-b8a1-81f639a75a97 in this test implies to the reader that this UUID is in some way special. But it isn't, we could use any value here. We can keep the __END_OF_PART__ prefix to distinguish it from any string that could appear elsewhere in the body, but I'd really much prefer to have a random value here to help the reader understand that the boundary UUID is not required to be a fixed value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all. Added random boundary string. 905620d

I removed static MULTIPART_BOUNDARY in following commit.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mhl-b
Copy link
Contributor Author

mhl-b commented Apr 11, 2025

@DaveCTurner are you ok with current version? There is still block on PR.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks Mikhail

@mhl-b mhl-b merged commit 5a7a425 into elastic:main Apr 15, 2025
17 checks passed
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team >test Issues or PRs that are addressing/adding tests v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants