-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Refactor gcp-fixture multipart parser #125828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
test/fixtures/gcs-fixture/src/main/java/fixture/gcs/MultipartUpload.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much nicer 👍
Might be worth adding some tests given that it's non-trivial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you add some unit tests for this? I.e. something test/fixtures/gcs-fixture/src/test/...
.
test/fixtures/gcs-fixture/src/main/java/fixture/gcs/MultipartUpload.java
Outdated
Show resolved
Hide resolved
); | ||
writeBlobVersionAsJson(exchange, newBlobVersion); | ||
} else { | ||
throw new AssertionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the behaviour where an invalid body causes a test failure rather than just an IllegalStateException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw AssertionError errors from the MultipartUpload.parseBody now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sort of hoping to see tests that ensured that this kind of malformed body would throw an IAE from the parser. Throwing AssertionError
from the parser itself basically means you can only test the happy path (or else catch AssertionError
in tests which is best avoided if possible).
Co-authored-by: David Turner <[email protected]>
It's covered by all tests that use fixture. Metrics, retries, rest tests. There are some failures that I will address. |
I don't think that's true, these tests only cover the specific outputs that the GCS client sends. I suspect very few of them send multiple parts, and none of them send anything invalid. |
There are tests that validate multipart upload parsing, directly and indirectly:
Also added new tests for the |
assertThat(content.get().v1(), equalTo(blobContainer.path().buildAsString() + "write_blob_max_retries")); | ||
if (Objects.deepEquals(bytes, BytesReference.toBytes(content.get().v2()))) { | ||
MultipartUpload multipartUpload = MultipartUpload.parseBody(exchange, exchange.getRequestBody()); | ||
assertTrue(multipartUpload.content().length() > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's legitimate to upload an empty blob comprising a single empty part isn't it? Certainly is in S3. We should be covering this case in these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test explicitly set content length to be at least 1 byte long. Empty content is not expected.
protected static byte[] randomBlobContent() {
return randomBlobContent(1);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's legitimate to send 0 byte blob. It will produce two body parts, json metadata and empty part with headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could do away with that assertion (it might trip in the event someone changes randomBlobContent()
down the track). It's not important for the test. I think the previous assertion (assertThat(content, isPresent());
) was only there because the old parseMultipartRequestBody
returned Optional.empty()
when the parse failed.
I don't have strong feelings about it but given we check that the content equals the uploaded bytes further down it seems redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed test with min-size 0 and removed content length assertion b4869f5. Manually tested with 0 sized blob, it works.
); | ||
writeBlobVersionAsJson(exchange, newBlobVersion); | ||
} else { | ||
throw new AssertionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sort of hoping to see tests that ensured that this kind of malformed body would throw an IAE from the parser. Throwing AssertionError
from the parser itself basically means you can only test the happy path (or else catch AssertionError
in tests which is best avoided if possible).
@@ -758,6 +762,8 @@ private static Headers rangeHeader(long start, long end) { | |||
return Headers.of("Range", Strings.format("bytes=%d-%d", start, end)); | |||
} | |||
|
|||
private static final String MULTIPART_BOUNDARY = "__END_OF_PART__a607a67c-6df7-4b87-b8a1-81f639a75a97__"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we randomize this? I know it was fixed before, within the scope of createGzipCompressedMultipartUploadBody
, but even that was a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throwing AssertionError from the parser itself basically means you can only test the happy path (or else catch AssertionError in tests which is best avoided if possible).
True, at the same time the input we have depends on GCP SDK, not something we can easily change to produce unhappy path. Even if we can hack GCP client, then we would test hacked client against our own fixture, further away from the real world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're generating this boundary, we can set it to whatever we want. Let's use randomness to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry you're replying to a different thread 🤦
Yeah I don't want to hack the GCP client to produce invalid things, I just want some unit tests to check we're not being unduly lenient with our parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, for the boundary, yes. For some reason I cannot comment directly on the handling assertion comment. Dropped it here. Will do the boundary part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think fixed boundary string is fine in this test. With random boundary and random content we need to sanitize content, it should not contain boundary within. New parser is agnostic to boundary string, a new test added for a generic multipart request. So I propose to leave this test as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, sorry, using the literal UUID a607a67c-6df7-4b87-b8a1-81f639a75a97
in this test implies to the reader that this UUID is in some way special. But it isn't, we could use any value here. We can keep the __END_OF_PART__
prefix to distinguish it from any string that could appear elsewhere in the body, but I'd really much prefer to have a random value here to help the reader understand that the boundary UUID is not required to be a fixed value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem at all. Added random boundary string. 905620d
I removed static MULTIPART_BOUNDARY
in following commit.
…sticsearch into gcp-fixture-multipart-refactor
…sticsearch into gcp-fixture-multipart-refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@DaveCTurner are you ok with current version? There is still block on PR. |
…sticsearch into gcp-fixture-multipart-refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks Mikhail
fixture.gcs.GoogleCloudStorageHttpHandler#parseMultipartRequestBody
is giving hard times to understand how parsing works. This PR tries to follow https://www.rfc-editor.org/rfc/rfc2046.html#page-19 as a guidance for multipart content parsing.