Skip to content

Support synthetic source for date fields when ignore_malformed is used #109410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Jun 5, 2024

Contributes to #106483.

@lkts lkts added the :StorageEngine/Mapping The storage related side of mappings label Jun 5, 2024
@lkts lkts requested review from martijnvg and kkrik-es June 6, 2024 17:14
return validMetrics();
}

private Object malformedValue() {
Copy link
Contributor Author

@lkts lkts Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These scenarios are moved to exampleMalformedValues which is now correctly used with synthetic source.

@@ -915,6 +927,10 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
} catch (IllegalArgumentException | ElasticsearchParseException | DateTimeException | ArithmeticException e) {
if (ignoreMalformed) {
context.addIgnoredField(mappedFieldType.name());
if (isSourceSynthetic) {
// Save a copy of the field so synthetic source can load it
context.doc().add(IgnoreMalformedStoredValues.storedField(name(), context.parser()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual change.

@@ -152,7 +152,13 @@ protected List<ExampleMalformedValue> exampleMalformedValues() {
return List.of(
exampleMalformedValue("2016-03-99").mapping(mappingWithFormat("strict_date_optional_time||epoch_millis"))
.errorMatches("failed to parse date field [2016-03-99] with format [strict_date_optional_time||epoch_millis]"),
exampleMalformedValue("-522000000").mapping(mappingWithFormat("date_optional_time")).errorMatches("long overflow")
exampleMalformedValue("-522000000").mapping(mappingWithFormat("date_optional_time")).errorMatches("long overflow"),
Copy link
Contributor Author

@lkts lkts Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note here is that, due to the way this mapper is implemented, trying to f.e. index an object actually hard fails and does not go via malformed code path. This feels like a bug since it works in other mappers. Fixing it is a breaking change though i imagine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm why is it a breaking change, we always return an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When ignore_malformed is true we currently return an error but we'll stop if we fix the behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the expectation? I'm trying to understand if it's a breaking change or a bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a breaking change. If we ignore_malformed is set to true and we fail because of malformed data in whatever form then that was a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, i'll create a bug for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1125,8 +1125,15 @@ public final void testSyntheticSource() throws IOException {

public void testSyntheticSourceIgnoreMalformedExamples() throws IOException {
assumeTrue("type doesn't support ignore_malformed", supportsIgnoreMalformed());
CheckedConsumer<XContentBuilder, IOException> mapping = syntheticSourceSupport(true).example(1).mapping();
// This is weird but we need to call it in order to hit the assumption inside
Copy link
Contributor Author

@lkts lkts Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this test used mapping from syntheticSourceSupport which is wrong since ExampleMalformedValue contains a mapping that corresponds to the malformed example. F.e. malformed example can be specific to a date format that is specified in the mapping. I can extract this change into a separate PR if desired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Skip the "This is weird but" prefix.

Consider adding an example in the comment to clarify.

@lkts lkts marked this pull request as ready for review June 6, 2024 23:04
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

ESTestCase::randomLong,
ESTestCase::randomFloat,
ESTestCase::randomDouble,
ESTestCase::randomBoolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed where these are moved, same with no metrics below..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the top ones.. Let's use the random* instead of hardcoded "hello", 120 etc

@@ -165,6 +168,33 @@ protected List<ExampleMalformedValue> exampleMalformedValues() {
// invalid metric value
exampleMalformedValue(b -> b.startObject().field("min", "10.0").field("max", 50.0).field("value_count", 14).endObject())
.errorMatches("Failed to parse object: expecting token of type [VALUE_NUMBER] but found [VALUE_STRING]"),
// invalid metric value with additional data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that the field("hello", "world") triggers the error, and we expect everything before and after to show up in synthetic source.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lkts lkts merged commit a9f31bd into elastic:main Jun 10, 2024
15 checks passed
@lkts lkts deleted the feature/date_synthetic_source_with_ignore_malformed branch June 10, 2024 17:26
@felixbarny felixbarny mentioned this pull request Aug 6, 2024
50 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants