-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Support synthetic source for date fields when ignore_malformed is used #109410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support synthetic source for date fields when ignore_malformed is used #109410
Conversation
return validMetrics(); | ||
} | ||
|
||
private Object malformedValue() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These scenarios are moved to exampleMalformedValues
which is now correctly used with synthetic source.
@@ -915,6 +927,10 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio | |||
} catch (IllegalArgumentException | ElasticsearchParseException | DateTimeException | ArithmeticException e) { | |||
if (ignoreMalformed) { | |||
context.addIgnoredField(mappedFieldType.name()); | |||
if (isSourceSynthetic) { | |||
// Save a copy of the field so synthetic source can load it | |||
context.doc().add(IgnoreMalformedStoredValues.storedField(name(), context.parser())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual change.
@@ -152,7 +152,13 @@ protected List<ExampleMalformedValue> exampleMalformedValues() { | |||
return List.of( | |||
exampleMalformedValue("2016-03-99").mapping(mappingWithFormat("strict_date_optional_time||epoch_millis")) | |||
.errorMatches("failed to parse date field [2016-03-99] with format [strict_date_optional_time||epoch_millis]"), | |||
exampleMalformedValue("-522000000").mapping(mappingWithFormat("date_optional_time")).errorMatches("long overflow") | |||
exampleMalformedValue("-522000000").mapping(mappingWithFormat("date_optional_time")).errorMatches("long overflow"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note here is that, due to the way this mapper is implemented, trying to f.e. index an object actually hard fails and does not go via malformed code path. This feels like a bug since it works in other mappers. Fixing it is a breaking change though i imagine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm why is it a breaking change, we always return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When ignore_malformed
is true we currently return an error but we'll stop if we fix the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the expectation? I'm trying to understand if it's a breaking change or a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a breaking change. If we ignore_malformed is set to true and we fail because of malformed data in whatever form then that was a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, i'll create a bug for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1125,8 +1125,15 @@ public final void testSyntheticSource() throws IOException { | |||
|
|||
public void testSyntheticSourceIgnoreMalformedExamples() throws IOException { | |||
assumeTrue("type doesn't support ignore_malformed", supportsIgnoreMalformed()); | |||
CheckedConsumer<XContentBuilder, IOException> mapping = syntheticSourceSupport(true).example(1).mapping(); | |||
// This is weird but we need to call it in order to hit the assumption inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this test used mapping from syntheticSourceSupport
which is wrong since ExampleMalformedValue
contains a mapping that corresponds to the malformed example. F.e. malformed example can be specific to a date format that is specified in the mapping. I can extract this change into a separate PR if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Skip the "This is weird but" prefix.
Consider adding an example in the comment to clarify.
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Hi @lkts, I've created a changelog YAML for you. |
ESTestCase::randomLong, | ||
ESTestCase::randomFloat, | ||
ESTestCase::randomDouble, | ||
ESTestCase::randomBoolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed where these are moved, same with no metrics below..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh the top ones.. Let's use the random* instead of hardcoded "hello", 120 etc
@@ -165,6 +168,33 @@ protected List<ExampleMalformedValue> exampleMalformedValues() { | |||
// invalid metric value | |||
exampleMalformedValue(b -> b.startObject().field("min", "10.0").field("max", 50.0).field("value_count", 14).endObject()) | |||
.errorMatches("Failed to parse object: expecting token of type [VALUE_NUMBER] but found [VALUE_STRING]"), | |||
// invalid metric value with additional data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention that the field("hello", "world")
triggers the error, and we expect everything before and after to show up in synthetic source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Contributes to #106483.