Skip to content

Improve exception handling for JsonXContentParser #123439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

smalyshev
Copy link
Contributor

Ensures that all exception handling in the parser follows single code path and that JSON exceptions are handled in one place.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this could possibly have an impact on json parsing speed due to introducing lambdas at every token event. Is the lambda needed?
cc @original-brownbear

@smalyshev
Copy link
Contributor Author

@rjernst Thanks, I did not consider the performance aspect, I'll look into that. The lambda is strictly speaking not necessary, that part was added to avoid copypasting try/catch all over, but if it kills the performance then I can go back to copypasting.

@original-brownbear
Copy link
Member

Yea we can't do this, this is going to annihilate performance I would expect. We have some jmh benchmarks around mapper parsing that will probably show this quite clearly.

@smalyshev
Copy link
Contributor Author

smalyshev commented Feb 26, 2025

I run FilterContentBenchmark and don't see any substantial difference there (in fact, majority of runs were faster but not sure it means anything - probably just jitter) - is it the right one to use? I put the results here: https://docs.google.com/spreadsheets/d/1JxHtSVzgfcOCZVAVHuIj1NmkvUvZKRcN1gVCChXz3xo/edit in case it's useful. I'd expect JIT to unroll the statics and lambdas but I'm not sure how smart it is.

@original-brownbear
Copy link
Member

@smalyshev Does that benchmark even exercise this code path much? Almost looks like it just delegates to com.fasterxml.jackson.core.JsonGenerator#copyCurrentStructure? Also I'd expect the cost in that one to mainly live with the builder side of things, not the parsing?

I'd expect org.elasticsearch.benchmark.index.mapper.BeatsMapperBenchmark to show a regression (but run it on a clean machine, not a laptop ... at least for me results on a MacBook or similar are just all over in the JMH runs).

@smalyshev
Copy link
Contributor Author

@original-brownbear OK I'll try BeatsMapperBenchmark on a VM and see what happens.

@smalyshev
Copy link
Contributor Author

@original-brownbear So this is what I get on the Google VM:
from main:

BeatsMapperBenchmark.benchmarkParseKeywordFields  1600172297  avgt    5  19035.809 ± 63.917  ns/op
BeatsMapperBenchmark.benchmarkParseKeywordFields  1600172297  avgt    5  19312.652 ± 82.225  ns/op
BeatsMapperBenchmark.benchmarkParseKeywordFields  1600172297  avgt    5  19097.191 ± 168.124  ns/op

with the patch

BeatsMapperBenchmark.benchmarkParseKeywordFields  1600172297  avgt    5  18866.318 ± 190.904  ns/op
BeatsMapperBenchmark.benchmarkParseKeywordFields  1600172297  avgt    5  18837.553 ± 221.828  ns/op
BeatsMapperBenchmark.benchmarkParseKeywordFields  1600172297  avgt    5  19106.008 ± 160.673  ns/op

I'm not sure why the patch gets a tiny bit faster results, but it seems to be all within the error bars. Any other suggestions about how I could test it?

@smalyshev smalyshev added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Feb 26, 2025
@smalyshev smalyshev marked this pull request as ready for review February 27, 2025 18:09
@smalyshev smalyshev requested a review from a team as a code owner February 27, 2025 18:09
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Feb 27, 2025
@original-brownbear
Copy link
Member

@smalyshev even if you cannot show a regression in a microbenchmark. I would strongly discourage this kind of change. It might be that the compiler is able to inline the whole thing. But also it could be that for some percentage of runs It won't be able to do so (JIT is not deterministic at all and you might even be able to reproduce this fact by rerunning the benchmark a couple times in fresh JVMs).
If we want to make a change to this code, its keep it cleanly procedural IMO.

I also gave that benchmark a spin and it's not a good reproducer sadly because it mostly goes through the dot expanding parser which has so much overhead that it drowns out the overhead here. Sorry for the inaccurate suggestion, I couldn't find a good Rally run at all when I just checked.
You might be able to reproduce a change for this kind of thing in some Rally runs that are heavy in runtime fields, those parse heavily but whether that avoids the dot expander noise is a different topic.

@smalyshev
Copy link
Contributor Author

Maybe we could add parser-specific microbenchmark if the existing ones aren't good enough? It's not hard for me to unroll the lambdas, but if we don't have the good benchmark I can't be sure any change wouldn't be problematic so there's something that needs to be improved regardless.

@smalyshev
Copy link
Contributor Author

@original-brownbear I've inlined the lambdas and added JSON parser microbenchmark - please see if it looks good.

@smalyshev smalyshev requested a review from a team as a code owner March 3, 2025 17:51
@smalyshev smalyshev removed the request for review from a team March 3, 2025 17:53
@benwtrent benwtrent removed their request for review March 27, 2025 14:04
@original-brownbear
Copy link
Member

Thanks @smalyshev this looks quite nice. I could see this being a performance improvement as well actually from now reducing the code size in the hot path! Maybe wait for @rjernst to be good with this one as well but LGTM from my end!

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better, but I have one more question

try {
return createParser(config, jsonFactory.createParser(content));
} catch (CharConversionException e) {
throw new XContentParseException(null, e.getMessage(), e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JsonFactory.createParser can throw JsonParseException, so shouldn't this catch also use handleParseException?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :Core/Infra/Core Core issues without another label >refactoring Team:Core/Infra Meta label for core/infra team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants