Skip to content

ESQL: Fix usage of already released null block in ValueSourceReaderOperator #126411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Apr 7, 2025

Fix #125850

@alex-spies alex-spies added >bug auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.18.1 v8.19.0 v9.0.1 v9.1.0 v8.17.5 labels Apr 7, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies
Copy link
Contributor Author

In addition to the enclosed IT, I'd love to have a unit test for this.

It looks like we should be able to add a case to ValueSourceReaderOperatorTests, but I struggled to build one that travels the specific code path with the bug, though. The reason is that the bug only happens when we re-use the same ComputeBlockLoaderFactory and create null blocks for multiple pages; but this only happens when the doc block doesn't belong to a single shard, but multiple ones.

@nik9000 if you have an idea how to add a test without major effort, I'd love to add it, too.

@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've updated the changelog YAML for you.

// However, downstream operators take ownership of the null block we hand to them and may close it, forcing us to create a new
// one.
// This could be avoided altogether if this factory itself kept a reference to the null block, but we'd have to also ensure
// to release this block once the factory is not used anymore.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the factory itself would have to be Releasable. FWIW, that's not really a big deal. I think I'd prefer it to this fix to be honest. I'm kind of ok with whichever though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you are trying to get this merged quickly I'd be fine with this solution - especially if you are willing to try the other in a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer making this Releasable. It's easy to do, I'll push a commit with the change.

Wrong implementations of .close() or forgetting to incref when handing out blocks actually nicely show up as test failures already in the existing tests, so I think this is guarding better against regressions.

Comment on lines +348 to +368
int p = forwards[0];
int shard = shards.getInt(p);
int segment = segments.getInt(p);
int firstDoc = docs.getInt(p);
positionFieldWork(shard, segment, firstDoc);
LeafReaderContext ctx = ctx(shard, segment);
fieldsMoved(ctx, shard);
verifyBuilders(loaderBlockFactory, shard);
read(docs.getInt(p), shard);
read(firstDoc, shard);
for (int i = 1; i < forwards.length; i++) {
p = forwards[i];
shard = shards.getInt(p);
segment = segments.getInt(p);
boolean changedSegment = positionFieldWorkDocGuarteedAscending(shard, segment);
if (changedSegment) {
ctx = ctx(shard, segment);
fieldsMoved(ctx, shard);
}
verifyBuilders(loaderBlockFactory, shard);
read(docs.getInt(p), shard);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just got indented, the diff is actually just whitespace o.O

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add ?w=1 to the url it'll ignore whitespace ala git diff -w. There's an option for it hiding behind the gear icon at the top of the page too.

@alex-spies alex-spies requested a review from nik9000 April 8, 2025 09:38
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it much more now. Thanks!

@alex-spies alex-spies added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 8, 2025
@alex-spies
Copy link
Contributor Author

The Serverless checks are failing with an unrelated test failure that is already known (Serverless CI issue opened last week, re-muted 4h ago).

I think it's not worth updating the branch to run CI again just to pick up on the mute for this test, so I'm going to go ahead with the merge as all other tests and CI checks passed.

@alex-spies alex-spies merged commit ffd4913 into elastic:main Apr 8, 2025
16 of 17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.18
8.x Commit could not be cherrypicked due to conflicts
9.0
8.17

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126411

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 8, 2025
…erator (elastic#126411)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 8, 2025
…erator (elastic#126411)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 8, 2025
…erator (elastic#126411)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable
@alex-spies
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 8, 2025
…erator (elastic#126411)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable

(cherry picked from commit ffd4913)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 8, 2025
…erator (#126411) (#126475)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable
elasticsearchmachine pushed a commit that referenced this pull request Apr 8, 2025
…erator (#126411) (#126474)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable
elasticsearchmachine pushed a commit that referenced this pull request Apr 8, 2025
…erator (#126411) (#126476)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable
elasticsearchmachine pushed a commit that referenced this pull request Apr 8, 2025
…erator (#126411) (#126480)

* Add yaml test with reproducer
* Fix the bug
* Make ComputeBlockLoaderFactory Releasable

(cherry picked from commit ffd4913)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.5 v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: can't increase refCount on already released object [ConstantNullBlock[...]]
4 participants