[CI] IndexShardTests testReentrantEngineReadLockAcquisitionInRefreshListener failing #126628

Closed
elasticsearchmachine opened this issue Apr 10, 2025 · 2 comments · Fixed by #126685
Labels
  • :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.)
  • needs:risk (Requires assignment of a risk label: low, medium, blocker)
  • Team:Distributed Indexing (Meta label for Distributed Indexing team)
  • >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

Build Scans:

Reproduction Line:

./gradlew ":server:test" --tests "org.elasticsearch.index.shard.IndexShardTests.testReentrantEngineReadLockAcquisitionInRefreshListener" -Dtests.seed=F4A2389381739651 -Dtests.locale=en-RW -Dtests.timezone=America/Indiana/Knox -Druntime.java=24

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=3062, name=Thread-520, state=RUNNABLE, group=TGRP-IndexShardTests]

Issue Reasons:

  • [main] 4 failures in test testReentrantEngineReadLockAcquisitionInRefreshListener (7.5% fail rate in 53 executions)
  • [main] 3 failures in step part-1 (7.3% fail rate in 41 executions)
  • [main] 3 failures in pipeline elasticsearch-pull-request (7.5% fail rate in 40 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :Distributed Indexing/Engine and >test-failure labels Apr 10, 2025
elasticsearchmachine added a commit that referenced this issue Apr 10, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 4 failures in test testReentrantEngineReadLockAcquisitionInRefreshListener (7.5% fail rate in 53 executions)
  • [main] 3 failures in step part-1 (7.3% fail rate in 41 executions)
  • [main] 3 failures in pipeline elasticsearch-pull-request (7.5% fail rate in 40 executions)

Build Scans:

elasticsearchmachine added the needs:risk and Team:Distributed Indexing labels Apr 10, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

arteam self-assigned this Apr 11, 2025
tlrx added a commit to tlrx/elasticsearch that referenced this issue Apr 11, 2025
…eshListener

I suspect the test resets/closes the reference manager
between the refresh and the retrieval of the segment
generation after the refresh.

By executing segmentGenerationAfterRefresh while
holding the engine reset lock, we ensure that no
engine reset can run concurrently in the meantime.

In the future, we should also ensure that
IndexShard.refresh() uses withEngine.

Closes elastic#126628
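A minimal, self-contained sketch of the locking idea described in the commit message above, assuming the "engine reset lock" can be modelled as a `java.util.concurrent.locks.ReentrantReadWriteLock` (read lock for normal engine use, write lock for a reset). `FakeEngine`, `refresh()`, `segmentGenerationUnderLock()` and `resetEngine()` are hypothetical stand-ins for illustration only; they are not Elasticsearch's `IndexShard` or `Engine` API, and the real fix may be structured differently.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the race described in the commit message. FakeEngine and every
// method below are illustrative stand-ins, not Elasticsearch's IndexShard or Engine API.
public class EngineResetLockSketch {

    // Stand-in for an engine whose reader state becomes unusable once a reset closes it.
    static final class FakeEngine {
        private final AtomicLong generation = new AtomicLong();
        private volatile boolean closed;

        void refresh() {
            ensureOpen();
            generation.incrementAndGet(); // pretend each refresh produces a new segment generation
        }

        long segmentGeneration() {
            ensureOpen(); // in this model, an unsynchronized reset would surface here as a failure
            return generation.get();
        }

        void close() {
            closed = true;
        }

        private void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("engine is closed");
            }
        }
    }

    // Assumed analogue of the "engine reset lock": read lock for normal use, write lock for resets.
    final ReentrantReadWriteLock engineResetLock = new ReentrantReadWriteLock();
    private volatile FakeEngine engine = new FakeEngine();

    // Refresh and read the generation under one read-lock hold, so no reset can interleave.
    long refresh() {
        engineResetLock.readLock().lock();
        try {
            engine.refresh();
            return segmentGenerationUnderLock(); // nested, reentrant read-lock acquisition
        } finally {
            engineResetLock.readLock().unlock();
        }
    }

    // Hypothetical helper that takes the read lock itself, as code run from a refresh listener might.
    long segmentGenerationUnderLock() {
        engineResetLock.readLock().lock();
        try {
            return engine.segmentGeneration();
        } finally {
            engineResetLock.readLock().unlock();
        }
    }

    // Replace the engine; the write lock is granted only once no thread holds the read lock.
    void resetEngine() {
        engineResetLock.writeLock().lock();
        try {
            engine.close();
            engine = new FakeEngine();
        } finally {
            engineResetLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EngineResetLockSketch shard = new EngineResetLockSketch();
        Thread resetter = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                shard.resetEngine();
            }
        });
        resetter.start();
        for (int i = 0; i < 1_000; i++) {
            shard.refresh(); // would throw "engine is closed" if a reset could slip in between
        }
        resetter.join();
        System.out.println("refresh and generation read were never split by a reset");
    }
}
```

Because the read lock of `ReentrantReadWriteLock` is reentrant, the nested `lock()` inside `segmentGenerationUnderLock()` succeeds on a thread that already holds it, even while `resetEngine()` is queued for the write lock. That is presumably the kind of reentrant acquisition the test name refers to; whether Elasticsearch's own reset lock behaves identically is not stated in this issue.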
tlrx closed this as completed in f57be54 Apr 11, 2025