[CI] TimeSeriesLifecycleActionsIT testDeleteActionDoesntDeleteSearchableSnapshot failing #126053

Closed
elasticsearchmachine opened this issue Apr 1, 2025 · 2 comments · Fixed by #126605
Assignees
Labels
  • :Data Management/ILM+SLM (Index and Snapshot lifecycle management)
  • low-risk (An open issue or test failure that is a low risk to future releases)
  • Team:Data Management (Meta label for data/management team)
  • >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:ilm:qa:multi-node:javaRestTest" --tests "org.elasticsearch.xpack.ilm.TimeSeriesLifecycleActionsIT.testDeleteActionDoesntDeleteSearchableSnapshot" -Dtests.seed=8E4F510D5CCE007A -Dtests.locale=pa-Guru-IN -Dtests.timezone=Africa/Djibouti -Druntime.java=24

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: null

Issue Reasons:

  • [main] 2 failures in test testDeleteActionDoesntDeleteSearchableSnapshot (0.3% fail rate in 638 executions)
  • [main] 2 failures in pipeline elasticsearch-periodic-platform-support (15.4% fail rate in 13 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :Data Management/ILM+SLM, >test-failure, Team:Data Management, and needs:risk labels on Apr 1, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-data-management (Team:Data Management)

nielsbauman added the low-risk label and removed the needs:risk label on Apr 1, 2025
nielsbauman self-assigned this on Apr 1, 2025
@nielsbauman
Contributor

Again caused by ILM being stuck in the wait-for-index-color step. Same as #125867 and a few other test failures in a different class. A fix is being worked on.

nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 1, 2025
ILM sometimes skips a policy/index for a cluster state update if the
step was still running/enqueued while the update came in. That on its
own isn't a problem, but in very quiet clusters, this would mean that
it could take arbitrarily long for the policy step to be run -
i.e. when the next cluster state comes in. We saw this happening in
a few tests, but it could potentially happen in production too.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 1, 2025
ILM sometimes skips a policy/index for a cluster state update if the
step is still running/enqueued when the update comes in. That on its own
isn't a problem, but in very quiet clusters, this would mean that
it could take arbitrarily long for the policy step to be run -
i.e. when the next cluster state comes in. We saw this happening in
a few tests, but it could potentially happen in production too.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
nielsbauman added a commit that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354
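
The equality problem described in the commit message above can be illustrated with a minimal, self-contained sketch. This is not the actual `WaitForIndexColorStep` code: the classes `BrokenStep` and `FixedStep`, the `color` field, the `prefixed` helper, and the `BiFunction<String, String, String>` signature are all hypothetical stand-ins chosen for illustration. The sketch only shows why including a lambda-typed field in `equals` reduces the comparison to reference identity, and how excluding it restores value equality.

```java
import java.util.Objects;
import java.util.function.BiFunction;

// Illustrative only: BrokenStep and FixedStep are hypothetical stand-ins,
// not the real WaitForIndexColorStep.
final class StepEqualityDemo {

    /** Includes the function-typed field in equals, so equality degrades to identity. */
    static final class BrokenStep {
        final String color;
        final BiFunction<String, String, String> indexNameSupplier;

        BrokenStep(String color, BiFunction<String, String, String> indexNameSupplier) {
            this.color = color;
            this.indexNameSupplier = indexNameSupplier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof BrokenStep)) return false;
            BrokenStep other = (BrokenStep) o;
            // Lambdas inherit Object.equals, so this is a reference comparison.
            return color.equals(other.color)
                && Objects.equals(indexNameSupplier, other.indexNameSupplier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(color, indexNameSupplier);
        }
    }

    /** Leaves the function-typed field out of equals and hashCode. */
    static final class FixedStep {
        final String color;
        final BiFunction<String, String, String> indexNameSupplier;

        FixedStep(String color, BiFunction<String, String, String> indexNameSupplier) {
            this.color = color;
            this.indexNameSupplier = indexNameSupplier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FixedStep)) return false;
            return color.equals(((FixedStep) o).color);
        }

        @Override
        public int hashCode() {
            return Objects.hash(color);
        }
    }

    // Hypothetical helper: each call evaluates a capturing lambda, which
    // common JVMs materialize as a distinct object.
    static BiFunction<String, String, String> prefixed(String prefix) {
        return (indexName, policyName) -> prefix + indexName;
    }

    static boolean brokenStepsEqual() {
        return new BrokenStep("green", prefixed("restored-"))
            .equals(new BrokenStep("green", prefixed("restored-")));
    }

    static boolean fixedStepsEqual() {
        return new FixedStep("green", prefixed("restored-"))
            .equals(new FixedStep("green", prefixed("restored-")));
    }

    public static void main(String[] args) {
        System.out.println("broken steps equal: " + brokenStepsEqual());
        System.out.println("fixed steps equal:  " + fixedStepsEqual());
    }
}
```

On a typical OpenJDK runtime the two `BrokenStep` instances are expected to compare unequal even though they were built identically, while the two `FixedStep` instances compare equal. The Java Language Specification leaves lambda object identity unspecified, so the sketch assumes the usual behavior of allocating a new object for each evaluation of a capturing lambda.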
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
Fixes elastic#126354
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
Fixes elastic#126354

(cherry picked from commit 3231eb2)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354

(cherry picked from commit 3231eb2)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354

(cherry picked from commit 3231eb2)

# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/ilm/qa/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/SearchableSnapshotActionIT.java