[CI] FileSettingsRoleMappingsRestartIT testReservedStatePersistsOnRestart failing #120923

Open
elasticsearchmachine opened this issue Jan 27, 2025 · 5 comments
Assignees
Labels
  • :Core/Infra/Settings (Settings infrastructure and APIs)
  • medium-risk (An open issue or test failure that is a medium risk to future releases)
  • Team:Core/Infra (Meta label for core/infra team)
  • >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Jan 27, 2025

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:security:internalClusterTest" --tests "org.elasticsearch.xpack.security.FileSettingsRoleMappingsRestartIT.testReservedStatePersistsOnRestart" -Dtests.seed=84B6A2BE7D7481F2 -Dtests.locale=sr -Dtests.timezone=America/Los_Angeles -Druntime.java=23

Applicable branches:
8.x

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: 
Expected: iterable with items [] in any order
     but: no match for: "everyone_kibana_alone"

Issue Reasons:

  • [8.x] 2 consecutive failures in step almalinux-9_platform-support-unix
  • [8.x] 11 failures in test testReservedStatePersistsOnRestart (6.5% fail rate in 168 executions)
  • [8.x] 3 failures in step part-4 (7.5% fail rate in 40 executions)
  • [8.x] 2 failures in step almalinux-9_platform-support-unix (66.7% fail rate in 3 executions)
  • [8.x] 2 failures in step openjdk23_checkpart4_java-matrix (66.7% fail rate in 3 executions)
  • [8.x] 2 failures in pipeline elasticsearch-periodic (66.7% fail rate in 3 executions)
  • [8.x] 2 failures in pipeline elasticsearch-periodic-platform-support (66.7% fail rate in 3 executions)
  • [8.x] 3 failures in pipeline elasticsearch-pull-request (7.9% fail rate in 38 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :Core/Infra/Settings and >test-failure labels Jan 27, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 2 consecutive failures in test testReservedStatePersistsOnRestart
  • [main] 6 failures in test testReservedStatePersistsOnRestart (0.7% fail rate in 803 executions)
  • [main] 3 failures in step part-4 (1.0% fail rate in 313 executions)
  • [main] 2 failures in step part-4-fips (20.0% fail rate in 10 executions)
  • [main] 5 failures in pipeline elasticsearch-pull-request (1.6% fail rate in 312 executions)

Build Scans:

elasticsearchmachine added a commit that referenced this issue Jan 27, 2025
elasticsearchmachine added the Team:Core/Infra and needs:risk labels Jan 27, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-core-infra (Team:Core/Infra)

@ldematte
Contributor

Failures have spiked in the last 2 days, but this test has been failing intermittently since the 17th of December. There hasn't been a change to ReservedClusterStateService or FileSettingsService since #118351, so I'm marking this as medium-risk: it might be a real bug, but it seems to happen rarely.
(It might also be a test bug; if so, we can lower this to low-risk.)
@prdoyle do you think #118351 may be related?

ldematte added the medium-risk label and removed the needs:risk label Jan 28, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.x

Mute Reasons:

  • [8.x] 2 consecutive failures in step almalinux-9_platform-support-unix
  • [8.x] 11 failures in test testReservedStatePersistsOnRestart (6.5% fail rate in 168 executions)
  • [8.x] 3 failures in step part-4 (7.5% fail rate in 40 executions)
  • [8.x] 2 failures in step almalinux-9_platform-support-unix (66.7% fail rate in 3 executions)
  • [8.x] 2 failures in step openjdk23_checkpart4_java-matrix (66.7% fail rate in 3 executions)
  • [8.x] 2 failures in pipeline elasticsearch-periodic (66.7% fail rate in 3 executions)
  • [8.x] 2 failures in pipeline elasticsearch-periodic-platform-support (66.7% fail rate in 3 executions)
  • [8.x] 3 failures in pipeline elasticsearch-pull-request (7.9% fail rate in 38 executions)

Build Scans:

elasticsearchmachine added a commit that referenced this issue Jan 28, 2025
slobodanadamovic self-assigned this Jan 30, 2025
@prdoyle
Contributor

prdoyle commented Apr 11, 2025

The failures occur in a call to assertRoleMappingsInClusterState, which these tests sometimes call instead of assertRoleMappingsInClusterStateWithAwait. I wonder why it's OK not to wait. Possibly because the check looks at the initial state, and does so immediately after awaitFileSettingsWatcher.

Just wondering because I've just fixed a similar bug in #126720.
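
For readers unfamiliar with the distinction between an immediate assertion and an awaiting one, here is a minimal, self-contained sketch. It is not Elasticsearch test code and is not a diagnosis of this failure: AwaitSketch, awaitCondition, and the simulated background updater are all hypothetical names invented for illustration. It only shows how a check made right after an asynchronous update is requested can still see the old value, while a polling check tolerates the delay.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: contrasts a one-shot check with a polling ("await") check.
final class AwaitSketch {

    /**
     * Polls the condition every 10 ms until it holds or the timeout elapses.
     * Returns true if the condition was observed to hold, false otherwise.
     */
    public static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // One last check after the deadline (or after an interrupt).
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for state that still holds a stale entry.
        Set<String> roleMappings = ConcurrentHashMap.newKeySet();
        roleMappings.add("everyone_kibana_alone");

        // Stand-in for an update that is applied asynchronously, some time later.
        Thread updater = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            roleMappings.clear();
        });
        updater.start();

        // One-shot check: races with the updater and will usually see the stale entry.
        boolean immediate = roleMappings.isEmpty();
        // Polling check: keeps retrying until the update has landed or 5 s pass.
        boolean awaited = awaitCondition(roleMappings::isEmpty, 5_000);
        updater.join();

        System.out.println("immediate check passed: " + immediate);
        System.out.println("awaited check passed: " + awaited);
    }
}
```

Whether the non-awaiting call in these tests is actually exposed to such a race depends on what awaitFileSettingsWatcher guarantees, which is the open question above.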
