[CI] IndexRecoveryIT testSourceThrottling failing #123680

Open
elasticsearchmachine opened this issue Feb 28, 2025 · 3 comments
Labels
  • :Distributed Indexing/Recovery: Anything around constructing a new shard, either from a local or a remote source.
  • low-risk: An open issue or test failure that is a low risk to future releases
  • Team:Distributed Indexing: Meta label for Distributed Indexing team
  • >test-failure: Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Feb 28, 2025

Build Scans:

Reproduction Line:

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.indices.recovery.IndexRecoveryIT.testSourceThrottling" -Dtests.seed=ACE8061091E75DD1 -Dtests.locale=sr-Cyrl-RS -Dtests.timezone=Asia/Brunei -Druntime.java=17

Applicable branches:
8.18

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: Throttling should be =0 for 'node_t1'
Expected: <0L>
     but: was <67L>

Issue Reasons:

  • [8.18] 2 failures in test testSourceThrottling (0.4% fail rate in 470 executions)
  • [8.18] 2 failures in pipeline elasticsearch-periodic (14.3% fail rate in 14 executions)
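
The percentages in the list above can be reproduced from the raw counts. This is a minimal sketch, assuming the triage automation reports failures divided by executions, rounded to one decimal place; the class and method names (`FailRate`, `percent`) are hypothetical and are not part of the Elasticsearch tooling.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Elasticsearch triage automation.
final class FailRate {
    /** Returns failures / executions as a percentage with one decimal place. */
    static String percent(int failures, int executions) {
        if (executions <= 0) {
            throw new IllegalArgumentException("executions must be positive");
        }
        // Locale.ROOT keeps the decimal separator a '.' regardless of -Dtests.locale.
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * failures / executions);
    }

    public static void main(String[] args) {
        System.out.println(percent(2, 470)); // test-level rate on 8.18
        System.out.println(percent(2, 14));  // pipeline-level rate on 8.18
    }
}
```

The same two failures give a much higher rate at the pipeline level only because the denominator is 14 pipeline runs rather than 470 test executions.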

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :Distributed Indexing/Recovery, >test-failure, needs:risk, and Team:Distributed Indexing labels Feb 28, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

arteam assigned arteam and unassigned arteam Feb 28, 2025
arteam added the low-risk label Feb 28, 2025
elasticsearchmachine removed the needs:risk label Feb 28, 2025
@arteam
Contributor

arteam commented Feb 28, 2025

I added more logging for this use case in #122830, but it doesn't provide much new information:

java.lang.AssertionError: Throttling should be =0 for 'node_t1'. Node stats: {	
  "_nodes" : {	
    "total" : 1,	
    "successful" : 1,	
    "failed" : 0	
  },	
  "cluster_name" : "TEST-TEST_WORKER_VM=[451]-CLUSTER_SEED=[2564335054453859398]-HASH=[143C09A7FE0]-cluster",	
  "nodes" : {	
    "xb2GoITxQYenj7tE1Mf1hw" : {	
      "timestamp" : 1740724121481,	
      "name" : "node_t1",	
      "transport_address" : "127.0.0.1:26862",	
      "host" : "127.0.0.1",	
      "ip" : "127.0.0.1:26862",	
      "roles" : [	
        "data",	
        "data_cold",	
        "data_content",	
        "data_frozen",	
        "data_hot",	
        "data_warm",	
        "ingest",	
        "master",	
        "ml",	
        "remote_cluster_client",	
        "transform"	
      ],	
      "indices" : {	
        "recovery" : {	
          "current_as_source" : 0,	
          "current_as_target" : 0,	
          "throttle_time" : "48.7ms",	
          "throttle_time_in_millis" : 48	
        }	
      }	
    }	
  }	
}	

It seems that we can receive some throttling event on the target node after we unblock the recovery on the source node.
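
To compare dumps like the one above across failing runs, the throttle value can be pulled out mechanically. This is a rough sketch only: it assumes the dump is available as a plain string and uses a regular expression instead of a real JSON parser. The `ThrottleStat` class and `throttleMillis` method are hypothetical names, not anything from the test or from #122830.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for eyeballing CI logs; not part of IndexRecoveryIT.
final class ThrottleStat {
    // Matches the numeric field only, not the human-readable "throttle_time" string.
    private static final Pattern THROTTLE_MILLIS =
        Pattern.compile("\"throttle_time_in_millis\"\\s*:\\s*(\\d+)");

    /** Returns the first throttle_time_in_millis value in the dump, or -1 if absent. */
    static long throttleMillis(String nodeStatsJson) {
        Matcher m = THROTTLE_MILLIS.matcher(nodeStatsJson);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the "recovery" object from the dump above.
        String dump = "{ \"recovery\" : { \"throttle_time\" : \"48.7ms\", "
            + "\"throttle_time_in_millis\" : 48 } }";
        long millis = throttleMillis(dump);
        System.out.println(millis == 0 ? "not throttled" : "throttled for " + millis + " ms");
    }
}
```

Because it returns only the first match, this works for single-node dumps like the one above; a dump covering several nodes would need a proper parser.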

@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 3 failures in test testSourceThrottling (0.4% fail rate in 839 executions)
  • [main] 2 failures in pipeline elasticsearch-periodic (11.8% fail rate in 17 executions)

Build Scans:
