[CI] HdfsTests class failing #127290

Closed
elasticsearchmachine opened this issue Apr 23, 2025 · 5 comments · Fixed by #127534
Assignees
Labels
:Distributed Coordination/Snapshot/Restore – Anything directly related to the `_snapshot/*` APIs
low-risk – An open issue or test failure that is a low risk to future releases
Team:Distributed Coordination – Meta label for Distributed Coordination team
>test-failure – Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Apr 23, 2025

Build Scans:

Reproduction Line:

undefined

Applicable branches:
8.19

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

undefined

Issue Reasons:

  • [8.19] 5 consecutive failures in class org.elasticsearch.repositories.hdfs.HdfsTests
  • [8.19] 2 consecutive failures in step openjdk22_checkpart1_java-matrix
  • [8.19] 17 consecutive failures in step openjdk17_checkpart1_java-matrix
  • [8.19] 18 consecutive failures in step openjdk17_checkpart1_java-fips-matrix
  • [8.19] 19 consecutive failures in step openjdk21_checkpart1_java-matrix
  • [8.19] 18 consecutive failures in step graalvm-ce17_checkpart1_java-matrix
  • [8.19] 87 failures in class org.elasticsearch.repositories.hdfs.HdfsTests (14.5% fail rate in 602 executions)
  • [8.19] 15 failures in step openjdk22_checkpart1_java-matrix (88.2% fail rate in 17 executions)
  • [8.19] 17 failures in step openjdk17_checkpart1_java-matrix (100.0% fail rate in 17 executions)
  • [8.19] 18 failures in step openjdk17_checkpart1_java-fips-matrix (100.0% fail rate in 18 executions)
  • [8.19] 19 failures in step openjdk21_checkpart1_java-matrix (100.0% fail rate in 19 executions)
  • [8.19] 18 failures in step graalvm-ce17_checkpart1_java-matrix (100.0% fail rate in 18 executions)
  • [8.19] 19 failures in pipeline elasticsearch-periodic (100.0% fail rate in 19 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine added the :Core/Infra/Core (Core issues without another label) and >test-failure (Triaged test failures from CI) labels on Apr 23, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine added the Team:Core/Infra (Meta label for core/infra team) and needs:risk (Requires assignment of a risk label (low, medium, blocker)) labels on Apr 23, 2025
@rjernst added the :Data Management/HDFS (HDFS repository issues) label and removed the :Core/Infra/Core (Core issues without another label) label on Apr 23, 2025
@elasticsearchmachine added the Team:Data Management (Meta label for data/management team) label and removed the Team:Core/Infra (Meta label for core/infra team) label on Apr 23, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-data-management (Team:Data Management)

@nielsbauman
Contributor

This test class is part of snapshot/restore; rerouting to Distributed. #127287, #127288, and #127289 have already been correctly assigned to Distributed.

@nielsbauman added the :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs) and Team:Distributed Coordination (Meta label for Distributed Coordination team) labels and removed the Team:Data Management (Meta label for data/management team) and :Data Management/HDFS (HDFS repository issues) labels on Apr 24, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@nielsbauman
Contributor

Forgot to post the stack trace here for traceability:

Apr 24, 2025 5:29:09 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks	
WARNING: Will linger awaiting termination of 4 leaked thread(s).	
Apr 24, 2025 5:29:14 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks	
SEVERE: 4 threads leaked from SUITE scope at org.elasticsearch.repositories.hdfs.HdfsTests: 	
   1) Thread[id=79, name=ForkJoinPool.commonPool-worker-1, state=WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
   2) Thread[id=82, name=ForkJoinPool.commonPool-worker-4, state=WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
   3) Thread[id=81, name=ForkJoinPool.commonPool-worker-3, state=TIMED_WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:449)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1891)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
   4) Thread[id=80, name=ForkJoinPool.commonPool-worker-2, state=WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
Apr 24, 2025 5:29:14 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll	
INFO: Starting to interrupt leaked threads:	
   1) Thread[id=79, name=ForkJoinPool.commonPool-worker-1, state=WAITING, group=TGRP-HdfsTests]	
   2) Thread[id=82, name=ForkJoinPool.commonPool-worker-4, state=WAITING, group=TGRP-HdfsTests]	
   3) Thread[id=81, name=ForkJoinPool.commonPool-worker-3, state=TIMED_WAITING, group=TGRP-HdfsTests]	
   4) Thread[id=80, name=ForkJoinPool.commonPool-worker-2, state=WAITING, group=TGRP-HdfsTests]	
Apr 24, 2025 5:29:17 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll	
SEVERE: There are still zombie threads that couldn't be terminated:	
   1) Thread[id=79, name=ForkJoinPool.commonPool-worker-1, state=WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
   2) Thread[id=82, name=ForkJoinPool.commonPool-worker-4, state=WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
   3) Thread[id=81, name=ForkJoinPool.commonPool-worker-3, state=TIMED_WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:449)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1891)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)	
   4) Thread[id=80, name=ForkJoinPool.commonPool-worker-2, state=WAITING, group=TGRP-HdfsTests]	
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)	
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)	
        at java.base/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)	
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)	
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)

@JeremyDahlgren self-assigned this on Apr 29, 2025
JeremyDahlgren added a commit to JeremyDahlgren/elasticsearch that referenced this issue Apr 29, 2025
Changes "ForkJoinPool-" to "ForkJoinPool." in the
Thread getName().startsWith() checks in
HdfsClientThreadLeakFilter.  This resolves the
"There are still zombie threads that couldn't be terminated"
errors in the Hdfs IT tests.

Closes elastic#127290
Closes elastic#127289
Closes elastic#127288
Closes elastic#127287
@JeremyDahlgren added the low-risk (An open issue or test failure that is a low risk to future releases) label and removed the needs:risk (Requires assignment of a risk label (low, medium, blocker)) label on Apr 29, 2025
JeremyDahlgren added a commit that referenced this issue May 2, 2025
Adds the ForkJoinPool.commonPool-worker- prefix to the
Thread getName().startsWith() checks in HdfsClientThreadLeakFilter.
This resolves the
"There are still zombie threads that couldn't be terminated"
errors in the Hdfs IT tests.

Closes #127290
Closes #127289
Closes #127288
Closes #127287
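
For illustration, here is a minimal sketch of the kind of check the commit describes, written against the randomizedtesting `ThreadFilter` interface; the class name below is a placeholder, and the actual `HdfsClientThreadLeakFilter` in the repository may excuse additional thread names:

```java
import com.carrotsearch.randomizedtesting.ThreadFilter;

// Sketch only (hypothetical class name): the real HdfsClientThreadLeakFilter
// may also excuse other HDFS client threads.
public class CommonPoolLeakFilterSketch implements ThreadFilter {
    @Override
    public boolean reject(Thread t) {
        // JDK common-pool workers are named "ForkJoinPool.commonPool-worker-N",
        // while workers of a dedicated ForkJoinPool instance are named
        // "ForkJoinPool-N-worker-M". A check against the "ForkJoinPool-" prefix
        // therefore never matches the common pool, which is why the lingering
        // workers in the stack trace above were reported as zombie threads.
        return t.getName().startsWith("ForkJoinPool.commonPool-worker-");
    }
}
```

Returning `true` from `reject` tells the test framework to ignore that thread during leak detection; such a filter is typically registered on the test class via the `@ThreadLeakFilters` annotation.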
JeremyDahlgren added a commit to JeremyDahlgren/elasticsearch that referenced this issue May 4, 2025
…#127534)

Adds the ForkJoinPool.commonPool-worker- prefix to the
Thread getName().startsWith() checks in HdfsClientThreadLeakFilter.
This resolves the
"There are still zombie threads that couldn't be terminated"
errors in the Hdfs IT tests.

Closes elastic#127290
Closes elastic#127289
Closes elastic#127288
Closes elastic#127287

(cherry picked from commit 4408e38)