DisMax query takes CPU hostage until the heat death of the universe #130239

Closed
@benwtrent

Elasticsearch Version

8.15

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

When executing a significantly complicated dismax query, it's possible that the iteration over impacts gets "stuck".

A CPU thread gets taken hostage at 100% and iterates forever. Task cancellation does nothing, as the thread is stuck in a busy loop that does work without any IO.

("Stuck" means running for hours and hours, requiring a server restart to stop.)

In particular, it gets stuck in this loop:

https://github.com/apache/lucene/blob/42d5806fd69400bb42b7d15f6311ac02d3104efe/lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java#L90..L108

    private int advanceImpacts(int target) throws IOException {
      if (target > upTo) {
        moveToNextBlock(target);
      }

      while (true) {
        if (maxScore >= minScore) {
          return target;
        }

        if (upTo == NO_MORE_DOCS) {
          return NO_MORE_DOCS;
        }

        target = upTo + 1;

        moveToNextBlock(target);
      }
    }

Then, in moveToNextBlock, this executes the ES812ScoreSkipReader impacts check, which possibly sets the target adversely, resulting in an infinite loop.
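
To make the termination condition of that loop concrete, here is a minimal toy model of its control flow. It is not Lucene code, and it does not claim to reproduce the actual root cause, which this issue has not established. `blockEnd` (standing in for the block boundary that moveToNextBlock obtains), `blockMaxScore`, and the `maxIterations` guard are hypothetical names introduced for illustration. The sketch only shows that the loop exits when each step moves `upTo` strictly forward, and that a boundary which stops advancing never reaches either return.

```java
import java.util.function.IntToDoubleFunction;
import java.util.function.IntUnaryOperator;

// Toy model of the loop shape quoted above. Not Lucene code: blockEnd,
// blockMaxScore and maxIterations are hypothetical, for illustration only.
final class ImpactsLoopSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  static final int BUDGET_EXHAUSTED = -1;

  // Well-behaved boundaries: blocks of 128 docs, no blocks at or past doc 1024.
  static final IntUnaryOperator HEALTHY = t -> t >= 1024 ? NO_MORE_DOCS : (t | 127);
  // Assumed failure mode: the reported boundary never moves past doc 127.
  static final IntUnaryOperator STUCK = t -> 127;
  // No block is ever competitive, so the loop must walk every block.
  static final IntToDoubleFunction NEVER_COMPETITIVE = upTo -> 0.0;

  static int advanceImpacts(int target, float minScore, IntUnaryOperator blockEnd,
      IntToDoubleFunction blockMaxScore, int maxIterations) {
    int upTo = blockEnd.applyAsInt(target);
    // The original uses while (true); the budget exists only so the demo halts.
    for (int i = 0; i < maxIterations; i++) {
      if (blockMaxScore.applyAsDouble(upTo) >= minScore) {
        return target; // this block may contain a competitive doc
      }
      if (upTo == NO_MORE_DOCS) {
        return NO_MORE_DOCS;
      }
      target = upTo + 1; // skip the whole non-competitive block
      upTo = blockEnd.applyAsInt(target);
    }
    return BUDGET_EXHAUSTED; // the real loop would still be spinning here
  }

  public static void main(String[] args) {
    System.out.println(advanceImpacts(0, 1f, HEALTHY, NEVER_COMPETITIVE, 1000));
    System.out.println(advanceImpacts(0, 1f, STUCK, NEVER_COMPETITIVE, 1000));
  }
}
```

With the healthy boundaries the call reaches NO_MORE_DOCS after a handful of blocks. With the stuck boundary, `target` is reset to 128 on every pass and only the iteration budget stops it.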

Steps to Reproduce

@softwaredoug discovered this; I will defer to him.

Logs (if relevant)

/tmp/jstack.4.log:"elasticsearch[eck-elasticsearch-es-default-5][search_worker][T#2]" #78 [163] daemon prio=5 os_prio=0 cpu=771958.29ms elapsed=13461.64s tid=0x00007f5f28013750 nid=163 runnable  [0x00007f5e5d1fd000]
/tmp/jstack.4.log-   java.lang.Thread.State: RUNNABLE
/tmp/jstack.4.log-	at org.apache.lucene.search.DisjunctionScoreBlockBoundaryPropagator.advanceShallow([email protected]/DisjunctionScoreBlockBoundaryPropagator.java:79)
/tmp/jstack.4.log-	at org.apache.lucene.search.DisjunctionMaxScorer.advanceShallow([email protected]/DisjunctionMaxScorer.java:79)
/tmp/jstack.4.log-	at org.apache.lucene.search.ConjunctionScorer.advanceShallow([email protected]/ConjunctionScorer.java:80)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer.advanceShallow([email protected]/ReqOptSumScorer.java:274)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.moveToNextBlock([email protected]/ReqOptSumScorer.java:82)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.advanceImpacts([email protected]/ReqOptSumScorer.java:106)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.advanceInternal([email protected]/ReqOptSumScorer.java:129)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.nextDoc([email protected]/ReqOptSumScorer.java:112)
/tmp/jstack.4.log-	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange([email protected]/Weight.java:298)
/tmp/jstack.4.log-	at org.apache.lucene.search.Weight$DefaultBulkScorer.score([email protected]/Weight.java:236)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.CancellableBulkScorer.score([email protected]/CancellableBulkScorer.java:45)
/tmp/jstack.4.log-	at org.apache.lucene.search.BulkScorer.score([email protected]/BulkScorer.java:38)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf([email protected]/ContextIndexSearcher.java:436)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher.search([email protected]/ContextIndexSearcher.java:365)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$3([email protected]/ContextIndexSearcher.java:350)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda/0x00007f5fec574000.call([email protected]/Unknown Source)
/tmp/jstack.4.log-	at org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0([email protected]/TaskExecutor.java:117)
/tmp/jstack.4.log-	at org.apache.lucene.search.TaskExecutor$TaskGroup$$Lambda/0x00007f5fec519c80.call([email protected]/Unknown Source)
/tmp/jstack.4.log-	at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:317)
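
The trace shows CancellableBulkScorer.score several frames above the stuck nextDoc call. Assuming its cancellation check runs only between calls into the wrapped scorer (an assumption, not something this issue confirms), a call that never returns would never reach that check. That would be consistent with task cancellation having no effect. The following toy model, with hypothetical names throughout, illustrates how a cancellation flag is only observed at the points where code polls it:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Toy model of cooperative cancellation. Not Elasticsearch code: all names
// here are hypothetical and only illustrate where polling happens.
final class CancellationSketch {

  // Returns a flag that reports "cancelled" once it has been polled n times.
  static BooleanSupplier cancelAfterPolls(int n) {
    AtomicInteger polls = new AtomicInteger();
    return () -> polls.getAndIncrement() >= n;
  }

  // Scores up to `windows` windows and returns the number of inner steps run.
  // Cancellation is polled only between windows, never inside one.
  static long scoreWindows(int windows, long stepsPerWindow, BooleanSupplier cancelled) {
    long steps = 0;
    for (int w = 0; w < windows; w++) {
      if (cancelled.getAsBoolean()) {
        break; // the only place cancellation is noticed
      }
      for (long i = 0; i < stepsPerWindow; i++) {
        steps++; // inner loop: if this never ended, the poll above would never run again
      }
    }
    return steps;
  }

  public static void main(String[] args) {
    // The flag flips after the first poll, but the first window still runs to completion.
    System.out.println(scoreWindows(3, 1000, cancelAfterPolls(1)));
  }
}
```

In this model the delay before a cancellation takes effect is bounded by the length of one inner loop. If that loop is unbounded, so is the delay.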
