Skip to content

ESQL: Wrong STATS result in case of many distinct groups of 2 fields #130644

@alex-spies

Description

@alex-spies

STATS COUNT() BY field1, field2 can produce wrong results if the cardinality of field1 and field2 is very large.

Steps to repro:

  1. create files to bulk ingest
for i in {1..10}; do
  for j in {1..100000}; do                                                                                                             
    echo "{\"index\": {\"_index\": \"test2\"}}" >> bulk$i.ndjson
    echo "{\"field\": \"$i\", \"otherfield\": \"$j\", \"concat\": \"$i$j\"}" >> bulk$i.ndjson
  done
done

Note how the concat field is just the concatenation of the i and j counters.

  1. run ES locally via gradle, but increase heap size
./gradlew run -Dtests.es.logger.org.elasticsearch.xpack.esql=TRACE -Dtests.jvm.argline="-Xmx16000M"
  1. bulk ingest
for i in {1..10}; do
  curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/test2/_bulk?refresh" -XPOST --data-binary @bulk$i.ndjson
done
  1. run queries
curl -u elastic:password -X POST 'localhost:9200/_query?format=txt' -H 'Content-Type: application/json' '-d
{                                                                                                                                      
  "query": "from test2 | stats count() by concat.keyword, otherfield.keyword| where concat.keyword == \"11\""}
'

Empty! Although not fully deterministic.

    count()    |concat.keyword |otherfield.keyword
---------------+---------------+------------------

Correct if we filter before!
curl -u elastic:password -X POST 'localhost:9200/_query?format=txt' -H 'Content-Type: application/json' '-d
{
  "query": "from test2 | where concat.keyword == \"11\" | stats count() by concat.keyword, otherfield.keyword"}
'
    count()    |concat.keyword |otherfield.keyword
---------------+---------------+------------------
1              |11             |1 

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions