Skip to content

ES|QL - Allow full text functions to be used in STATS ... WHERE #125479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Mar 24, 2025

Allows full text functions to be used in STATS ... WHERE:

FROM books 
| STATS c = count(*) where author:"tolkien"

Just filters work for now - we could expand this into BY for stats in a follow up.

Closes #125481

@carlosdelest carlosdelest added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed v9.1.0 labels Mar 24, 2025
@@ -54,6 +54,7 @@ public record ShardConfig(Query query, IndexSearcher searcher) {}
private final List<ShardState> perShardState;

protected LuceneQueryEvaluator(BlockFactory blockFactory, ShardConfig[] shards) {
assert shards.length > 0 : "LuceneQueryEvaluator expects shard configs";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will help detect non-pushable expressions that need to be created with ShardConfig information

@carlosdelest carlosdelest marked this pull request as ready for review March 24, 2025 13:54
@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Mar 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, but was wondering about an if'ace update.

}
for (Expression child : expression.children()) {
if (child instanceof FullTextFunction ftf) {
failures.add(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could maybe add a method that builds the failure message and reuse it above as well.

@@ -21,6 +23,7 @@ interface PhysicalOperationProviders {
PhysicalOperation groupingPhysicalOperation(
AggregateExec aggregateExec,
PhysicalOperation source,
LocalExecutionPlannerContext context
LocalExecutionPlannerContext context,
List<EsPhysicalOperationProviders.ShardContext> shardContexts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's feels a bit odd to provide an implementation piece of EsPhysicalOperationProviders (in the list) to the interface, given that actually EsPhysicalOperationProviders, which implements this if'ace, already has the list of shardContexts. I wonder if we could embbed it into the LocalExecutionPlannerContext itself, since it's local execution specific anyways.
Maybe it should be passed it to LocalExecutionPlaner#plan from ComputeService#runCompute, where the list of shard contexts is already available?

Copy link
Member Author

@carlosdelest carlosdelest Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I did that change in 2a8e987, and some follow up work in c842f14

@carlosdelest carlosdelest added auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels Mar 26, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

…-search-functions-stats

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…ch-functions-stats' into enhancement/esql-text-search-functions-stats
Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @carlosdelest! I added a couple of comments related to the test coverage and messages.


from books
| where length(title) > 40
| stats c = count(*) where match(title, "Lord")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some suggestions on the test coverages that I can think of, these also apply to the other full text functions and operators:

  1. Can we have some tests with a bit more complicated predicates in the where clause under stats? For example some combinations of and, or and not, with functions that can or cannot be pushed down to Lucent? Perhaps borrow some queries from the existing tests where the where clause is not under stats.
  2. I wonder how score works with where under stats, can we have some tests to capture how it behaves?
  3. Can we have some tests to cover multiple aggregate functions with where clause under stats? The predicates for each aggregation can have some overlaps, we have some optimization rules to deal with overlapped predicates under stats.
  4. Can we have some test with aggregation and grouping(BY) with full text functions under where clause?
  5. Add some full text functions with options for the completeness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some tests with a bit more complicated predicates in the where clause under stats? For example some combinations of and, or and not, with functions that can or cannot be pushed down to Lucent? Perhaps borrow some queries from the existing tests where the where clause is not under stats.
Can we have some tests to cover multiple aggregate functions with where clause under stats? The predicates for each aggregation can have some overlaps, we have some optimization rules to deal with overlapped predicates under stats.
Can we have some test with aggregation and grouping(BY) with full text functions under where clause?
Add some full text functions with options for the completeness.

I added testing in 3963fd3, hopefully that works!

I wonder how score works with where under stats, can we have some tests to capture how it behaves?

It affects scoring as well. I'm thinking that

FROM my_index METADATA _score
| WHERE match(field, "query")
| STATS c = AVG(_score)

is the same as

FROM my_index METADATA _score
| STATS c = AVG(_score) WHERE match(field, "query")

so using a FTF in a STATS WHERE clause affects scoring as well. I think it's better for consistency, but happy to discuss with the team.

} else {
plan.forEachExpression(FullTextFunction.class, ftf -> {
failures.add(fail(ftf, "[{}] {} is only supported in WHERE commands", ftf.functionName(), ftf.functionType()));
failures.add(
fail(ftf, "[{}] {} is only supported in WHERE and STATS ... WHERE commands", ftf.functionName(), ftf.functionType())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STATS ... WHERE commands looks a bit weird to me, especially the ..., but I also had hard time thinking of a good way to describe this situation, some candidates that I can think of are:

{} is only supported in WHERE commands   ===> implies standalone or nested WHERE command
{} is only supported as filters in WHERE commands 
{} is only supported as predicates in WHERE commands 
{} is only supported in WHERE commands and WHERE commands under STATS commands
{} is only supported as filters in WHERE and STATS commands 
{} is only supported as predicates in WHERE and STATS commands 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is only supported as filters in WHERE and STATS commands works for me - changed it in 02f8748

# Conflicts:
#	x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneQueryEvaluator.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/esql/180_match_operator.yml
…-search-functions-stats

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
@carlosdelest
Copy link
Member Author

@fang-xing-esql @bpintea I added some testing to this PR that had fallen off my priority list for a while. Can you please review it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES|QL - Allow full text functions to be used in STATS ... WHERE
4 participants