-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ES|QL - Allow full text functions to be used in STATS ... WHERE #125479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ES|QL - Allow full text functions to be used in STATS ... WHERE #125479
Conversation
@@ -54,6 +54,7 @@ public record ShardConfig(Query query, IndexSearcher searcher) {} | |||
private final List<ShardState> perShardState; | |||
|
|||
protected LuceneQueryEvaluator(BlockFactory blockFactory, ShardConfig[] shards) { | |||
assert shards.length > 0 : "LuceneQueryEvaluator expects shard configs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will help detect non-pushable expressions that need to be created with ShardConfig information
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good, but was wondering about an if'ace update.
} | ||
for (Expression child : expression.children()) { | ||
if (child instanceof FullTextFunction ftf) { | ||
failures.add( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could maybe add a method that builds the failure message and reuse it above as well.
@@ -21,6 +23,7 @@ interface PhysicalOperationProviders { | |||
PhysicalOperation groupingPhysicalOperation( | |||
AggregateExec aggregateExec, | |||
PhysicalOperation source, | |||
LocalExecutionPlannerContext context | |||
LocalExecutionPlannerContext context, | |||
List<EsPhysicalOperationProviders.ShardContext> shardContexts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's feels a bit odd to provide an implementation piece of EsPhysicalOperationProviders (in the list) to the interface, given that actually EsPhysicalOperationProviders
, which implements this if'ace, already has the list of shardContexts
. I wonder if we could embbed it into the LocalExecutionPlannerContext
itself, since it's local execution specific anyways.
Maybe it should be passed it to LocalExecutionPlaner#plan
from ComputeService#runCompute
, where the list of shard contexts is already available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @carlosdelest, I've created a changelog YAML for you. |
…-search-functions-stats # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…ch-functions-stats' into enhancement/esql-text-search-functions-stats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest! I added a couple of comments related to the test coverage and messages.
|
||
from books | ||
| where length(title) > 40 | ||
| stats c = count(*) where match(title, "Lord") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some suggestions on the test coverages that I can think of, these also apply to the other full text functions and operators:
- Can we have some tests with a bit more complicated predicates in the
where
clause understats
? For example some combinations ofand
,or
andnot
, with functions that can or cannot be pushed down to Lucent? Perhaps borrow some queries from the existing tests where thewhere
clause is not understats
. - I wonder how score works with
where
understats
, can we have some tests to capture how it behaves? - Can we have some tests to cover multiple aggregate functions with
where
clause understats
? The predicates for each aggregation can have some overlaps, we have some optimization rules to deal with overlapped predicates understats
. - Can we have some test with aggregation and grouping(BY) with full text functions under
where
clause? - Add some full text functions with options for the completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some tests with a bit more complicated predicates in the where clause under stats? For example some combinations of and, or and not, with functions that can or cannot be pushed down to Lucent? Perhaps borrow some queries from the existing tests where the where clause is not under stats.
Can we have some tests to cover multiple aggregate functions with where clause under stats? The predicates for each aggregation can have some overlaps, we have some optimization rules to deal with overlapped predicates under stats.
Can we have some test with aggregation and grouping(BY) with full text functions under where clause?
Add some full text functions with options for the completeness.
I added testing in 3963fd3, hopefully that works!
I wonder how score works with where under stats, can we have some tests to capture how it behaves?
It affects scoring as well. I'm thinking that
FROM my_index METADATA _score
| WHERE match(field, "query")
| STATS c = AVG(_score)
is the same as
FROM my_index METADATA _score
| STATS c = AVG(_score) WHERE match(field, "query")
so using a FTF in a STATS WHERE clause affects scoring as well. I think it's better for consistency, but happy to discuss with the team.
} else { | ||
plan.forEachExpression(FullTextFunction.class, ftf -> { | ||
failures.add(fail(ftf, "[{}] {} is only supported in WHERE commands", ftf.functionName(), ftf.functionType())); | ||
failures.add( | ||
fail(ftf, "[{}] {} is only supported in WHERE and STATS ... WHERE commands", ftf.functionName(), ftf.functionType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STATS ... WHERE commands
looks a bit weird to me, especially the ...
, but I also had hard time thinking of a good way to describe this situation, some candidates that I can think of are:
{} is only supported in WHERE commands ===> implies standalone or nested WHERE command
{} is only supported as filters in WHERE commands
{} is only supported as predicates in WHERE commands
{} is only supported in WHERE commands and WHERE commands under STATS commands
{} is only supported as filters in WHERE and STATS commands
{} is only supported as predicates in WHERE and STATS commands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is only supported as filters in WHERE and STATS commands
works for me - changed it in 02f8748
# Conflicts: # x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneQueryEvaluator.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/esql/180_match_operator.yml
…-search-functions-stats # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
@fang-xing-esql @bpintea I added some testing to this PR that had fallen off my priority list for a while. Can you please review it? |
Allows full text functions to be used in STATS ... WHERE:
Just filters work for now - we could expand this into BY for stats in a follow up.
Closes #125481