
ES|QL cross-cluster searches honor the skip_unavailable cluster setting #112886

Closed

quux00 opened this issue Sep 13, 2024 · 11 comments
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@quux00
Contributor

quux00 commented Sep 13, 2024

UPDATE: This issue had a lot of discussion to work through the approach. A new issue has been created that summarizes the proposed approaches to handling skip_unavailable in ES|QL.

Description

Overview

The skip_unavailable remote cluster setting is intended to let ES admins specify whether a cross-cluster search should fail or return partial data when errors occur on a remote cluster during the search.

For _search, if skip_unavailable is true, a cross-cluster search:

  • Skips the remote cluster if its nodes are unavailable during the search. The response’s _clusters.skipped value contains a count of any skipped clusters and the _clusters.details section of the response will show a skipped status.
  • Errors returned by the remote cluster, such as unavailable shards, no matching indices, etc. are not fatal. The search will continue and return results from other clusters.

ESQL cross-cluster searches should also respect this setting, but we need to define exactly how it should work.
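
For reference, skip_unavailable is a per-remote-cluster setting that an admin can change dynamically; a minimal sketch of setting it (the remote1 alias is illustrative):

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.remote1.skip_unavailable": false
  }
}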

Proposed Implementation in ES|QL. Phase 1: field-caps and enrich policy-resolve APIs

To start, support for skip_unavailable should be implemented in both the field-caps and enrich policy-resolve APIs, which occur as part of the "pre-analysis" phase of ES|QL processing.

When a remote cluster cannot be connected to during the field-caps or enrich policy-resolve steps:

  • if skip_unavailable=true (the default setting) for the remote cluster, the cluster will be marked as SKIPPED in the EsqlExecutionInfo metadata object for that search, reported as skipped in the _clusters/details metadata section of the ES|QL response, and a failure reason will be provided (see the examples section below).
  • if skip_unavailable=false, then a 500 HTTP status code is returned along with a single top-level error, as _search does.

If the index expression provided does not match any indices on a cluster, how should we handle that? I propose that we follow the pattern in _search:

  • if skip_unavailable=true (the default setting) for the remote cluster, the cluster will be marked as SKIPPED along with an "index_not_found" failure message
  • if skip_unavailable=false for the remote cluster, the cluster will be marked as SKIPPED along with an "index_not_found" failure message ONLY IF the index expression was specified with a wildcard (lenient handling) - see example below
  • if skip_unavailable=false for the remote cluster and a concrete index was specified by the client (no wildcards), the error is fatal and an HTTP 404 status will be returned with an "index_not_found" failure message (see example below)

An additional consideration is how to treat the local cluster. It does not have an explicit skip_unavailable setting. Since it is the coordinating cluster, it will never be unavailable, but we need to decide how to handle the case when the index expression provided matches no indices - is it a fatal error or should we just mark the local cluster as skipped?

I propose that we treat the local cluster like skip_unavailable=true for this case, for three reasons:

  1. users cannot change the skip_unavailable setting for the local cluster
  2. skip_unavailable=true is the default for remote clusters, so it should be the default for local also
  3. this behavior is consistent with how ES|QL currently behaves. (Right now, as long as one cluster has matching indices, the search will proceed and return data from the clusters with matching indices; see the example below.)
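
As an illustration of point 3 (all index names here are hypothetical), a query along these lines currently proceeds and returns data only from the clusters whose expression matched something:

POST /_query
{
  "query": "FROM no_such_local_index*, remote1:logs-* | STATS count(*)"
}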

Reference

I have documented how _search and ES|QL currently behave with respect to indices not matching here: https://gist.github.com/quux00/a1256fd43947421e3a6993f982d065e8

Examples of proposed ES|QL response outputs

In these examples:

remote1 has skip_unavailable=true
remote2 has skip_unavailable=false

Fatal error (404) when index not found and wildcard NOT used in remote2
POST /_query/async
{
  "query": "FROM *,remote1:x,remote2:x|\n STATS count(*) by authors.last_name | LIMIT 4"
}

// response
{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index [x] and No matching index for [x] was found on [remote2] cluster (which has skip_unavailable=false)",
        "index_uuid": "_na_",
        "index": "x"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index [x] and No matching index for [x] was found on [remote2] cluster (which has skip_unavailable=false)",
    "index_uuid": "_na_",
    "index": "x"
  },
  "status": 404
}
Skipped clusters when index not found and wildcard used in remote2
POST /_query/async?drop_null_columns
{
  "query": "FROM *,remote1:x,remote2:x*|\n STATS count(*) by authors.last_name | LIMIT 4"
}

// response
  "_clusters": {
    "total": 3,
    "successful": 1,
    "running": 0,
    "skipped": 2,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "*",
        "took": 50,
        "_shards": {
          "total": 21,
          "successful": 21,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "skipped",
        "indices": "x*",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        },
        "failures": [
          {
            "shard": -1,
            "index": null,
            "reason": {
              "type": "index_not_found_exception",
              "reason": "no such index [x*]",
              "index_uuid": "_na_",
              "index": "x*"
            }
          }
        ]
      },
      "remote1": {
        "status": "skipped",
        "indices": "x",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        },
        "failures": [
          {
            "shard": -1,
            "index": null,
            "reason": {
              "type": "index_not_found_exception",
              "reason": "no such index [x]",
              "index_uuid": "_na_",
              "index": "x"
            }
          }
        ]
      }
    }
  }
Skipped cluster remote1 when not available
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "*",
        "took": 181,
        "_shards": {
          "total": 21,
          "successful": 21,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "successful",
        "indices": "*",
        "took": 180,
        "_shards": {
          "total": 12,
          "successful": 12,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "skipped",
        "indices": "*",
        "failures": [
          {
            "shard": -1,
            "index": null,
            "reason": {
              "type": "connect_transport_exception",
              "reason": "Unable to connect to [remote1]"
            }
          }
        ]
      }
    }
  }
Fatal error since remote2 not available
{
  "error": {
    "root_cause": [
      {
        "type": "uncategorized_execution_exception",
        "reason": "Failed execution"
      }
    ],
    "type": "connect_transport_exception",
    "reason": "[][127.0.0.1:9302] connect_exception",
    "caused_by": {
      "type": "uncategorized_execution_exception",
      "reason": "Failed execution",
      "caused_by": {
        "type": "execution_exception",
        "reason": "io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:9302",
        "caused_by": {
          "type": "annotated_connect_exception",
          "reason": "Connection refused: /127.0.0.1:9302",
          "caused_by": {
            "type": "connect_exception",
            "reason": "Connection refused"
          }
        }
      }
    }
  },
  "status": 500
}

Proposed Implementation in ES|QL. Phase 2: trapping errors during ES|QL operations (after planning)

To be fully compliant with the skip_unavailable model, we will also need to add error handling during ES|QL processing. If shard errors (or other fatal errors) occur during ES|QL processing on a remote cluster and that cluster is marked as skip_unavailable=true, we will need to trap those errors, avoid returning a 4xx/5xx error to the user, and instead mark the cluster either as skipped or partial (depending on whether we can use the partial data that came back) in the EsqlExecutionInfo, along with failure info, as we do in _search.
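
As a sketch only (the values and the failure entry are illustrative, mirroring the _clusters format in the examples above), a skip_unavailable=true remote whose shard errors were trapped could be reported as partial:

"remote1": {
  "status": "partial",
  "indices": "logs*",
  "took": 98,
  "_shards": {
    "total": 10,
    "successful": 7,
    "skipped": 0,
    "failed": 3
  },
  "failures": [
    {
      "shard": 0,
      "index": "remote1:logs-2024",
      "reason": {
        "type": "illegal_state_exception",
        "reason": "error while executing the query on the data node"
      }
    }
  ]
}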

Since ES|QL currently treats failures during ES|QL processing as fatal, I do not know how hard adding this feature will be. I would like feedback from the ES|QL team on how feasible this is and how it could be done.

@quux00 quux00 added :Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Sep 13, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member

nik9000 commented Oct 1, 2024

If the index expression provided does not match any indices on a cluster, how should we handle that?

None of this sounds like it's actively against the ESQL philosophy. It's a little weird that you can request an index explicitly (foo:x) and it's not an error sometimes. It's also interesting that this is so far from the query - but I understand how we got here. I suppose the big question is - what does Kibana do when it sees these? What should it do? A warning?

Since ES|QL currently treats failures during ES|QL processing as fatal, I do not know how hard adding this feature will be. I would like feedback from the ES|QL team on how feasible this is and how it could be done.

It's been quite a while since I looked at that code, but I expect we can rig up something. I'm sure testing it is going to be fun, though. As with _search, we could generate wildly inaccurate results if there's an error while the thing is running, but that's what you ask for when you set skip_unavailable to true, I think.

@quux00 quux00 changed the title ES|QL cross-cluster searches honors the skip_unavailable cluster setting ES|QL cross-cluster searches honor the skip_unavailable cluster setting Oct 2, 2024
@astefan
Contributor

astefan commented Oct 4, 2024

@quux00, first let me lay out how ES|QL behaves in different scenarios where indices are missing or present.
My view of the language behavior has two parts, regarding the availability of indices:

  • the IndexResolver (and its _field_caps call) learns the mappings that are involved in the query, with the main goal of letting the rest of the planning components (Analyzer and Verifier) validate the correctness of the query. This means the data types involved in different operations (math operators, scalar functions, aggregation functions, etc.) are compatible, all involved fields actually exist in the mappings, and any wildcards used in the query can actually be expanded (here I refer to the keep *name and drop *name commands).
  • after all the planner components are done (Analyzer, Verifier, LogicalPlanOptimizer, PhysicalPlanOptimizer, and the per-data-node counterparts), the actual search happens (Drivers and Operators are created, scheduled, and start reading and computing data)

Existing behavior for different index patterns:

| index pattern | behavior |
| --- | --- |
| from existent, inexistent | failure coming from ES at query time complaining about "inexistent" |
| from existent, inexistent* | no failure, ES does the expansion of the wildcard but doesn't complain |
| from inexistent | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93 |
| from inexistent* | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93 |
| from existent | no failure |
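
(For instance, the "from inexistent" row can be reproduced against the ES|QL endpoint; the error text below is approximate and may differ slightly:)

POST /_query
{
  "query": "FROM inexistent | LIMIT 1"
}

// fails with roughly:
{
  "error": {
    "type": "verification_exception",
    "reason": "Found 1 problem\nline 1:6: Unknown index [inexistent]"
  },
  "status": 400
}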

The way I see this in the context of CCS and skip_unavailable

skip_unavailable = true

| index pattern | behavior |
| --- | --- |
| from remote:existent, inexistent | failure coming from ES at query time complaining about "inexistent" from the "local" cluster |
| from existent, remote:inexistent | no failure. This is different from what we have now (where the _search query DSL complains); it's fair to assume that the remote cluster's behavior is in control here; if we are to override this, then skip_unavailable is void of its intention |
| from remote:existent, inexistent* | no failure, ES does the expansion of the wildcard but doesn't complain for its "local" indices |
| from existent, remote:inexistent* | no failure, ES does the expansion of the wildcard but doesn't complain for its "remote" indices |
| from remote:inexistent | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93 |
| from remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93 |
| from remote:existent | no failure |
| from inexistent*, remote:inexistent | failure coming from post-_field_caps check, at the Verifier level |
| from inexistent*, remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level |
| from inexistent, remote:inexistent | failure coming from post-_field_caps check, at the Verifier level |
| from inexistent, remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level |

skip_unavailable = false

| index pattern | behavior |
| --- | --- |
| from remote:existent, inexistent | failure coming from ES at query time complaining about "inexistent" from the "local" cluster |
| from existent, remote:inexistent | failure coming from ES (not _field_caps) at query time complaining about "inexistent" in the remote index and skip_unavailable=false |
| from remote:existent, inexistent* | no failure, ES does the expansion of the wildcard but doesn't complain for its "local" indices |
| from existent, remote:inexistent* | failure at query time (not _field_caps): ES does the expansion of the wildcard and, because skip_unavailable=false, it complains; this is different than the current ESQL behavior (where we don't fail) |
| from remote:inexistent | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93 |
| from remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93 |
| from remote:existent | no failure |
| from inexistent*, remote:inexistent | failure coming from post-_field_caps check, at the Verifier level |
| from inexistent*, remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level |
| from inexistent, remote:inexistent | failure coming from post-_field_caps check, at the Verifier level |
| from inexistent, remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level |

Note: for cases where the local cluster is involved and the overall (remote and local) resolution of indices ends up with no indices, this should be an error, because the user specified a missing indices pattern for the local cluster; even if the user cannot control how the remote clusters behave, they are in control of the index pattern for the local cluster.

CC @nik9000 @costin

@quux00
Contributor Author

quux00 commented Oct 4, 2024

Thanks @astefan for the careful review and analysis!

One nit to start, when you say:

failure coming from ES at _search time complaining about "inexistent"

Can we change that to say "query time" rather than "_search time"? I was confused about whether you were talking about the _search endpoint or ESQL. I think you are only talking about ES|QL?


Second, I think there are some errors in your table notes.

FROM remote:existent,inexistent
Result: failure coming from ES at _search time complaining about "inexistent" from the "local" cluster

This is not how ESQL currently behaves. It will ignore the local "inexistent" and search "remote:existent". That's true as long as you only search for one concrete index per cluster and at least one cluster has a matching index - in that case the search succeeds.

And

FROM existent,remote:inexistent
Result: no failure. This is different from what we have now (where _search complains);

That is how ESQL behaves now (meaning it currently does NOT throw a failure) for the same reason as above, so that is not a change.

Where ESQL is different (inconsistent?) is when you search for two concrete indexes on the same cluster and one exists and the other does not. Both of these fail with an index_not_found exception.

FROM existent,inexistent
FROM remote:existent,remote:inexistent

In my view we should make ESQL behavior consistent here AND tailor it to be specific to whether the cluster is skip_unavailable=true or false.

In other words, a missing concrete index should either:

  • always fail if skip_unavailable=false
  • always be skipped (non-fatal, just recorded in metadata) if skip_unavailable=true

And that means we need to determine what the skip_unavailable built-in "setting" for the local cluster is.


Third, from this example you gave

FROM remote:existent, inexistent
Result: failure coming from ES at _search time complaining about "inexistent" from the "local" cluster

I believe you are proposing that the local cluster should be treated like skip_unavailable=false. Is that right?

That would be consistent with how _search behaves (see below), so I'm not opposed to it, but I did propose the opposite in my write-up because skip_unavailable=true is now the default for remotes (that was changed recently, in 8.15, I think) and it is not a changeable setting by end users/admins.

Example showing that in _search, the local cluster is treated like skip_unavailable=false:

GET inexistent,remote1:*/_search
{
  "query": {
    "match_all": {}
  }
}
// fails with:

{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index [inexistent]",
...

For the rest of your write-up examples, I tested them against my in-progress skip_unavailable branch and they match the behavior I've implemented.

So I think we come down to two open questions:

  1. should the local cluster be treated like skip_unavailable true or false?
  2. should we change ESQL to be consistent for cases FROM existent,inexistent and FROM remote:existent,remote:inexistent? The proposal being - that should fail only if the cluster is skip_unavailable=false.

@quux00
Contributor Author

quux00 commented Oct 4, 2024

UPDATE: Note also that ESQL is not currently consistent around whether to fail queries when two concrete indices are given for the same cluster, but one doesn't exist.

This fails (with index_not_found_exception):

FROM existent,inexistent | LIMIT 1

But this succeeds:

FROM existent,inexistent | LIMIT 0

The reason is that the latter is a coordinator-only operation and the index never needs to be used again in the data-node phase of ESQL processing.

My vote would be that both should behave the same and whether it fails or not depends on the skip_unavailable setting. In my current PR that's the behavior I've been working towards, so please let me know if anyone doesn't agree.

@astefan
Contributor

astefan commented Oct 7, 2024

Can we change that to say "query time" rather than "_search time". I was confused if you were talking about the _search endpoint or ESQL. I think you are only talking about ES|QL?

You are right, sorry about abusing the _search term. I've edited my original post.

FROM remote:existent,inexistent
Result: failure coming from ES at _search time complaining about "inexistent" from the "local" cluster

This is not how ESQL currently behaves. It will ignore the local "inexistent" and search "remote:existent". That's true as long as you only search for one concrete index per cluster and at least one cluster has a matching index - in that case the search succeeds.

Imo, this is inconsistent and incorrect. If from existent, inexistent results in an error (no matter how it's exposed to users; I agree, though, that it should be the same type of error), then why do we make an exception when remote:existent is present? It's still an index. Whether it's remote or local shouldn't matter.

FROM existent,remote:inexistent
Result: no failure. This is different from what we have now (where _search complains);

That is how ESQL behaves now (meaning it currently does NOT throw a failure) for the same reason as above, so that is not a change.

I think it should (throw an error). This changes if we add the skip_unavailable variable into the equation, though.

Where ESQL is different (inconsistent?) is when you search for two concrete indexes on the same cluster and one exists and the other does not. Both of these fail with an index_not_found exception.

FROM existent,inexistent
FROM remote:existent,remote:inexistent

I think this behavior is consistent: both queries should fail because one concrete index does not exist. It doesn't matter if the index is local or not. Of course, this can change depending on skip_unavailable, which is something that cannot be controlled when creating the ESQL query.

FROM remote:existent, inexistent
Result: failure coming from ES at _search time complaining about "inexistent" from the "local" cluster

I believe you are proposing that the local cluster should be treated like skip_unavailable=false. Is that right?

From what I understand, skip_unavailable is a setting that makes sense for remote clusters. I don't see why skip_unavailable should be considered for the local cluster, though. from test1, test2 shouldn't behave differently depending on whether skip_unavailable is true or false. That would imply that skip_unavailable can be changed per query, which is not the case, and many existing queries could fail because of it.

Better said, I regard the CCS handling of ES|QL as having, as its "baseline" comparison, the current (non-CCS) behavior, which has no knowledge of skip_unavailable. What we consider new CCS behavior in ES|QL should be in addition to existing behavior, not an override of current behavior.

But this succeeds:
FROM existent,inexistent| LIMIT 0

Good catch. And this is a bug.

My vote would be that both should behave the same and whether it fails or not depends on the skip_unavailable setting. In my current PR that's the behavior I've been working towards, so please let me know if anyone doesn't agree.

I respectfully disagree based on the logic I described above: skip_unavailable makes sense for remote clusters and considering it for the local one would:

  • break existing behavior
  • make no sense for local-only index patterns, imo

@quux00
Contributor Author

quux00 commented Oct 8, 2024

Thanks @astefan.

Reading through your feedback, I think the behavior you are proposing can be captured in one sentence:

If a user requests a concrete index that is not found, the query should be failed with a standard exception and HTTP status code, unless the query was done against a remote cluster with the setting skip_unavailable=true.


Discussion / further elucidation of this principle:

  1. Since the local cluster is never a remote cluster (and has no skip_unavailable setting), a missing concrete index is always fatal.

  2. Indices specified with a wildcard are lenient - no match is only fatal if no indices are found on any cluster.

  3. Thus, we need to fix the inconsistencies/bugs in ESQL currently so that the following queries fail (assuming that the remote cluster is skip_unavailable=false for these examples):

FROM existent,inexistent |  LIMIT 0
FROM remote:existent,inexistent
FROM existent,remote:inexistent

Open questions:

  1. Right now, none of the queries in 3 above returns a fatal exception. If we make them fatal, is that a breaking change?

  2. What about missing aliases? Are those fatal? Or do we treat them leniently like wildcards?

  3. What if all clusters involved in the search are remote with skip_unavailable=true and none have any matching indices - is that a fatal error, or should it return a 200 with no data (and a list of failure reasons in the _clusters metadata)?

  4. What is the standard exception we should throw? IndexNotFoundException (404)? In some cases we get a VerificationException (400).

  5. Can we clarify what "CCS index resolution should be an extension of local resolution" means (this was stated to me by Costin in another forum).

  6. Do the indices_options settings make the principle outlined above more fuzzy, so it's not cut and dried simple?

@astefan
Contributor

astefan commented Oct 9, 2024

Thanks @astefan.

Reading through your feedback, I think the behavior you are proposing can be captured in one sentence:

If a user requests a concrete index that is not found, the query should be failed with a standard exception and HTTP status code, unless the query was done against a remote cluster with the setting skip_unavailable=true.

Yes.

Discussion / further elucidation of this principle:

  1. Since the local cluster is never a remote cluster (and has no skip_unavailable setting), a missing concrete index is always fatal.

Just to make sure I understand what fatal means; sorry about double checking this, it is not a term I am used to. I am assuming this means that the search fails with "Unknown index" / "Index not found exception" 400/404 errors. If not correct, please let me know.

  2. Indices specified with a wildcard are lenient - no match is only fatal if no indices are found on any cluster.
  3. Thus, we need to fix the inconsistencies/bugs in ESQL currently so that the following queries fail (assuming that the remote cluster is skip_unavailable=false for these examples):
FROM existent,inexistent |  LIMIT 0
FROM remote:existent,inexistent
FROM existent,remote:inexistent

Open questions:

  1. Right now, none of the queries in 3 above returns a fatal exception. If we make them fatal, is that a breaking change?

I don't think this is breaking. I consider the three queries above bugs.

  2. What about missing aliases? Are those fatal? Or do we treat them leniently like wildcards?

I don't think ES itself distinguishes between an alias and an index name. When writing an ESQL query or an ES _search DSL request, the name of the resource provided can be an alias, an index name, or a data stream name. Imo, aliases should be treated in the same way we treat indices.

  3. What if all clusters involved in the search are remote with skip_unavailable=true and none have any matching indices - is that a fatal error, or should it return a 200 with no data (and a list of failure reasons in the _clusters metadata)?

The query could be syntactically correct (indices exist, field names and types are correct and compatible), or it could be syntactically incorrect (wrong index names), but there may also be other issues that prevent the remote cluster from fulfilling the search.

Thus, I consider that this edge case should return no data, and if the to-be-added parameter that exposes CCS metadata is set to true, then the appropriate metadata will explain the reasons for the empty data set. There is a longer discussion here, though, regarding the columns of the response: when running the _field_caps call, ESQL learns the mappings and then runs the actual search. If the failure on the remote cluster happens at the _field_caps call (maybe a connectivity issue), the response of the ES|QL search should probably be empty rows and no columns (because the _field_caps call couldn't give us the field names and their types). If the failure on the remote cluster happens somewhere after the _field_caps call, then the response should be no rows but with the proper columns.

What I'm trying to point out is that ES|QL could behave differently for the exact same query depending on several factors (connectivity to remote cluster(s), shards being active or not, etc.), and that is probably acceptable (a minimal sketch follows the list below):

  • it could return 0 rows and 0 columns or
  • 0 rows and X columns (X > 0)
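
A minimal sketch of those two response shapes (the column name and type are made up):

// _field_caps failed for the remote: no columns could be resolved
{
  "columns": [],
  "values": []
}

// failure after the _field_caps step: columns are known, but no rows
{
  "columns": [
    { "name": "authors.last_name", "type": "keyword" }
  ],
  "values": []
}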
  4. What is the standard exception we should throw? IndexNotFoundException (404)? In some cases we get a VerificationException (400).

Right now, ES|QL has two flavors of "index not found" errors, as you mentioned:

  • IndexNotFoundException (404), which comes from ES itself at search time, while expanding the wildcard at search time, or while Security expands the wildcard
  • VerificationException (400), which comes from ES|QL's Verifier (this happens after the _field_caps call and after analysis, but before the optimizer and planner)

I think we should be consistent here and, imo, this should be a 400 error code (bad request) - one that indicates a user error where, basically, the query is incorrect because it references a nonexistent index.

  5. Can we clarify what "CCS index resolution should be an extension of local resolution" means (this was stated to me by Costin in another forum).

I think @costin referred to my statement in the previous comment: I think, better said, I regard the CCS handling of ES|QL having as a "baseline" comparison the current (non-CCS) behavior which has no knowledge of skip_unvailable. What we consider as new behavior in ES|QL with regards to CCS should be in addition to existent behavior and not as an override of current behavior.

  6. Do the indices_options settings make the principle outlined above more fuzzy, so it's not cut and dried simple?

Yes, adding indices_options to an ESQL request will complicate things a bit, if this is what you meant. BUT, we had a sort of indices options built into the language, something like FROM index OPTIONS "preference"="_shards:1,2","allow_no_indices"="true", but we removed it to explore other alternatives.

@astefan
Contributor

astefan commented Oct 10, 2024

@quux00 I've created #114495 for the limit 0 issue. Not trivial to fix.

@quux00
Contributor Author

quux00 commented Oct 10, 2024

Just to make sure I understand what fatal means; sorry about double checking this, it is not a term I am used to. I am assuming this means that the search fails with "Unknown index" / "Index not found exception" 400/404 errors. If not correct, please let me know.

Yes. "Fatal" means an exception is thrown and results in a 4xx/5xx HTTP status response. I use this term because in _search some errors/exceptions are not fatal - they are just recorded either in logs or response metadata, but the response still has a 2xx. That will be true here in ESQL CCS as well for skip_unavailable in many/most cases, as discussed above.

Thanks for your detailed answers to the questions. I think we are on the same page now. I will create a new issue with a write-up of the forthcoming PRs for adding skip_unavailable support (there will be three PRs) that summarizes the plan.

@smalyshev
Contributor

All the work for skip_unavailable is complete now; the rest is continuing in #122802.
