[ML] Improve how the inference API determines the elser model to use for endpoints #127284


Open
jonathan-buttner opened this issue Apr 23, 2025 · 1 comment
Labels
Feature:GenAI Features around GenAI :ml Machine learning Team:ML Meta label for the ML team

Comments

@jonathan-buttner (Contributor)

jonathan-buttner commented Apr 23, 2025

When creating an inference endpoint that uses ELSER, the inference API determines which model variant to deploy. To do this, it retrieves information about the ML nodes and checks whether they are all running on the same hardware architecture. Based on that information, it uses either the x86_64-optimized variant or the platform-agnostic variant.
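The selection logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Elasticsearch code: the function name and the node-architecture representation are assumptions, though the model IDs match the published ELSER v2 names.

```python
# Hypothetical sketch of the ELSER variant-selection logic.
ELSER_V2_X86 = ".elser_model_2_linux-x86_64"
ELSER_V2_PLATFORM_AGNOSTIC = ".elser_model_2"


def choose_elser_variant(ml_node_architectures: list[str]) -> str:
    """Pick the optimized variant only when every ML node reports the same
    linux-x86_64 architecture; otherwise fall back to the platform-agnostic
    model."""
    if not ml_node_architectures:
        # Shortcoming: with no started ML nodes there is nothing to
        # inspect, so the appropriate architecture cannot be determined.
        raise ValueError("no ML nodes available to determine architecture")
    if all(arch == "linux-x86_64" for arch in ml_node_architectures):
        return ELSER_V2_X86
    return ELSER_V2_PLATFORM_AGNOSTIC
```

Note that the choice is made once, at endpoint creation time, which is why a later change in node architecture (the second shortcoming below) can leave the endpoint pointing at the wrong variant.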

This approach has a couple of shortcomings:

  • If no ML nodes have started yet, we won't be able to determine the appropriate architecture
  • If the node architecture changes after the endpoint is created, the model will crash
  • Ideally the inference API would also choose the right iteration of the model (currently we use v2)

If the wrong model variant is chosen and needs to be reevaluated, the workaround is to delete the inference endpoint and recreate it. This also works for the default inference endpoint; in that case it will automatically be recreated after deletion.
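A minimal sketch of the workaround, using the inference API's delete and create calls in Console syntax. The endpoint name `my-elser-endpoint` and the service settings shown are illustrative assumptions, not values from this issue:

```
# Delete the endpoint whose model variant was chosen incorrectly
DELETE _inference/sparse_embedding/my-elser-endpoint

# Recreate it; the variant is re-evaluated against the current ML nodes.
# (For a default endpoint, only the DELETE is needed; it is recreated
# automatically.)
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```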

@jonathan-buttner jonathan-buttner added :ml Machine learning Team:ML Meta label for the ML team Feature:GenAI Features around GenAI labels Apr 23, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)
