[ML] Improve how the inference API determines the elser model to use for endpoints #127284


Open
jonathan-buttner opened this issue Apr 23, 2025 · 1 comment
Labels
Feature:GenAI Features around GenAI :ml Machine learning Team:ML Meta label for the ML team

Comments

@jonathan-buttner (Contributor)

jonathan-buttner commented Apr 23, 2025

When creating an inference endpoint that uses ELSER, the inference API determines which model variant to deploy. To do this, it retrieves information about the ML nodes and checks whether they are all running on the same hardware architecture. Based on that information, it uses either the x86_64-optimized variant or the platform-agnostic variant.
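The selection logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Elasticsearch code: the function name and the node-architecture representation are assumptions, though the model IDs match the published ELSER v2 names.

```python
# Hypothetical sketch of the ELSER variant-selection logic.
ELSER_V2_X86 = ".elser_model_2_linux-x86_64"
ELSER_V2_PLATFORM_AGNOSTIC = ".elser_model_2"


def choose_elser_variant(ml_node_architectures: list[str]) -> str:
    """Pick the optimized variant only when every ML node reports the same
    linux-x86_64 architecture; otherwise fall back to the platform-agnostic
    model."""
    if not ml_node_architectures:
        # Shortcoming: with no started ML nodes there is nothing to
        # inspect, so the appropriate architecture cannot be determined.
        raise ValueError("no ML nodes available to determine architecture")
    if all(arch == "linux-x86_64" for arch in ml_node_architectures):
        return ELSER_V2_X86
    return ELSER_V2_PLATFORM_AGNOSTIC
```

Note that the choice is made once, at endpoint creation time, which is why a later change in node architecture (the second shortcoming below) can leave the endpoint pointing at the wrong variant.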

This approach has a couple of shortcomings:

  • If no ML nodes have started yet, we won't be able to determine the appropriate architecture
  • If the node architecture changes after the endpoint is created, the model will crash
  • Ideally the inference API would also choose the right iteration of the model (currently we use v2)

If the wrong model variant is chosen and needs to be reevaluated, the workaround is to delete the inference endpoint and recreate it. This also works for the default inference endpoint; in that case it will automatically be recreated after deletion.
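A minimal sketch of the workaround, using the inference API's delete and create calls in Console syntax. The endpoint name `my-elser-endpoint` and the service settings shown are illustrative assumptions, not values from this issue:

```
# Delete the endpoint whose model variant was chosen incorrectly
DELETE _inference/sparse_embedding/my-elser-endpoint

# Recreate it; the variant is re-evaluated against the current ML nodes.
# (For a default endpoint, only the DELETE is needed; it is recreated
# automatically.)
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```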

@jonathan-buttner jonathan-buttner added :ml Machine learning Team:ML Meta label for the ML team Feature:GenAI Features around GenAI labels Apr 23, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)
