Allow timeout during trained model download process #129003
Open · +48 −17
Description
We currently allow users to provide a timeout when creating an inference endpoint and when performing an inference request. When creating an endpoint that requires a trained model deployment to be started, or when performing an inference request against a default endpoint whose trained model deployment has not been started, we download the model before starting the deployment if it has not been downloaded previously.

Today, the download is not interrupted when the user's requested timeout is exceeded; instead, the model is downloaded in full and the request then times out while the deployment is starting. This change fixes that poor experience by allowing the request to time out during the model download. When the timeout fires, the model download and the trained model deployment start still complete in the background, so the user does not have to take any further action for the process to finish.
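As a rough illustration of the intended behavior (a minimal sketch, not the actual Elasticsearch implementation; `downloadModel`, `startDeployment`, and `createEndpoint` are hypothetical stand-ins), the caller-facing future can time out while the download/deployment chain keeps running on its own:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutDuringDownloadSketch {

    // Hypothetical stand-in for the model download step.
    static CompletableFuture<Void> downloadModel(String modelId) {
        return CompletableFuture.runAsync(() -> {
            sleep(500); // simulate a slow artifact download
            System.out.println("download of " + modelId + " finished");
        });
    }

    // Hypothetical stand-in for starting the trained model deployment.
    static CompletableFuture<Void> startDeployment(String modelId) {
        return CompletableFuture.runAsync(
            () -> System.out.println("deployment of " + modelId + " started"));
    }

    static CompletableFuture<Void> createEndpoint(String modelId, long timeoutMillis) {
        // The download/deployment chain runs to completion regardless of the
        // caller's timeout, so no further user action is needed.
        CompletableFuture<Void> background =
            downloadModel(modelId).thenCompose(v -> startDeployment(modelId));

        // A separate caller-facing future carries the timeout; timing out
        // here leaves the background chain untouched.
        CompletableFuture<Void> callerFacing = new CompletableFuture<>();
        background.whenComplete((v, t) -> {
            if (t != null) callerFacing.completeExceptionally(t);
            else callerFacing.complete(v);
        });
        return callerFacing.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The key design point is that the timeout is attached to a separate caller-facing future rather than to the download/deployment chain itself, so completing the caller's future exceptionally does not cancel the background work.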
Testing
Verified that when the timeout elapses during the model download, the request fails with a ModelDeploymentTimeoutException and will complete the download/deployment start asynchronously.

TODO: Test what happens when an inference endpoint is created with a short timeout. It still downloads the model, creates the endpoint, and starts the deployment, but the error message is confusing because it tells the user to try again.
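To make the short-timeout scenario concrete, here is a small demo built on the hypothetical sketch above (the model id `.elser_model_2` is just a placeholder): the caller observes the timeout immediately while the background work still completes.

```java
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

public class ShortTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        var request = TimeoutDuringDownloadSketch.createEndpoint(".elser_model_2", 50);
        try {
            request.join();
        } catch (CompletionException e) {
            // The caller sees the timeout right away...
            System.out.println("request timed out: "
                + (e.getCause() instanceof TimeoutException));
        }
        // ...while the download and deployment still finish in the background.
        Thread.sleep(1000); // keep the JVM alive long enough to observe them
    }
}
```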