Re-evaluate the ML node memory availability formula #126535
Comments
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-core-infra (Team:Core/Infra)
Example 1:
Example 2:
I was able to reproduce this error using this notebook: https://colab.research.google.com/drive/1mmB1adtRTpmdwtbiw9SXCQoWAzwELbOr#scrollTo=mGr_pki7eX1w
Currently, if `ml.use_auto_machine_memory_percent` is set to `true`, the amount of available memory on an ML node is calculated as `NODE_MEMORY - JVM_HEAP_SIZE - 200MB OFF-HEAP MEMORY`, where `JVM_HEAP_SIZE` is configured on ES start and the off-heap memory is estimated as a fixed 200MB value. Some empirical evidence suggests that the off-heap memory can be significantly larger, which can lead to the Java process being killed by the OOM-killer.
We need to re-evaluate how the ML code determines the amount of available memory and whether that calculation should be adjusted or changed.
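For reference, a minimal sketch of the current formula as described above. The class, method, and constant names are hypothetical and do not come from the Elasticsearch code base; only the arithmetic mirrors the issue text.

```java
public class MlNativeMemoryEstimate {

    // Fixed off-heap allowance the issue says is currently assumed (200 MB).
    private static final long FIXED_OFF_HEAP_BYTES = 200L * 1024 * 1024;

    /**
     * Memory assumed to be available for ML native processes when
     * ml.use_auto_machine_memory_percent is true:
     * NODE_MEMORY - JVM_HEAP_SIZE - 200MB.
     */
    static long availableForMl(long nodeMemoryBytes, long jvmHeapBytes) {
        return nodeMemoryBytes - jvmHeapBytes - FIXED_OFF_HEAP_BYTES;
    }

    public static void main(String[] args) {
        long nodeMemory = 8L * 1024 * 1024 * 1024;   // 8 GB node
        long jvmHeap = 4L * 1024 * 1024 * 1024;      // 4 GB heap, fixed at ES start via -Xmx
        long available = availableForMl(nodeMemory, jvmHeap);
        System.out.printf("Assumed available for ML: %d MB%n", available / (1024 * 1024));
        // If real off-heap usage (direct buffers, metaspace, thread stacks, ...)
        // exceeds the fixed 200 MB estimate, this over-reports the memory that is
        // actually free, and the OS OOM-killer may terminate the Java process.
    }
}
```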