Skip to content

[ML] Custom Inference Service #125679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 51 commits into from

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Mar 26, 2025

See this PR: #127939

@elasticsearchmachine
Copy link
Collaborator

Hi @davidkyle, I've created a changelog YAML for you.

TimeValue timeout,
ActionListener<List<ChunkedInference>> listener
) {
listener.onFailure(new ElasticsearchStatusException("Chunking not supported by the {} service", RestStatus.BAD_REQUEST, NAME));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will see how to support this in the next PR, to support semantic_text

@davidkyle
Copy link
Member Author

@weizijun the Elasticsearch security team have advised against adding JsonPATH as a dependency, the last commit to the GitHub project was over a year ago and the project does not appear to be activity maintained. If a critical vulnerability was found in JsonPATH Elasticsearch would be exposed to it and there are no guarantees that the CVE would be fixed.

The team at Elastic considered using another Json path library but have decided to implement the features we need ourselves. The Elasticsearch code base already contains a lot of code for parsing JSON that we can use and writing our own implementation avoids adding another dependency.

@weizijun
Copy link
Contributor

@weizijun the Elasticsearch security team have advised against adding JsonPATH as a dependency, the last commit to the GitHub project was over a year ago and the project does not appear to be activity maintained. If a critical vulnerability was found in JsonPATH Elasticsearch would be exposed to it and there are no guarantees that the CVE would be fixed.

The team at Elastic considered using another Json path library but have decided to implement the features we need ourselves. The Elasticsearch code base already contains a lot of code for parsing JSON that we can use and writing our own implementation avoids adding another dependency.

Yeah, It's ok.

@@ -36,7 +36,7 @@ public abstract class BaseResponseHandler implements ResponseHandler {
public static final String METHOD_NOT_ALLOWED = "Received a method not allowed status code";

protected final String requestType;
private final ResponseParser parseFunction;
protected final ResponseParser parseFunction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this available so the custom response handler can immediately return on a parse failure instead of retrying.

private static final LazyInitializable<InferenceServiceConfiguration, RuntimeException> configuration = new LazyInitializable<>(
() -> {
var configurationMap = new HashMap<String, SettingsConfiguration>();
// TODO revisit this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to create some more complex configuration types to support the fields (like maps, lists of lists etc). Maybe for now we don't expose this in the services API?


Map<String, Object> headers = extractOptionalMap(map, HEADERS, ModelConfigurations.SERVICE_SETTINGS, validationException);
removeNullValues(headers);
var stringHeaders = validateMapStringValues(headers, HEADERS, validationException, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should limit the values in the header map to only strings.

removeNullValues(parameters);
validateMapValues(
parameters,
List.of(String.class, Integer.class, Double.class, Float.class, Boolean.class),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restricting the task settings to these types (no nested fields aka maps or lists).

public static final String QUERY_PARAMETERS = "query_parameters";

public static QueryParameters fromMap(Map<String, Object> map, ValidationException validationException) {
List<Tuple<String, String>> queryParams = extractOptionalListOfStringTuples(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query parameters can have duplicate keys which is why I'm not using a map here.

uri = buildUri();
}

private static void addStringParams(Map<String, String> stringParams, Map<String, ?> paramsToAdd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields like the url, query parameters, and headers should not have their values converted to json format. This only accepts strings and doesn't manipulate them.

}
}

private static void addJsonStringParams(Map<String, String> jsonStringParams, Map<String, ?> params) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields like the request body need to be a valid json object so we'll convert the values into json

import java.io.IOException;
import java.util.Objects;

public class SerializableSecureString implements ToXContentFragment, Writeable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to serialize the api key or some secrets to the body of a request this class will make that process a little easier by implementing toXContent()

import static org.hamcrest.Matchers.is;
import static org.mockito.Mockito.mock;

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is an attempt to push a lot of the duplicate logic in the inference service tests into a central place. If we create more services we should leverage this base class to remove the copy/paste.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review May 2, 2025 21:05
@jonathan-buttner jonathan-buttner added Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 labels May 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@@ -97,7 +105,14 @@ public ErrorResponse apply(HttpResult httpResult) {
var errorText = toType(MapPathExtractor.extract(map, messagePath).extractedObject(), String.class, messagePath);
return new ErrorResponse(errorText);
} catch (Exception e) {
// swallow the error
logger.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into a scenario where azure openai didn't return a json response. Normally if that happens we'd swallow the parse error but wouldn't report anything useful back. With this change we'll log the parse failure. The error parsing logic should only be called if we receive a failure status code. If many requests fail, and we are unable to parse the error we could log many errors.

@jonathan-buttner jonathan-buttner force-pushed the custom-inference-service branch from 1ba25d3 to 6492cd7 Compare May 8, 2025 19:40
@jonathan-buttner jonathan-buttner requested review from a team as code owners May 8, 2025 19:40
@jonathan-buttner
Copy link
Contributor

I'll open a different PR that doesn't have 500+ changes 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :ml Machine learning Team:ML Meta label for the ML team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants