[ML] Implement JSONPath replacement for Inference API #127036

Merged

Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Apr 17, 2025

This PR adds a minimal implementation of something similar to the JSONPath library. It is needed for the custom models PR: #125679.

MapPathExtractor recursively walks a provided map, navigating to the specified field and extracting its values. It handles nested maps and lists within the map.

This code isn't currently used anywhere outside of the tests that reference it.

Difference from JSONPath

This implementation doesn't support many of the features that JSONPath does. It also deviates from JSONPath in its handling of arrays of maps. When extracting a field from a list of maps, JSONPath flattens the result into a single array, even if multiple arrays had to be traversed to reach the field. This implementation preserves each array that it encounters. That matters because we need to construct internal classes that represent the various result formats. For example, a text embedding response is effectively an array of arrays of floats, so preserving the outer and inner arrays makes it easier to construct the objects after we extract the data from the map.

The second example below depicts the difference.

Schema examples

I tried to keep the schema similar to JSONPath. There's no particular reason we need to do that, but hopefully it makes the syntax more familiar to users.

  • $. to start the path
  • . to traverse nested maps
  • [*] to indicate that the preceding field is an array

$.field1.some_array[*].another_field
$.some_array[*].field1

Examples

Extracting arrays
{
    "request_id": "B4AB89C8-B135-xxxx-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "token_count": 3072
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    2,
                    4
                ]
            },
            {
                "index": 1,
                "embedding": [
                    1,
                    2
                ]
            }
        ]
    }
}

MapPathExtractor.extract(map, "$.result.embeddings[*].embedding") returns [[2, 4], [1, 2]]

Extracting multiple map fields
{
  "result": [
    {
      "key": [
        {
          "a": 1.1
        },
        {
          "a": 2.2
        }
      ]
    },
    {
      "key": [
        {
          "a": 3.3
        },
        {
          "a": 4.4
        }
      ]
    }
  ]
}

MapPathExtractor.extract(map, "$.result[*].key[*].a") returns [[1.1, 2.2], [3.3, 4.4]]

NOTE: JSONPath will return: [1.1, 2.2, 3.3, 4.4]
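
The nesting-preserving behaviour in the example above can be sketched with a small recursive walk. This is an assumption about one way to get that result, not the code in this PR: `NestedExtractSketch`, `extract`, and `walk` are hypothetical names, and the error handling is invented for illustration. The key point is that each `[*]` produces its own output list, so nothing is flattened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the MapPathExtractor code from this PR.
final class NestedExtractSketch {

    /** Extracts the values at the path, keeping one output list per [*] traversed. */
    static Object extract(Map<String, Object> data, String path) {
        String cleaned = path.trim();
        if (cleaned.startsWith("$.")) {
            cleaned = cleaned.substring(2);
        }
        return walk(data, cleaned.split("\\."), 0);
    }

    private static Object walk(Object current, String[] segments, int index) {
        if (index == segments.length) {
            return current; // Reached the end of the path.
        }
        String segment = segments[index];
        boolean isArray = segment.endsWith("[*]");
        String field = isArray ? segment.substring(0, segment.length() - 3) : segment;

        if (!(current instanceof Map)) {
            throw new IllegalArgumentException("Expected a map at [" + segment + "]");
        }
        Object value = ((Map<?, ?>) current).get(field);
        if (!isArray) {
            return walk(value, segments, index + 1);
        }
        if (!(value instanceof List)) {
            throw new IllegalArgumentException("Expected a list at [" + field + "]");
        }
        // Build a new list for this array level instead of merging into the parent.
        List<Object> results = new ArrayList<>();
        for (Object item : (List<?>) value) {
            results.add(walk(item, segments, index + 1));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> map = Map.of(
            "result",
            List.of(
                Map.of("key", List.of(Map.of("a", 1.1), Map.of("a", 2.2))),
                Map.of("key", List.of(Map.of("a", 3.3), Map.of("a", 4.4)))
            )
        );
        // Prints [[1.1, 2.2], [3.3, 4.4]]
        System.out.println(extract(map, "$.result[*].key[*].a"));
    }
}
```

Because the inner lists are kept separate, a caller could map each inner list to one result object (for example, one embedding per inner list) without needing to know how many values belong to each.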

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels Apr 17, 2025
@jonathan-buttner jonathan-buttner marked this pull request as ready for review April 18, 2025 12:25
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

* Uses a subset of the JSONPath schema to extract fields from a map.
* For more information <a href="https://en.wikipedia.org/wiki/JSONPath">see here</a>.
*
* This implementation differs in out it handles lists in that JSONPath will flatten inner lists. This implementation
Member

Suggested change
* This implementation differs in out it handles lists in that JSONPath will flatten inner lists. This implementation
* This implementation differs in how it handles lists in that JSONPath will flatten inner lists. This implementation

var cleanedPath = path.trim();

// Remove the prefix if it exists
if (cleanedPath.startsWith(DOLLAR)) {
Member

Do we have to assert or throw an exception if we don't start with $?

Contributor Author

Good point, I'll add an exception.
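
For reference, the check discussed in this thread might look roughly like the sketch below. It is an assumption, not the change that was merged: the class `PathPrefixSketch`, the method `stripPrefix`, the value of `DOLLAR`, the exception type, and the message are all hypothetical.

```java
// Hypothetical illustration only; not the change merged in this PR.
final class PathPrefixSketch {

    // Assumed value; the snippet under review only shows the constant's name.
    private static final String DOLLAR = "$";

    /** Returns the path without its leading "$", or throws if the prefix is missing. */
    static String stripPrefix(String path) {
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException("Path must not be null or empty");
        }
        String cleanedPath = path.trim();
        if (!cleanedPath.startsWith(DOLLAR)) {
            // Fail instead of silently accepting a path with no prefix.
            throw new IllegalArgumentException(
                "Path [" + cleanedPath + "] must start with a dollar sign ($)"
            );
        }
        return cleanedPath.substring(DOLLAR.length());
    }

    public static void main(String[] args) {
        // Prints .result.embeddings[*].embedding
        System.out.println(stripPrefix("$.result.embeddings[*].embedding"));
    }
}
```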

@jonathan-buttner jonathan-buttner merged commit 3156cc7 into elastic:main Apr 18, 2025
16 of 17 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-custom-model-json-paths branch April 18, 2025 18:34
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 18, 2025
* Adding initial extractor

* Finishing tests

* Addressing feedback
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Apr 18, 2025
)

* Adding initial extractor

* Finishing tests

* Addressing feedback