
ESQL: TO_IP can handle leading zeros #126532


Merged (14 commits, Apr 11, 2025)


nik9000 (Member) commented Apr 9, 2025

Modifies TO_IP so it can handle leading `0`s in IPv4 addresses. Here's how it works now:
```
ROW ip = TO_IP("192.168.0.1") // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```

This adds:
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```

We do this because there isn't a consensus on how to parse leading zeros in IPv4 addresses. The standard Unix tools like `ping` and `ftp` interpret leading zeros as octal, while Java's built-in IP parsing interprets them as decimal. Because folks use this for security rules, we need to support all the choices.
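To make the three policies concrete, here is a minimal Java sketch of how one dotted-quad string resolves under each of them. It is not the Elasticsearch implementation: the class, the `LeadingZeros` enum, and `parseIpv4` are hypothetical names, and the `REJECT` policy is assumed to model the current behavior where `TO_IP("192.168.010.1")` fails.

```java
/**
 * Hypothetical sketch: parse a dotted-quad IPv4 string under one of three
 * leading-zero policies. Not the Elasticsearch implementation.
 */
final class LeadingZerosDemo {
    enum LeadingZeros { REJECT, DECIMAL, OCTAL }

    /** Returns the four octets, or throws IllegalArgumentException. */
    static int[] parseIpv4(String ip, LeadingZeros mode) {
        // limit -1 keeps trailing empty strings, so "1.2.3." is rejected
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("'" + ip + "' is not an IPv4 address");
        }
        int[] octets = new int[4];
        for (int i = 0; i < 4; i++) {
            octets[i] = parseOctet(parts[i], mode, ip);
        }
        return octets;
    }

    private static int parseOctet(String part, LeadingZeros mode, String ip) {
        if (part.isEmpty()) {
            throw new IllegalArgumentException("'" + ip + "' has an empty octet");
        }
        int radix = 10;
        // A lone "0" is fine; "010" or "00" counts as having a leading zero.
        boolean leadingZero = part.length() > 1 && part.charAt(0) == '0';
        if (leadingZero) {
            switch (mode) {
                case REJECT -> throw new IllegalArgumentException("'" + ip + "' has leading zeros");
                case DECIMAL -> radix = 10;
                case OCTAL -> radix = 8;
            }
        }
        int value = 0;
        for (int i = 0; i < part.length(); i++) {
            char c = part.charAt(i);
            // ASCII digits only; Character.digit returns -1 for '8' and '9' in radix 8.
            int digit = (c >= '0' && c <= '9') ? Character.digit(c, radix) : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("'" + ip + "' has an invalid digit '" + c + "'");
            }
            value = value * radix + digit;
            if (value > 255) {
                throw new IllegalArgumentException("'" + ip + "' has an octet above 255");
            }
        }
        return value;
    }

    static String format(int[] octets) {
        return octets[0] + "." + octets[1] + "." + octets[2] + "." + octets[3];
    }

    public static void main(String[] args) {
        String ip = "192.168.010.1";
        System.out.println(format(parseIpv4(ip, LeadingZeros.DECIMAL))); // 192.168.10.1
        System.out.println(format(parseIpv4(ip, LeadingZeros.OCTAL)));   // 192.168.8.1
    }
}
```

Under the octal reading `010` becomes 8, so the same string names a different host than under the decimal reading; that divergence is why a security rule has to pick one explicitly.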

Closes #125460

This works by implementing SurrogateExpression on ToIp and rewriting it into one of three functions, one for each leading-zero behavior. This strategy works well because conversion functions want to be unary.
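A stripped-down Java model of that rewrite might look like the following. Every type here (`Expression`, `SurrogateExpression`, `ToIp`, the three `ToIpLeadingZeros*` records) is a hypothetical stand-in rather than the real ES|QL class, and using `"reject"` as the name of the default option value is an assumption. The point is the shape: the two-argument `TO_IP` never executes itself; it resolves to a single-argument function chosen from the options map.

```java
import java.util.Map;

/**
 * Hypothetical model of the surrogate rewrite: TO_IP(field, options) is replaced
 * by one of three single-argument functions. Names are illustrative only.
 */
final class SurrogateDemo {
    interface Expression {}

    /** An expression that can be swapped for an equivalent, simpler one. */
    interface SurrogateExpression extends Expression {
        Expression surrogate();
    }

    record FieldRef(String name) implements Expression {}

    // One unary conversion per leading-zero policy: each has exactly one child.
    record ToIpLeadingZerosRejected(Expression field) implements Expression {}
    record ToIpLeadingZerosDecimal(Expression field) implements Expression {}
    record ToIpLeadingZerosOctal(Expression field) implements Expression {}

    /** TO_IP as written in a query: a field plus an options map (possibly empty). */
    record ToIp(Expression field, Map<String, String> options) implements SurrogateExpression {
        @Override
        public Expression surrogate() {
            // Assumption: a missing option means "reject", matching today's behavior.
            String mode = options.getOrDefault("leading_zeros", "reject");
            return switch (mode) {
                case "reject" -> new ToIpLeadingZerosRejected(field);
                case "decimal" -> new ToIpLeadingZerosDecimal(field);
                case "octal" -> new ToIpLeadingZerosOctal(field);
                default -> throw new IllegalArgumentException("unsupported leading_zeros [" + mode + "]");
            };
        }
    }

    public static void main(String[] args) {
        Expression toIp = new ToIp(new FieldRef("client.ip"), Map.of("leading_zeros", "decimal"));
        // Planner-style substitution: replace the surrogate before execution.
        Expression planned = ((SurrogateExpression) toIp).surrogate();
        System.out.println(planned.getClass().getSimpleName()); // ToIpLeadingZerosDecimal
    }
}
```

Because each replacement is unary, it can slot into machinery that expects one-input conversion functions without changes, which is the property the description relies on.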

nik9000 requested a review from fang-xing-esql on April 9, 2025 13:40
elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) on Apr 9, 2025

elasticsearchmachine (Collaborator):

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine (Collaborator):

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine (Collaborator):

Pinging @elastic/kibana-esql (ES|QL-ui)

nik9000 (Member Author) commented Apr 9, 2025

UI folks: I've added the UI tag because we're adding named parameters to TO_IP - just one named parameter. I'm not sure where named parameter support is in the editor.

stratoula:

> UI folks: I've added the UI tag because we're adding named parameters to TO_IP - just one named parameter. I'm not sure where named parameter support is in the editor.

They are 🙌 Thanx for notifying us!

fang-xing-esql (Member) left a comment:

Thank you @nik9000! I added a few comments; it looks pretty good to me.

```
s:keyword
1.1.010.1
;
```

Review comment (Member):

It would be great to have some tests where IP addresses with leading 0s appear in predicates against real indices, if that is a valid use case. For example:

```
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where client.ip == to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"})"
}
'
       @timestamp       |   client.ip   |event.duration |    message    |       test_date
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected   |2025-11-23T00:00:00.000Z

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where client.ip in ( to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"}), to_ip(\"172.21.3.15\"))"
}
'
       @timestamp       |   client.ip   |event.duration |       message       |       test_date
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected         |2025-11-23T00:00:00.000Z
2023-10-23T13:51:54.732Z|172.21.3.15    |725448         |Connection error     |2025-11-24T00:00:00.000Z
2023-10-23T13:52:55.015Z|172.21.3.15    |8268153        |Connection error     |2025-11-25T00:00:00.000Z
2023-10-23T13:53:55.832Z|172.21.3.15    |5033755        |Connection error     |2025-11-26T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.3.15    |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
```

Do we expect IP addresses with leading 0s to appear in indexed fields? I gave it a try and got this error; it seems they cannot be loaded into indices as valid IPs.

```
{
  "index" : {
    "_index" : "sample_data",
    "_id" : "KBZsG5YBxZF4dlPHvvyx",
    "status" : 400,
    "error" : {
      "type" : "document_parsing_exception",
      "reason" : "[1:57] failed to parse field [client.ip] of type [ip] in document with id 'KBZsG5YBxZF4dlPHvvyx'. Preview of field's value: '172.21.04.15'",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "'172.21.04.15' is not an IP string literal."
      }
    }
  }
}
```

However, they can be loaded into keyword fields. I changed the schema to map client.ip as keyword:

```
       @timestamp       |   client.ip   |event.duration |       message       |       test_date
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T12:15:03.360Z|172.21.2.162   |3450233        |Connected to 10.1.0.3|2025-11-21T00:00:00.000Z
2023-10-23T12:27:28.948Z|172.21.2.113   |2764889        |Connected to 10.1.0.2|2025-11-22T00:00:00.000Z
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected         |2025-11-23T00:00:00.000Z
2023-10-23T13:51:54.732Z|172.21.3.15    |725448         |Connection error     |2025-11-24T00:00:00.000Z
2023-10-23T13:52:55.015Z|172.21.3.15    |8268153        |Connection error     |2025-11-25T00:00:00.000Z
2023-10-23T13:53:55.832Z|172.21.3.15    |5033755        |Connection error     |2025-11-26T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.3.15    |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.04.15   |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
```

I tried some queries using to_ip with the leading 0s option, and it works on indexed fields too! I don't know if this is a valid use case; the original GitHub issue only seems to mention leading 0s in string literals.

```
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where to_ip(client.ip) == \"172.21.0.5\""
}
'
       @timestamp       |   client.ip   |event.duration |    message    |       test_date
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected   |2025-11-23T00:00:00.000Z

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where to_ip(client.ip) == to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"})"
}
'
       @timestamp       |   client.ip   |event.duration |    message    |       test_date
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected   |2025-11-23T00:00:00.000Z

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where to_ip(client.ip, {\"leading_zeros\":\"decimal\"}) in ( to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"}), \"172.21.4.15\" )"
}
'
       @timestamp       |   client.ip   |event.duration |       message       |       test_date
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected         |2025-11-23T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.04.15   |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
```

Reply (Member Author):

> Do we expect IP addresses with leading 0s to appear in indexed fields? I gave it a try and got this error; it seems they cannot be loaded into indices as valid IPs.

You can't index such values into an `ip` field, no. But with this change ESQL can parse them!

> However, they can be loaded into keyword fields

Just like that, yeah.

> I tried some queries using to_ip with the leading 0s option, and it works on indexed fields too! I don't know if this is a valid use case; the original GitHub issue only seems to mention leading 0s in string literals.

I think it's valid. Let me add a test case for it too.

```java
 * and SurrogateExpressions are expected to be resolved on the coordinating node. At least,
 * TO_IP is expected to be resolved there.
 */
if (e instanceof SurrogateExpression s) {
```

Review comment (Member):

This seems to duplicate SubstituteSurrogateExpressions in LogicalPlanOptimizer, and the transformation seems to belong in LogicalPlanOptimizer. After looking at it more closely, though, I understand why it is here: I couldn't think of a better option for union-typed fields that needs fewer code changes. Can we refactor this piece of code and share it between Analyzer and SubstituteSurrogateExpressions?
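One way to share the rewrite, sketched in Java under heavy simplification: a single static helper that walks an expression tree bottom-up and replaces every `SurrogateExpression` with its surrogate, which both the Analyzer step and the `SubstituteSurrogateExpressions` rule could call. The helper name `substituteSurrogates` and all of the types below are hypothetical; the real code operates on ES|QL's own `Expression` hierarchy and its tree-transform utilities.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: one surrogate-substitution helper shared by two callers
 * (for example an analyzer step and an optimizer rule). Types are simplified stand-ins.
 */
final class SharedSurrogateDemo {
    interface Expression {
        List<Expression> children();

        Expression replaceChildren(List<Expression> newChildren);
    }

    interface SurrogateExpression extends Expression {
        Expression surrogate();
    }

    /** Bottom-up: rewrite children first, then replace this node if it is a surrogate. */
    static Expression substituteSurrogates(Expression e) {
        List<Expression> newChildren = new ArrayList<>();
        for (Expression child : e.children()) {
            newChildren.add(substituteSurrogates(child));
        }
        // Rebuild the node only if some child actually changed.
        Expression result = newChildren.equals(e.children()) ? e : e.replaceChildren(newChildren);
        if (result instanceof SurrogateExpression s) {
            // The surrogate may itself contain surrogates, so process it too.
            return substituteSurrogates(s.surrogate());
        }
        return result;
    }

    record Field(String name) implements Expression {
        public List<Expression> children() { return List.of(); }
        public Expression replaceChildren(List<Expression> c) { return this; }
    }

    record Literal(String value) implements Expression {
        public List<Expression> children() { return List.of(); }
        public Expression replaceChildren(List<Expression> c) { return this; }
    }

    record Equals(Expression left, Expression right) implements Expression {
        public List<Expression> children() { return List.of(left, right); }
        public Expression replaceChildren(List<Expression> c) { return new Equals(c.get(0), c.get(1)); }
    }

    record ToIpDecimal(Expression input) implements Expression {
        public List<Expression> children() { return List.of(input); }
        public Expression replaceChildren(List<Expression> c) { return new ToIpDecimal(c.get(0)); }
    }

    /** TO_IP stand-in; for brevity its surrogate is always the decimal variant. */
    record ToIp(Expression input) implements SurrogateExpression {
        public List<Expression> children() { return List.of(input); }
        public Expression replaceChildren(List<Expression> c) { return new ToIp(c.get(0)); }
        public Expression surrogate() { return new ToIpDecimal(input); }
    }
}
```

Recursing into the surrogate's result covers the case where a surrogate produces another surrogate; it does assume a surrogate never returns itself, which would loop forever.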

Reply (Member Author):

It is! I'll bet I can reuse this better. Let me have a look.

```java
 */
Set<DataType> supportedTypes();

Expression replaceChildren(List<Expression> newChildren);
```

Review comment (Member):

I was a little confused the first time I saw replaceChildren here, as Node also has replaceChildren and they do the same thing. After looking more closely at the change in Analyzer, I understand why it is needed here; a comment would be helpful.

Alternatively: all of the ConvertFunctions we deal with in Analyzer are Expressions. If they can be cast to Expression in Analyzer, maybe we don't need another replaceChildren here?
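The suggested alternative, sketched in Java: leave `replaceChildren` off the conversion-function interface and cast to `Expression` at the one call site that needs it. The class names and the `retarget` helper are hypothetical, and the cast is only safe because every implementation in this sketch extends `Expression`.

```java
import java.util.List;

/**
 * Hypothetical sketch: a conversion-function interface without its own
 * replaceChildren, relying on a cast to the Expression base class instead.
 */
final class CastDemo {
    /** Stand-in for the tree-node base class, which already has replaceChildren. */
    abstract static class Expression {
        abstract List<Expression> children();

        abstract Expression replaceChildren(List<Expression> newChildren);
    }

    /** Conversion functions: replaceChildren is deliberately NOT redeclared here. */
    interface ConvertFunction {
        String targetType();
    }

    static final class Field extends Expression {
        final String name;

        Field(String name) { this.name = name; }

        @Override List<Expression> children() { return List.of(); }

        @Override Expression replaceChildren(List<Expression> newChildren) { return this; }
    }

    static final class ToIp extends Expression implements ConvertFunction {
        final Expression input;

        ToIp(Expression input) { this.input = input; }

        @Override List<Expression> children() { return List.of(input); }

        @Override Expression replaceChildren(List<Expression> newChildren) {
            return new ToIp(newChildren.get(0));
        }

        @Override public String targetType() { return "ip"; }
    }

    /** Point a conversion function at a new input, as an analyzer step might. */
    static Expression retarget(ConvertFunction convert, Expression newInput) {
        // Safe only because every ConvertFunction here is also an Expression;
        // any other implementation would throw ClassCastException at runtime.
        Expression asExpression = (Expression) convert;
        return asExpression.replaceChildren(List.of(newInput));
    }
}
```

The trade-off is a runtime cast instead of a compile-time guarantee: declaring `replaceChildren` on the interface moves that check to the compiler, at the cost of the duplication with `Node` that the review points out.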

Reply (Member Author):

Hmmm. I think I can cast, yeah.

nik9000 enabled auto-merge (squash) on April 10, 2025 16:07

fang-xing-esql (Member) left a comment:

LGTM, thank you Nik!

The only thing I think could be better is moving the surrogate call in Analyzer's typeSpecificConvert into LogicalPlanOptimizer's substitution batch. That requires changing the SubstituteSurrogateExpressions rule so that it recognizes the conversion function attached to MultiTypeEsField.

I'll create a separate issue for it. I think it can be addressed separately, since it relates to union types and could have a broader scope.

nik9000 merged commit 55a6624 into elastic:main on Apr 11, 2025 (15 of 17 checks passed)
Labels: :Analytics/ES|QL (AKA ESQL), >bug, ES|QL-ui (Impacts ES|QL UI), Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v9.1.0

Linked issue: [ES|QL] TO_IP Doesn't properly handle 0 padded IPv4 Addresses

5 participants