ESQL: TO_IP can handle leading zeros #126532
Modifies TO_IP so it can handle leading `0`s in ipv4s. Here's how it works now:
```
ROW ip = TO_IP("192.168.0.1")   // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```
This adds
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```
We do this because there isn't a consensus on how to parse leading zeros in ipv4s. The standard unix tools like `ping` and `ftp` interpret leading zeros as octal. Java's built in ip parsing interprets them as decimal. Because folks are using this for security rules we need to support all the choices.

Closes elastic#125460
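To make the difference between the two interpretations concrete, here is a minimal, self-contained sketch of the three behaviors described above. It is not the code from this PR: the class name `LeadingZerosSketch`, the `Mode` enum, and the `parse` method are hypothetical names chosen only for illustration.

```java
// Hypothetical sketch of the three leading-zero behaviors; not Elasticsearch code.
final class LeadingZerosSketch {
    enum Mode { REJECT, OCTAL, DECIMAL }

    /** Parses a dotted-quad IPv4 string and returns it in canonical decimal form. */
    static String parse(String ip, Mode mode) {
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + ip);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('.');
            }
            out.append(parseOctet(parts[i], mode));
        }
        return out.toString();
    }

    private static int parseOctet(String s, Mode mode) {
        // Only plain digits are allowed; this also rules out signs that parseInt would accept.
        if (s.isEmpty() || !s.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("invalid octet: " + s);
        }
        boolean leadingZero = s.length() > 1 && s.charAt(0) == '0';
        if (leadingZero && mode == Mode.REJECT) {
            throw new IllegalArgumentException("leading zeros are not allowed: " + s);
        }
        // Octal mode reads "010" as 8; decimal mode reads it as 10.
        int radix = leadingZero && mode == Mode.OCTAL ? 8 : 10;
        int value = Integer.parseInt(s, radix); // throws NumberFormatException for e.g. "09" in octal
        if (value > 255) {
            throw new IllegalArgumentException("octet out of range: " + s);
        }
        return value;
    }
}
```

With this sketch, `parse("192.168.010.1", Mode.OCTAL)` yields `192.168.8.1`, `Mode.DECIMAL` yields `192.168.10.1`, and `Mode.REJECT` throws, which is why the same string can match different addresses depending on the tool that wrote it.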
Hi @nik9000, I've created a changelog YAML for you.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Pinging @elastic/kibana-esql (ES|QL-ui)
UI folks: I've added the UI tag because we're adding named parameters to
They are 🙌 Thanx for notifying us!
…o esql_to_ip_leading_zeros_2
Thank you @nik9000! I added a few comments, it looks pretty good to me.
s:keyword
1.1.010.1
;
It would be great to have some tests where an IP address with leading 0s appears in predicates against real indices, if those are valid use cases. For example:
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from sample_data | where client.ip == to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"})"
}
'
@timestamp | client.ip |event.duration | message | test_date
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5 |1232382 |Disconnected |2025-11-23T00:00:00.000Z
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from sample_data | where client.ip in ( to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"}), to_ip(\"172.21.3.15\"))"
}
'
@timestamp | client.ip |event.duration | message | test_date
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5 |1232382 |Disconnected |2025-11-23T00:00:00.000Z
2023-10-23T13:51:54.732Z|172.21.3.15 |725448 |Connection error |2025-11-24T00:00:00.000Z
2023-10-23T13:52:55.015Z|172.21.3.15 |8268153 |Connection error |2025-11-25T00:00:00.000Z
2023-10-23T13:53:55.832Z|172.21.3.15 |5033755 |Connection error |2025-11-26T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.3.15 |1756467 |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
Do we expect that IP addresses with leading 0s can appear in index fields? I gave it a try and got this error; it seems they cannot be loaded into indices as valid IPs.
{
"index" : {
"_index" : "sample_data",
"_id" : "KBZsG5YBxZF4dlPHvvyx",
"status" : 400,
"error" : {
"type" : "document_parsing_exception",
"reason" : "[1:57] failed to parse field [client.ip] of type [ip] in document with id 'KBZsG5YBxZF4dlPHvvyx'. Preview of field's value: '172.21.04.15'",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "'172.21.04.15' is not an IP string literal."
}
}
}
}
However, it can be loaded into keyword fields. I changed the schema to have `client.ip` as `keyword`.
@timestamp | client.ip |event.duration | message | test_date
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T12:15:03.360Z|172.21.2.162 |3450233 |Connected to 10.1.0.3|2025-11-21T00:00:00.000Z
2023-10-23T12:27:28.948Z|172.21.2.113 |2764889 |Connected to 10.1.0.2|2025-11-22T00:00:00.000Z
2023-10-23T13:33:34.937Z|172.21.0.5 |1232382 |Disconnected |2025-11-23T00:00:00.000Z
2023-10-23T13:51:54.732Z|172.21.3.15 |725448 |Connection error |2025-11-24T00:00:00.000Z
2023-10-23T13:52:55.015Z|172.21.3.15 |8268153 |Connection error |2025-11-25T00:00:00.000Z
2023-10-23T13:53:55.832Z|172.21.3.15 |5033755 |Connection error |2025-11-26T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.3.15 |1756467 |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.04.15 |1756467 |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
I tried some queries using `to_ip` with the leading 0s options, and it works on index fields too! I don't know if this is a valid use case; the original GitHub issue seems to mention leading 0s only in string literals.
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from sample_data | where to_ip(client.ip) == \"172.21.0.5\""
}
'
@timestamp | client.ip |event.duration | message | test_date
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5 |1232382 |Disconnected |2025-11-23T00:00:00.000Z
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from sample_data | where to_ip(client.ip) == to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"})"
}
'
@timestamp | client.ip |event.duration | message | test_date
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5 |1232382 |Disconnected |2025-11-23T00:00:00.000Z
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from sample_data | where to_ip(client.ip, {\"leading_zeros\":\"decimal\"}) in ( to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"}), \"172.21.4.15\" )"
}
'
@timestamp | client.ip |event.duration | message | test_date
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5 |1232382 |Disconnected |2025-11-23T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.04.15 |1756467 |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
> Do we expect that IP addresses with leading 0s can appear in index fields? I gave it a try and got this error; it seems they cannot be loaded into indices as valid IPs.

You can't index such a field, no. But with this ESQL can parse them!

> However, it can be loaded into keyword fields

Just like that, yeah.

> I tried some queries using `to_ip` with the leading 0s options, and it works on index fields too! I don't know if this is a valid use case; the original GitHub issue seems to mention leading 0s only in string literals.

I think it's valid. Let me add a test case for it too.
 * and SurrogateExpressions are expected to be resolved on the coordinating node. At least,
 * TO_IP is expected to be resolved there.
 */
if (e instanceof SurrogateExpression s) {
This seems to duplicate `SubstituteSurrogateExpressions` in `LogicalPlanOptimizer`, and the transformation seems to belong in `LogicalPlanOptimizer`. After looking at it more closely, though, I understand why it is here; I couldn't think of a better choice for union-typed fields with fewer code changes. Can we refactor this piece of code and share it between `Analyzer` and `SubstituteSurrogateExpressions`?
It is! I'll bet I can reuse this better. Let me have a look.
 */
Set<DataType> supportedTypes();

Expression replaceChildren(List<Expression> newChildren);
I was a little confused when I first saw `replaceChildren` here, as `Node` also has `replaceChildren` and they do the same thing. After looking at the change in `Analyzer` more closely, I understand why it is needed here; a comment would be helpful.
Alternatively, all of the `ConvertFunction`s we deal with in `Analyzer` are `Expression`s, so if they can be cast to `Expression`s in `Analyzer`, we may not need another `replaceChildren` here?
Hmmm. I think I can cast, yeah.
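For readers following along, the cast being discussed could look roughly like the sketch below. Every type here (`ExpressionSketch`, `ConvertFunctionSketch`, `LiteralSketch`, `ToIpSketch`, `CastSketch`) is a hypothetical stand-in written for this illustration and does not reproduce the real Elasticsearch classes; the only point is that an interface without `replaceChildren` can still reach it through a pattern-matching cast.

```java
import java.util.List;

// Hypothetical stand-in for an expression tree node that can swap its children.
abstract class ExpressionSketch {
    abstract List<ExpressionSketch> children();
    abstract ExpressionSketch replaceChildren(List<ExpressionSketch> newChildren);
}

// Hypothetical stand-in for the conversion interface; note it declares no replaceChildren.
interface ConvertFunctionSketch {
    String functionName();
}

final class LiteralSketch extends ExpressionSketch {
    final String value;
    LiteralSketch(String value) { this.value = value; }
    @Override List<ExpressionSketch> children() { return List.of(); }
    @Override ExpressionSketch replaceChildren(List<ExpressionSketch> newChildren) { return this; }
}

final class ToIpSketch extends ExpressionSketch implements ConvertFunctionSketch {
    final ExpressionSketch field;
    ToIpSketch(ExpressionSketch field) { this.field = field; }
    @Override public String functionName() { return "TO_IP"; }
    @Override List<ExpressionSketch> children() { return List.of(field); }
    @Override ExpressionSketch replaceChildren(List<ExpressionSketch> newChildren) {
        return new ToIpSketch(newChildren.get(0));
    }
}

final class CastSketch {
    /** Rebuilds a conversion function around a new child by casting it to an expression. */
    static ExpressionSketch withNewChild(ConvertFunctionSketch convert, ExpressionSketch child) {
        if (convert instanceof ExpressionSketch e) {
            return e.replaceChildren(List.of(child));
        }
        throw new IllegalStateException(convert.functionName() + " is not an expression");
    }
}
```

The trade-off in this sketch is that the cast moves a compile-time guarantee to a runtime check, in exchange for not declaring `replaceChildren` twice.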
LGTM, thank you Nik!
The only thing that I think could be better is to move the surrogate call in `Analyzer`'s `typeSpecificConvert` to `LogicalPlanOptimizer`'s substitution batch. This requires changing the `SubstituteSurrogateExpressions` rule so that it recognizes the conversion function attached to `MultiTypeEsField`.
I'll create a separate issue for it. I think it can be addressed separately, as it relates to union types and could have a broader scope.
This works by implementing `SurrogateExpression` on `ToIp` and rewriting it to one of three functions, each with its own leading zero handling behavior. This strategy works well because conversion functions want to be unary.
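The surrogate strategy can be illustrated with a small sketch. Nothing below is taken from the PR: the `Expr` and `Surrogate` interfaces, the record names, and the `"reject"` option value are all assumptions made for illustration (the description only names `"octal"` and `"decimal"`). The idea shown is that the two-argument form exists only until it is resolved, after which a unary function carries the chosen behavior.

```java
// Hypothetical sketch of the surrogate pattern; every type is a stand-in, not Elasticsearch code.
interface Expr {}

interface Surrogate {
    /** Returns the expression that should replace this one in the plan. */
    Expr surrogate();
}

record Field(String name) implements Expr {}

// Three unary functions, one per leading-zero behavior (names are assumed).
record RejectLeadingZeros(Expr field) implements Expr {}
record OctalLeadingZeros(Expr field) implements Expr {}
record DecimalLeadingZeros(Expr field) implements Expr {}

// The user-facing function carries the option and rewrites itself to a unary function.
record ToIpWithOptions(Expr field, String leadingZeros) implements Expr, Surrogate {
    @Override
    public Expr surrogate() {
        return switch (leadingZeros) {
            case "reject" -> new RejectLeadingZeros(field); // assumed name for the default behavior
            case "octal" -> new OctalLeadingZeros(field);
            case "decimal" -> new DecimalLeadingZeros(field);
            default -> throw new IllegalArgumentException("unknown leading_zeros: " + leadingZeros);
        };
    }
}

final class SurrogateSketch {
    /** Replaces a surrogate expression with its substitute and leaves anything else alone. */
    static Expr resolve(Expr e) {
        return e instanceof Surrogate s ? s.surrogate() : e;
    }
}
```

In this sketch, resolving `new ToIpWithOptions(new Field("client.ip"), "octal")` produces `new OctalLeadingZeros(new Field("client.ip"))`, so later stages only ever see a one-argument conversion.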