Skip to content

[ES|QL] TO_IP Doesn't properly handle 0 padded IPv4 Addresses #125460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
BenB196 opened this issue Mar 22, 2025 · 6 comments · Fixed by #126532
Closed

[ES|QL] TO_IP Doesn't properly handle 0 padded IPv4 Addresses #125460

BenB196 opened this issue Mar 22, 2025 · 6 comments · Fixed by #126532
Assignees
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@BenB196
Copy link

BenB196 commented Mar 22, 2025

Elasticsearch Version

8.17.1

Installed Plugins

No response

Java Version

bundled

OS Version

Container

Problem Description

When attempting to convert a string to IP via TO_IP that has 0 padding (ex: 001.001.001.001) does not correctly process the IP. 0 padding is generally considered a valid way to represent IP addresses, and most system (ex: Windows), will convert 001.001.001.001 -> 1.1.1.1.

Steps to Reproduce

  1. Create an ES|QL function like:
ROW str = "001.001.001.001"
| EVAL ip = TO_IP(str)
  1. Observe the response doesn't contain the ip value:
{
  "rawResponse": {
    "is_running": false,
    "took": 1,
    "all_columns": [
      {
        "name": "str",
        "type": "keyword"
      },
      {
        "name": "ip",
        "type": "ip"
      }
    ],
    "columns": [
      {
        "name": "str",
        "type": "keyword"
      }
    ],
    "values": [
      [
        "001.001.001.001"
      ]
    ]
  }
}

Expect the response to contain the ip value of 1.1.1.1

This problem can kind of be worked around using REPLACE with regexp

ROW str = "010.10.100.000"
| EVAL ip = TO_IP(REPLACE(str, "(?<=^|\\.)0{0,2}", ""))

But it isn't the best user experience, and I'm sure isn't "free" from a query overhead perspective.

Logs (if relevant)

No response

@BenB196 BenB196 added >bug needs:triage Requires assignment of a team area label labels Mar 22, 2025
@iverase iverase added :Analytics/EQL EQL querying and removed needs:triage Requires assignment of a team area label labels Mar 24, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@iverase iverase added :Analytics/ES|QL AKA ESQL and removed :Analytics/EQL EQL querying labels Mar 24, 2025
@nik9000 nik9000 self-assigned this Mar 24, 2025
@nik9000
Copy link
Member

nik9000 commented Mar 24, 2025

I had to dig a bit and found that we explicitly ignore leading 0s. The original reasoning came from: https://github.com/google/guava/blob/master/guava/src/com/google/common/net/InetAddresses.java#L356-L357

    // Disallow leading zeroes, because no clear standard exists on
    // whether these should be interpreted as decimal or octal.

ESQL doesn't have to follow the same rules as elasticsearch uses, but that's why we reject these.

@nik9000
Copy link
Member

nik9000 commented Mar 24, 2025

You do get a warning, though it doesn't mention leading 0s:

< Warning: 299 Elasticsearch-9.1.0-81ef7ddd5352b05b53f0de499a46c7878cee1575 "Line 1:41: evaluation of [TO_IP(str)] failed, treating result as null. Only first 20 failures recorded."
< Warning: 299 Elasticsearch-9.1.0-81ef7ddd5352b05b53f0de499a46c7878cee1575 "Line 1:41: java.lang.IllegalArgumentException: '001.001.001.001' is not an IP string literal."

Looking around the internet it seems like most tools interpret it as octal.

@nik9000
Copy link
Member

nik9000 commented Mar 25, 2025

I've asked for a look from some members of Elastic's security solutions team. They'll have a much more educated opinion than mine. Maybe we want an option so you can pick how leading 0s work?

@BenB196
Copy link
Author

BenB196 commented Mar 25, 2025

Thanks @nik9000 yeah, based on your previous comment, I was going to suggest potentially adding an optional arg to the function to let the user decide if they're dealing with decimal or octal.

@nik9000
Copy link
Member

nik9000 commented Apr 1, 2025

Talked to @ninoslavmiskovic and he talked to his folks and told me:

  • It’s a rare but real issue.
  • No strong objections to supporting normalization, but:
  • It must be explicit to avoid incorrect assumptions.
  • It should not replace strict parsing by default.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 4, 2025
Speed up the TO_IP method by converting directly from utf-8 encoded
strings to the ip encoding. Previously we did:
```
utf-8 -> String -> INetAddress -> ip encoding
```

In a step towards solving elastic#125460 this creates three IP parsing
functions, one the rejects leading zeros, one that interprets leading
zeros as decimal numbers, and one the interprets leading zeros as octal
numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because
that's the behavior it had before.

Here is the performance:
```
Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this is roughly 45% faster than what we had.
nik9000 added a commit that referenced this issue Apr 7, 2025
Speed up the TO_IP method by converting directly from utf-8 encoded
strings to the ip encoding. Previously we did:
```
utf-8 -> String -> INetAddress -> ip encoding
```

In a step towards solving #125460 this creates three IP parsing
functions, one the rejects leading zeros, one that interprets leading
zeros as decimal numbers, and one the interprets leading zeros as octal
numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because
that's the behavior it had before.

Here is the performance:
```
Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this is roughly 45% faster than what we had.
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 7, 2025
Speed up the TO_IP method by converting directly from utf-8 encoded
strings to the ip encoding. Previously we did:
```
utf-8 -> String -> INetAddress -> ip encoding
```

In a step towards solving elastic#125460 this creates three IP parsing
functions, one the rejects leading zeros, one that interprets leading
zeros as decimal numbers, and one the interprets leading zeros as octal
numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because
that's the behavior it had before.

Here is the performance:
```
Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this is roughly 45% faster than what we had.

This includes a big chunk of elastic#124676 - but not the behavior change -
just the code that allowed it.
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 7, 2025
Speed up the TO_IP method by converting directly from utf-8 encoded
strings to the ip encoding. Previously we did:
```
utf-8 -> String -> INetAddress -> ip encoding
```

In a step towards solving elastic#125460 this creates three IP parsing
functions, one the rejects leading zeros, one that interprets leading
zeros as decimal numbers, and one the interprets leading zeros as octal
numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because
that's the behavior it had before.

Here is the performance:
```
Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this is roughly 45% faster than what we had.

This includes a big chunk of elastic#124676 - but not the behavior change -
just the code that allowed it.
elasticsearchmachine pushed a commit that referenced this issue Apr 7, 2025
Speed up the TO_IP method by converting directly from utf-8 encoded
strings to the ip encoding. Previously we did:
```
utf-8 -> String -> INetAddress -> ip encoding
```

In a step towards solving #125460 this creates three IP parsing
functions, one the rejects leading zeros, one that interprets leading
zeros as decimal numbers, and one the interprets leading zeros as octal
numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because
that's the behavior it had before.

Here is the performance:
```
Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this is roughly 45% faster than what we had.

This includes a big chunk of #124676 - but not the behavior change -
just the code that allowed it.
elasticsearchmachine pushed a commit that referenced this issue Apr 7, 2025
Speed up the TO_IP method by converting directly from utf-8 encoded
strings to the ip encoding. Previously we did:
```
utf-8 -> String -> INetAddress -> ip encoding
```

In a step towards solving #125460 this creates three IP parsing
functions, one the rejects leading zeros, one that interprets leading
zeros as decimal numbers, and one the interprets leading zeros as octal
numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because
that's the behavior it had before.

Here is the performance:
```
Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this is roughly 45% faster than what we had.

This includes a big chunk of #124676 - but not the behavior change -
just the code that allowed it.
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 9, 2025
Modifies TO_IP so it can handle leading `0`s in ipv4s. Here's how it
works now:
```
ROW ip = TO_IP("192.168.0.1") // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```

This adds
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```

We do this because there isn't a consensus on how to parse leading zeros
in ipv4s. The standard unix tools like `ping` and `ftp` interpret
leading zeros as octal. Java's built in ip parsing interprets them as
decimal. Because folks are using this for security rules we need to
support all the choices.

Closes elastic#125460
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 9, 2025
Modifies TO_IP so it can handle leading `0`s in ipv4s. Here's how it
works now:
```
ROW ip = TO_IP("192.168.0.1") // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```

This adds
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```

We do this because there isn't a consensus on how to parse leading zeros
in ipv4s. The standard unix tools like `ping` and `ftp` interpret
leading zeros as octal. Java's built in ip parsing interprets them as
decimal. Because folks are using this for security rules we need to
support all the choices.

Closes elastic#125460
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants