ESQL: Speed up TO_IP #126338
Speed up the TO_IP method by converting directly from utf-8 encoded strings to the ip encoding. Previously we did:

```
utf-8 -> String -> InetAddress -> ip encoding
```

In a step towards solving elastic#125460 this creates three IP parsing functions: one that rejects leading zeros, one that interprets leading zeros as decimal numbers, and one that interprets leading zeros as octal numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into `TO_IP` because that's the behavior it had before.

Here is the performance:

```
Benchmark                Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op
```

So this takes roughly 45% of the time the original did, which makes it a bit more than twice as fast.
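To make the three leading-zero behaviors concrete, here is a minimal sketch of an IPv4 parser that works directly on utf-8 bytes. It is not the code from this PR: the class name `Ipv4ParseSketch`, the `LeadingZeros` enum, and `parseIpv4` are all hypothetical names chosen for illustration, it only handles dotted-quad IPv4, and it returns the four raw address bytes rather than Elasticsearch's ip encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the PR's implementation.
final class Ipv4ParseSketch {
    /** How to treat an octet written with a leading zero, such as "010". */
    enum LeadingZeros { REJECT, DECIMAL, OCTAL }

    /** Parses dotted-quad text from utf-8 bytes; returns 4 bytes, or null if invalid. */
    static byte[] parseIpv4(byte[] utf8, LeadingZeros mode) {
        byte[] result = new byte[4];
        int pos = 0;
        for (int octet = 0; octet < 4; octet++) {
            // Find the end of this octet: the next '.' or the end of input.
            int end = pos;
            while (end < utf8.length && utf8[end] != '.') {
                end++;
            }
            int len = end - pos;
            if (len == 0) {
                return null; // empty octet, e.g. "1..2.3" or "1.2.3."
            }
            int radix = 10;
            if (len > 1 && utf8[pos] == '0') {
                if (mode == LeadingZeros.REJECT) {
                    return null;
                }
                if (mode == LeadingZeros.OCTAL) {
                    radix = 8;
                }
            }
            int value = 0;
            for (int i = pos; i < end; i++) {
                int digit = utf8[i] - '0';
                if (digit < 0 || digit >= radix) {
                    return null; // not a digit in this radix
                }
                value = value * radix + digit;
                if (value > 255) {
                    return null; // octet out of range; also prevents overflow
                }
            }
            result[octet] = (byte) value;
            pos = end;
            if (octet < 3) {
                if (pos >= utf8.length) {
                    return null; // fewer than four octets
                }
                pos++; // skip the '.' separator
            }
        }
        return pos == utf8.length ? result : null; // reject trailing input
    }

    public static void main(String[] args) {
        byte[] text = "010.0.0.1".getBytes(StandardCharsets.UTF_8);
        for (LeadingZeros mode : LeadingZeros.values()) {
            System.out.println(mode + " -> " + Arrays.toString(parseIpv4(text, mode)));
        }
    }
}
```

Under these assumptions the same input `010.0.0.1` is rejected in `REJECT` mode, read as `10.0.0.1` in `DECIMAL` mode, and read as `8.0.0.1` in `OCTAL` mode. Skipping the intermediate `String` and `InetAddress` objects is what the description credits for the speedup; this sketch only shows the parsing shape, not the measured performance.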
💔 Backport failed
You can use sqren/backport to manually backport by running |
Note so we don't lose it: There's some talk about using SIMD to parse ipv4 addresses. They fit in a single SIMD register after all. The reading I did doesn't discuss leading zeros so that'd be fun! Anyway, I wanted something I could backport easily and our ways of directly accessing SIMD code didn't look super easy to backport at the moment, so I left it for later.
Betrayal!
This includes a big chunk of elastic#124676 - but not the behavior change - just the code that allowed it.
Backports in.
I wrote an `&&` when I meant an `||` in elastic#126338 and that caused some impressive looking line noise to parse as valid ipv4 addresses. Randomized tests caught it eventually.
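The exact expression that was wrong isn't shown in this thread, so the following is only a hypothetical illustration of how this kind of slip can let garbage through a byte-level parser. The class `DigitCheckSketch` and both method names are made up for the example.

```java
// Hypothetical illustration; not the expression from the PR.
final class DigitCheckSketch {
    /** Buggy: no byte is both below '0' and above '9', so nothing is ever rejected. */
    static boolean rejectsBuggy(byte b) {
        return b < '0' && b > '9';
    }

    /** Intended: reject any byte outside the range '0'..'9'. */
    static boolean rejects(byte b) {
        return b < '0' || b > '9';
    }

    public static void main(String[] args) {
        byte noise = (byte) '#';
        System.out.println("buggy check rejects '#': " + rejectsBuggy(noise));
        System.out.println("fixed check rejects '#': " + rejects(noise));
    }
}
```

A check like the buggy one compiles cleanly and passes every test that only feeds it well-formed addresses, which is one reason randomized inputs are a good fit for catching it.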