-
Notifications
You must be signed in to change notification settings - Fork 25.3k
ES|QL: Fix BytesRef2BlockHash #130705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ES|QL: Fix BytesRef2BlockHash #130705
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @luigidellaquila, I've created a changelog YAML for you. |
} | ||
} | ||
|
||
public void test2BytesRefs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same test we had for BytesRef3BlockHash
@@ -1326,7 +1513,7 @@ public void close() { | |||
fail("hashes should not close AddInput"); | |||
} | |||
}); | |||
assertThat(output1.size(), equalTo(output1.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A typo probably
builder.append(BytesRefs.toString(block1.getBytesRef(i, scratch))); | ||
distinctKeys.add(builder.toString()); | ||
} | ||
assertThat(distinctKeys.size(), equalTo(totalPositions)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert the fix and you'll see a failure here
@@ -145,7 +145,7 @@ public Block[] getKeys() { | |||
try { | |||
try (BytesRefBlock.Builder b1 = blockFactory.newBytesRefBlockBuilder(positions)) { | |||
for (int i = 0; i < positions; i++) { | |||
int k1 = (int) (finalHash.get(i) & 0xffffL); | |||
int k1 = (int) (finalHash.get(i) & 0xffffffffL); | |||
if (k1 == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this (and below) be if (k <= 0)
{ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be 0
because we grab the low bits. Right? I haven't had enough coffee for bit math.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the finalHash.get(i)
not return smth like 0xe0000000L
(or some other value with the 32nd bit set)?
The hash1.hash.get()
method seems to expect a non-negative argument. It might all be safe, it just stood out to me since we're truncating a long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are always positive, they are generated by BytesRefBlockHash, see hashOrdToGroupNullReserved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one of the implicit assumptions is that a block hash will never have more than 2B positions. If it does, we'll need a larger data structure anyway I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are always positive
To make sure: 0xe0000000L
is a positive long. But (int) (0xe0000000L & 0xffffffffL)
is a negative int.
I guess one of the implicit assumptions is that a block hash will never have more than 2B positions.
Yeh, this is why I was thinking that this might still be safe. 👍 (At least if these are positions.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @nik9000 should probably comment here as well - and whatever we end up with should probably become a Java comment here, that's too many assumptions for my head to infer, at least when skimming the file :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, only a minor remark from me. Thanks @luigidellaquila !
It'd be good if @nik9000 could also give this a look as he's way more familiar with the block hashes than me.
@@ -1232,6 +1235,190 @@ public void testLongNull() { | |||
}, blockFactory.newLongArrayVector(values, values.length).asBlock(), blockFactory.newConstantNullBlock(values.length)); | |||
} | |||
|
|||
public void test2BytesRefsKeys() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe a more descriptive name? Like test2BytesRefsHighCardinalityKey
or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on HighCardinality
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is java. Lean into the memes about long method names....
} | ||
List<Output> output = new ArrayList<>(); | ||
|
||
try (BlockHash hash1 = new BytesRef2BlockHash(blockFactory, 0, 1, totalPositions);) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we randomize which key has high cardinality? It looks like this test only stuffs the high cardinality key into the first position, which repros the bug, but in principle the other position might have the same problem (if we implemented things differently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -145,7 +145,7 @@ public Block[] getKeys() { | |||
try { | |||
try (BytesRefBlock.Builder b1 = blockFactory.newBytesRefBlockBuilder(positions)) { | |||
for (int i = 0; i < positions; i++) { | |||
int k1 = (int) (finalHash.get(i) & 0xffffL); | |||
int k1 = (int) (finalHash.get(i) & 0xffffffffL); | |||
if (k1 == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be 0
because we grab the low bits. Right? I haven't had enough coffee for bit math.
@@ -1232,6 +1235,190 @@ public void testLongNull() { | |||
}, blockFactory.newLongArrayVector(values, values.length).asBlock(), blockFactory.newConstantNullBlock(values.length)); | |||
} | |||
|
|||
public void test2BytesRefsKeys() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on HighCardinality
.
} | ||
List<Output> output = new ArrayList<>(); | ||
|
||
try (BlockHash hash1 = new BytesRef2BlockHash(blockFactory, 0, 1, totalPositions);) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -1232,6 +1235,190 @@ public void testLongNull() { | |||
}, blockFactory.newLongArrayVector(values, values.length).asBlock(), blockFactory.newConstantNullBlock(values.length)); | |||
} | |||
|
|||
public void test2BytesRefsKeys() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is java. Lean into the memes about long method names....
} | ||
} | ||
page = new Page(builder1.build(), builder2.build()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this all in one page? I know the BlockHash
needs it in memory. And we build that HashMap
with it in memory below. But maybe we make Page
s of a reasonable size here?
Meh. I guess it's only like 9mb. Probably not worth changing it unless you need to go bigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's small enough, and it reproduced the problem pretty consistently, so I guess it's not worth the effort and complication.
Fix retrieval of keys in BytesRef2BlockHash
Fixes: #130644