CNDB-15666: CNDB-15570: Fix handling mixed key types in SAI iterators #2077

michaelsembwever · 2025-10-20T12:00:38Z

https://github.com/riptano/cndb/issues/15666
https://github.com/riptano/cndb/issues/15776

Port into main-5.0 commit d7b8944

BLOCKED ON #2076

CNDB-15666: CNDB-15570: Fix handling mixed key types in SAI iterators

This commit fixes multiple issues with KeyRangeIterator implementations occasionally skipping or emitting duplicate keys when working on a mix of primary keys with empty / non-empty clusterings. This situation is possible while scanning tables with static columns or when some indexes are partition-aware (e.g. version AA) and others have been updated to a row-aware version (e.g. DC or EC). Due to those bugs, users could get incorrect results from SAI queries, e.g. results containing duplicated rows, duplicated partitions or even missing rows.

The commit introduces extensive randomized property-based tests for KeyRangeUnionIterator and KeyRangeIntersectionIterator. Previously, the tests did not cover keys with mixed empty/non-empty clusterings.

Changes in KeyRangeUnionIterator:

KeyRangeUnionIterator merges streams of primary keys in such a way that duplicates are removed. Unfortunately it does not properly account for the fact that if a key with an empty clustering meets a key with a non-empty clustering and the same partition key, we must always return the key with an empty clustering. A key with an empty clustering will always fetch the rows matched by any specific row key for the same partition, but the reverse is not true.

The iterator implementation has been modified to always pick the key that matches more rows - a key with empty clustering wins over a key with non-empty clustering. Additionally, once a key with an empty clustering is emitted, no more keys in that partition are emitted.
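The union rule above can be illustrated with a minimal sketch. This is not the actual KeyRangeUnionIterator code: the `UnionSketch` class, its `Key` type (where a `null` clustering stands in for Clustering.EMPTY), and the assumption that a partition-only key sorts before the row keys of its partition are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

final class UnionSketch {
    /** Hypothetical stand-in for a primary key; clustering == null models Clustering.EMPTY. */
    static final class Key {
        final int partition;
        final Integer clustering;

        Key(int partition, Integer clustering) {
            this.partition = partition;
            this.clustering = clustering;
        }

        boolean isPartitionOnly() {
            return clustering == null;
        }

        @Override
        public String toString() {
            return "(" + partition + ", " + (clustering == null ? "EMPTY" : clustering) + ")";
        }
    }

    /** Assumed order: by partition, then partition-only keys first, then by clustering. */
    static int compare(Key a, Key b) {
        if (a.partition != b.partition)
            return Integer.compare(a.partition, b.partition);
        if (a.clustering == null || b.clustering == null)
            return Boolean.compare(b.clustering == null, a.clustering == null);
        return Integer.compare(a.clustering, b.clustering);
    }

    /** True if key k is already covered by the key that was just emitted. */
    static boolean covered(Key emitted, Key k) {
        if (k.partition != emitted.partition)
            return false;
        // A partition-only key covers every key of its partition;
        // a row key covers only its exact duplicates.
        return emitted.isPartitionOnly() || compare(k, emitted) == 0;
    }

    /** Merges sorted input streams, letting partition-only keys win within a partition. */
    static List<Key> union(List<List<Key>> streams) {
        int[] pos = new int[streams.size()];
        List<Key> out = new ArrayList<>();
        while (true) {
            // Pick the smallest head; a partition-only key sorts first in its partition.
            Key best = null;
            for (int i = 0; i < streams.size(); i++) {
                List<Key> s = streams.get(i);
                if (pos[i] < s.size() && (best == null || compare(s.get(pos[i]), best) < 0))
                    best = s.get(pos[i]);
            }
            if (best == null)
                return out;
            out.add(best);
            // Skip everything the emitted key already covers, in every stream.
            for (int i = 0; i < streams.size(); i++) {
                List<Key> s = streams.get(i);
                while (pos[i] < s.size() && covered(best, s.get(pos[i])))
                    pos[i]++;
            }
        }
    }
}
```

With streams `[(1, 1), (1, 2), (2, 1)]` and `[(1, EMPTY), (2, 1), (2, 3)]`, this sketch emits `(1, EMPTY)` once for partition 1 and then the deduplicated row keys `(2, 1)` and `(2, 3)`.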

Changes in KeyRangeIntersectionIterator:

Due to a problem very similar to the one in KeyRangeUnionIterator, KeyRangeIntersectionIterator could return either too few or too many keys when keys with empty clusterings and keys with non-empty clusterings were present in the input key streams.

In particular consider 2 input streams A and B with the following keys:

A:
0: (1, Clustering.EMPTY)

B:
0: (1, 1)
1: (1, 2)

Key A.0 matches the whole partition 1. Therefore, the correct result of the intersection is both keys of stream B. Unfortunately, the algorithm before this patch would advance both the A and B iterators when emitting the first matching key. At the beginning of the second step, iterator A would already be exhausted and no more keys would be produced. As a result, key B.1 would be missing from the results.

This patch fixes it by introducing two changes to the intersection algorithm:

1. A key with non-empty clustering wins over a key with empty clustering and same partition.

2. The selected highest key is not consumed while searching for the highest matching key; it is consumed only after the search loop finds a match. At that point we have more information about which iterators should be moved to the next item. Iterators positioned at a key with an empty clustering can be advanced only after we run out of keys with non-empty clustering in the same partition, or if there are no other keys with non-empty clustering.

This patch also fixes another issue where we could return a less-specific key matching a full partition instead of a key matching one row:

A:
0: (1, Clustering.EMPTY)

B:
0: (1, 1)

In that case the iterator returned a key with empty clustering, which would result in fetching and postfiltering many unnecessary rows.
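Both intersection examples above can be reproduced with a small two-stream sketch. It is an illustration of the two rules under simplifying assumptions, not the actual KeyRangeIntersectionIterator: the `IntersectionSketch` class and its `Key` type (with a `null` clustering standing in for Clustering.EMPTY) are hypothetical, and only two sorted inputs are handled.

```java
import java.util.ArrayList;
import java.util.List;

final class IntersectionSketch {
    /** Hypothetical stand-in for a primary key; clustering == null models Clustering.EMPTY. */
    static final class Key {
        final int partition;
        final Integer clustering;

        Key(int partition, Integer clustering) {
            this.partition = partition;
            this.clustering = clustering;
        }

        boolean isPartitionOnly() {
            return clustering == null;
        }

        @Override
        public String toString() {
            return "(" + partition + ", " + (clustering == null ? "EMPTY" : clustering) + ")";
        }
    }

    /** Intersects two sorted key streams; a row key wins over a partition-only key. */
    static List<Key> intersect(List<Key> a, List<Key> b) {
        int i = 0, j = 0;
        List<Key> out = new ArrayList<>();
        while (i < a.size() && j < b.size()) {
            Key x = a.get(i), y = b.get(j);
            if (x.partition < y.partition) {
                i++;                 // a is behind: this also releases a partition-only key
            } else if (x.partition > y.partition) {
                j++;                 // b is behind
            } else if (x.isPartitionOnly() && y.isPartitionOnly()) {
                out.add(x);          // both sides match the whole partition
                i++;
                j++;
            } else if (x.isPartitionOnly()) {
                out.add(y);          // emit the more specific key
                j++;                 // keep a parked on its partition-only key
            } else if (y.isPartitionOnly()) {
                out.add(x);
                i++;                 // keep b parked on its partition-only key
            } else {
                int c = Integer.compare(x.clustering, y.clustering);
                if (c == 0) {
                    out.add(x);      // exact row match
                    i++;
                    j++;
                } else if (c < 0) {
                    i++;
                } else {
                    j++;
                }
            }
        }
        return out;
    }
}
```

For A = `[(1, EMPTY)]` and B = `[(1, 1), (1, 2)]` the sketch returns both keys of B, and for B = `[(1, 1)]` it returns the row key `(1, 1)` rather than the partition-only key.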

github-actions · 2025-10-20T12:00:56Z

driftx · 2025-10-20T16:17:14Z

Test failures look legit, something with the aa format?

michaelsembwever · 2025-10-27T07:03:39Z

> Test failures look legit, something with the aa format?

Yes, the failures are only happening with Version.AA.

Extending the search results (cql LIMIT) I see

Invalid query results for query SELECT abbreviation FROM %s WHERE area_sq_miles NOT IN ? LIMIT ? expected:<[MS, PA, AR, CA, WY, AK, AL, DE, ID, MI, VA, TN, TX, KY]> but was:<[MS, PA, AR, CA, WY, AK, AL, DE, ID, MI, TN, TX, KY]>

So the row for VA is not being returned (where the clause value is NOT IN [43203.9, 7800.06, 82169.62]).

"'VA',  9500000000,  true, '2018-06-19', 120.32,  39490.09,  4.6,  '152.130.96.221',  8367587,  383,  38,      'Virginia', '00:43:07', '2018-06-19T00:00:00', 17be691a-c1a4-4467-a4ad-64605c74fb1c, 1fc81a4c-d17d-11e8-a8d5-f2801f1b9fd1, 2"

I can't see (yet) how the iterators can be messing this up. While I investigate (there are changes to them from cassandra-5.0)… @pkolaczk, maybe you have thoughts on why this test, which exists in the same form in main, has broken in main-5.0?

michaelsembwever · 2025-11-03T12:00:55Z

i'm going to add in CNDB-15776 here…

djatnieks · 2025-11-04T18:21:36Z

Looks like there are still some AA test failures

pkolaczk · 2025-11-05T13:07:46Z

You need to include CNDB-14861 first. That should fix it.

michaelsembwever · 2025-11-05T13:46:48Z

mercy @pkolaczk .

that means #2076

i've rebased this branch off mck-cndb-15665-main-5.0 for now. it cannot be merged until that^ PR is merged. when it is i'll rebase this PR off main-5.0 again.

The PrimaryKeyWithSource class has been present for two years in the code base as an optimization for hybrid vector workloads, which have to materialize many primary keys in the search-then-sort query path. However, the logic is invalid for version aa (because we have the bug where compacted sstables write per row, not per partition) and it is also invalid for static columns. This commit avoids creation of PrimaryKeyWithSource in those cases.

CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes (#2066)

When row-aware and non-row-aware indexes are mixed, we now check the clustering index filter for all the keys that have clustering information, i.e. keys coming from the row-aware indexes. Earlier that check was accidentally disabled if at least one non-row-aware index was used by the query. That could cause retrieving rows that do not match the clustering condition of the query.

Rebase notes:
- includes CNDB-15683
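The per-key clustering check described for CNDB-15683 can be sketched as follows. This is only an illustration under assumptions: the `ClusteringFilterSketch` class, its `Key` type (a `null` clustering models a key from a non-row-aware index), and the `IntPredicate` standing in for the clustering index filter are all hypothetical. What happens to partition-only keys later in the read path is outside this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class ClusteringFilterSketch {
    /** Hypothetical stand-in for a primary key; clustering == null means no clustering information. */
    static final class Key {
        final int partition;
        final Integer clustering;

        Key(int partition, Integer clustering) {
            this.partition = partition;
            this.clustering = clustering;
        }

        @Override
        public String toString() {
            return "(" + partition + ", " + (clustering == null ? "EMPTY" : clustering) + ")";
        }
    }

    /**
     * Applies the clustering filter to every key that carries clustering information,
     * regardless of whether other keys in the same result come from non-row-aware indexes.
     */
    static List<Key> applyClusteringFilter(List<Key> keys, IntPredicate clusteringFilter) {
        List<Key> kept = new ArrayList<>();
        for (Key k : keys) {
            // Keys without clustering information cannot be checked here and pass through.
            if (k.clustering == null || clusteringFilter.test(k.clustering))
                kept.add(k);
        }
        return kept;
    }
}
```

The point of the sketch is that the decision is made per key: the presence of a partition-only key in the input does not switch the check off for the row keys next to it.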

michaelsembwever · 2025-11-05T15:56:36Z

@pkolaczk, new failure here also related to Version.AA in NumericIndexMixedVersionTest.testMultiVersionCompatibilityWithClusteringKeyFiltering
(if i change line 188 to any version newer than AA the test passes…)

pkolaczk · 2025-11-06T12:50:34Z

That failure is caused by a missing fix for resetting SAI format versions in our test framework.
The fix was brought in as part of a larger feature that allows setting different versions in different keyspaces:

https://github.com/riptano/cndb/issues/15619

Looks like you already started porting it:
https://github.com/riptano/cndb/issues/15735

The critical code fragment to make the test work is the following snippet in SAIUtil.setCurrentVersion:

            // update the index contexts for each keyspace
            for (String keyspaceName : Schema.instance.getKeyspaces())
            {
                Keyspace keyspace = Keyspace.open(keyspaceName);
                for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
                {
                    SecondaryIndexManager sim = cfs.getIndexManager();
                    for (Index index : sim.listIndexes())
                    {
                        if (index instanceof StorageAttachedIndex)
                        {
                            StorageAttachedIndex sai = (StorageAttachedIndex)index;
                            IndexContext context = sai.getIndexContext();

                            Field field = IndexContext.class.getDeclaredField("primaryKeyFactory");
                            field.setAccessible(true);
                            field.set(context, version.onDiskFormat().newPrimaryKeyFactory(cfs.metadata().comparator));
                        }
                    }
                }
            }

You can just paste this snippet at the end of the try block and the test will work fine.

…write and to consider as current() depending on a keyspace (#2041)

There is a new cassandra.sai.version.selector.class system property that allows providing an implementation of the o.a.c.index.sai.disk.format.Version.Selector interface to specify which version of the SAI on-disk index format should be used for each keyspace.

michaelsembwever · 2025-11-10T16:54:47Z

> That failure is related to missing a fix for resetting SAI format versions in our test framework.

Thanks!
Indeed, if I cherry-pick in CNDB-15735: CNDB-15619 (as i've done in the PR now, as a THROWAWAY commit) the test works.

I'm ok merging this PR without the THROWAWAY, knowing it's ready to be merged in soon after.

sonarqubecloud · 2025-11-10T17:51:28Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
88.2% Coverage on New Code
0.0% Duplication on New Code


cassci-bot · 2025-11-10T17:54:32Z

❌ Build ds-cassandra-pr-gate/PR-2077 rejected by Butler

3 regressions found

Found 3 new test failures

Test	Explanation	Runs	Upstream
o.a.c.cql3.validation.operations.AggregationQueriesTest.testAggregationQueryShouldNotTimeoutWhenItExceedesReadTimeout (compression)	REGRESSION	🔴🔴	2 / 16
o.a.c.index.sai.cql.NumericIndexMixedVersionTest.testMultiVersionCompatibilityWithClusteringKeyFiltering (compression)	NEW	🔵🔴	0 / 16
o.a.c.index.sai.cql.VectorKeyRestrictedOnPartitionTest.partitionRestrictedWidePartitionBqCompressedTest[ec] (compression)	REGRESSION	🔴🔵	0 / 16

Found 7 known test failures

michaelsembwever force-pushed the mck-cndb-15666-main-5.0 branch 2 times, most recently from 02916fe to 99c18fd Compare November 3, 2025 12:05

michaelsembwever force-pushed the mck-cndb-15666-main-5.0 branch from 99c18fd to ccf62dc Compare November 5, 2025 13:48

michaeljmarshall and others added 2 commits November 5, 2025 14:51

michaelsembwever force-pushed the mck-cndb-15666-main-5.0 branch from ccf62dc to e739a64 Compare November 5, 2025 13:52


8 participants
