Skip to content

Fix uniquify to handle multiple successive duplicates #126889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 16, 2025

Conversation

rjernst
Copy link
Member

@rjernst rjernst commented Apr 16, 2025

CollectionUtils.uniquify is based on C++ std::unique. This commit fixes the implementation to correctly match std::unique, so that successive ranges of duplicates are handled.

closes #126883

CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes elastic#126883
@rjernst rjernst added >bug :Core/Infra/Core Core issues without another label auto-backport Automatically create backport pull requests when merged v8.18.1 v8.19.0 v9.0.1 v9.1.0 v8.17.6 labels Apr 16, 2025
@rjernst rjernst requested a review from lkts April 16, 2025 03:56
@rjernst rjernst requested a review from a team as a code owner April 16, 2025 03:56
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Apr 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @rjernst, I've created a changelog YAML for you.

Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -78,6 +78,7 @@ public void testUniquify() {
assertUniquify(List.of(1, 1, 1), Comparator.naturalOrder(), 1);
assertUniquify(List.of(1, 2, 2, 3), Comparator.naturalOrder(), 3);
assertUniquify(List.of(1, 2, 2, 2), Comparator.naturalOrder(), 2);
assertUniquify(List.of(1, 2, 2, 3, 3, 5), Comparator.naturalOrder(), 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a stronger test. I was thinking about this:

  1. Generate random int s for the number of unique items in list
  2. Generate s unique numbers
  3. For each random generate random number of repeats (1 to 10)
  4. Put everything in the list
  5. uniquify the list
  6. compare resulting list with a list of only unique numbers that we have from 2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a random test, see 62cdd9a

@rjernst rjernst changed the title Rework uniquify to not use iterators Fix uniquify to handle multiple successive duplicates Apr 16, 2025
@rjernst
Copy link
Member Author

rjernst commented Apr 16, 2025

@ldematte I reworked this back to using iterators. I remembered that the reason to use iterators is to support linked lists, since doing LinkedList.set(index, value) is linear.

Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@rjernst rjernst enabled auto-merge (squash) April 16, 2025 18:44
@rjernst rjernst merged commit a813949 into elastic:main Apr 16, 2025
16 of 17 checks passed
rjernst added a commit to rjernst/elasticsearch that referenced this pull request Apr 16, 2025
CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes elastic#126883
rjernst added a commit to rjernst/elasticsearch that referenced this pull request Apr 16, 2025
CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes elastic#126883
rjernst added a commit to rjernst/elasticsearch that referenced this pull request Apr 16, 2025
CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes elastic#126883
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
8.x
9.0
8.17

rjernst added a commit to rjernst/elasticsearch that referenced this pull request Apr 16, 2025
CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes elastic#126883
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
)

CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes #126883
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
)

CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes #126883
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
)

CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes #126883

Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
)

CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes #126883

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team v8.17.6 v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CollectionUtils.uniquify is incorrect in some cases
4 participants