Skip to content

GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, and arrow::ChuckedArray #46779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 30, 2025

Conversation

andishgar
Copy link
Contributor

@andishgar andishgar commented Jun 11, 2025

Rationale for this change

Enable EqualOptions::use_atol for arrow::Array and arrow::Scalar.

What changes are included in this PR?

  • Changed the default value of EqualOptions::use_atol to false.
  • Enabled EqualOptions::use_atol_ for the following methods.
    • arrow::ScalarEquals
    • arrow::ArrayRangeEquals
    • arrow:: ArrayEquals

Are these changes tested?

Yes, I ran the relevant unit tests.

Are there any user-facing changes?

The default value of EqualOptions::use_atol is changed to false

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

@andishgar andishgar marked this pull request as draft June 11, 2025 11:52
@andishgar andishgar marked this pull request as ready for review June 11, 2025 18:47
@andishgar
Copy link
Contributor Author

In this pull request, I initially set the default value of arrow::EqualOptions::use_atol to true. However, during the course of this work, I realized that the correct default should be false, to align with the default behavior of arrow::ScalarEquals, arrow::ArrayRangeEquals, and arrow::ArrayEquals.

@andishgar andishgar marked this pull request as draft June 16, 2025 13:37
@andishgar andishgar changed the title GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array and arrow::Scalar GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, arrow::ChuckedArray Jun 16, 2025
@andishgar andishgar changed the title GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, arrow::ChuckedArray GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, and arrow::ChuckedArray Jun 16, 2025
@andishgar
Copy link
Contributor Author

@kou I have a question regarding arrow::Table equality. The method arrow::RecordBatch::ApproxEquals does not consider schema or metadata in the comparison. Shouldn't arrow::Table have a similar method ?

@andishgar andishgar marked this pull request as ready for review June 16, 2025 15:16
@kou
Copy link
Member

kou commented Jun 17, 2025

I'm OK with it but I like arrow::Table::Equals(const Table& other, const EqualOptions& opts = EqualOptions::Defaults()) not arrow::Table::ApproxEquals(). We need to add more options to EqualOptions for it.

@andishgar
Copy link
Contributor Author

I'm OK with it but I like arrow::Table::Equals(const Table& other, const EqualOptions& opts = EqualOptions::Defaults()) not arrow::Table::ApproxEquals(). We need to add more options to EqualOptions for it.

@kou, thank you for your response. I'll open an issue to address this.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR changes the default behavior of the tolerance-based comparison flag (use_atol) to be opt-in, and propagates that flag through Array, Scalar, RecordBatch, and ChunkedArray equality routines, with updated documentation and tests.

  • Change EqualOptions::use_atol_ default to false and wire it into ArrayEquals, ArrayRangeEquals, ScalarEquals, RecordBatch::Equals, and ChunkedArray::Equals.
  • Update and add unit tests for floating‐point comparisons to exercise the new use_atol flag in scalar, array, record batch, and statistics tests.
  • Revise documentation comments to explain the interplay between exact vs. approximate equality and the use_atol option.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
cpp/src/arrow/scalar_test.cc Added TestUseAtol to validate ScalarEquals with and without use_atol
cpp/src/arrow/record_batch_test.cc Updated record‐batch tests to assert Equals with use_atol(true)
cpp/src/arrow/record_batch.h Documented use_atol behavior for RecordBatch::Equals and ApproxEquals
cpp/src/arrow/compare.h Changed default use_atol_ to false and added flags/docs for arrays and scalars
cpp/src/arrow/compare.cc Propagate options.use_atol() into ArrayEquals, ArrayRangeEquals, and ScalarEquals
cpp/src/arrow/chunked_array.h Documented use_atol behavior on ChunkedArray::Equals and ApproxEquals
cpp/src/arrow/array/statistics_test.cc Adjusted statistics tests to use use_atol(true) for tolerance-based equality
cpp/src/arrow/array/array_test.cc Added CheckFloatApproxEqualsWithAtol and test case for floating arrays
Comments suppressed due to low confidence (3)

cpp/src/arrow/record_batch.h:124

  • Split this into two sentences or add a period after ApproxEquals. to separate the clause, e.g., ... ApproxEquals. Schema comparison is included ... for clarity and correct grammar.
  /// using \ref arrow::RecordBatch::ApproxEquals Although, Schema comparison is included,

cpp/src/arrow/chunked_array.h:164

  • Add unit tests for ChunkedArray::Equals and ChunkedArray::ApproxEquals to verify that the new use_atol flag correctly toggles exact vs. approximate comparisons on chunked arrays.
  /// Setting \ref arrow::EqualOptions::use_atol to true is equivalent to

cpp/src/arrow/compare.h:107

  • [nitpick] The use_atol behavior is documented separately for each overload; consider centralizing or referencing a single description to reduce duplicated comment blocks and ease future maintenance.
/// Returns true if the arrays are exactly equal

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Jun 17, 2025
@andishgar
Copy link
Contributor Author

@kou Thank you for the review. I’ll look into it

@andishgar
Copy link
Contributor Author

@kou Thank you for your message. I’ll start applying your suggestions from Saturday. By the way, I deleted my last message since it wasn’t relevant to the discussion. If possible, please consider deleting yours as well.

@kou
Copy link
Member

kou commented Jun 26, 2025

OK. I've deleted it.

@andishgar
Copy link
Contributor Author

@kou I think the following test is broken:

TEST_F(TestChunkedArray, EqualsSameAddressWithNaNs) {
auto chunk_with_nan1 = ArrayFromJSON(float64(), "[0, 1, 2, NaN]");
auto chunk_without_nan1 = ArrayFromJSON(float64(), "[3, 4, 5]");
ArrayVector chunks1 = {chunk_with_nan1, chunk_without_nan1};
ASSERT_OK_AND_ASSIGN(auto chunked_array_with_nan1, ChunkedArray::Make(chunks1));
ASSERT_FALSE(chunked_array_with_nan1->Equals(chunked_array_with_nan1));
auto chunk_without_nan2 = ArrayFromJSON(float64(), "[6, 7, 8, 9]");
ArrayVector chunks2 = {chunk_without_nan1, chunk_without_nan2};
ASSERT_OK_AND_ASSIGN(auto chunked_array_without_nan1, ChunkedArray::Make(chunks2));
ASSERT_TRUE(chunked_array_without_nan1->Equals(chunked_array_without_nan1));
auto int32_array = ArrayFromJSON(int32(), "[0, 1, 2]");
auto float64_array_with_nan = ArrayFromJSON(float64(), "[0, 1, NaN]");
ArrayVector arrays1 = {int32_array, float64_array_with_nan};
std::vector<std::string> fieldnames = {"Int32Type", "Float64Type"};
ASSERT_OK_AND_ASSIGN(auto struct_with_nan, StructArray::Make(arrays1, fieldnames));
ArrayVector chunks3 = {struct_with_nan};
ASSERT_OK_AND_ASSIGN(auto chunked_array_with_nan2, ChunkedArray::Make(chunks3));
ASSERT_FALSE(chunked_array_with_nan2->Equals(chunked_array_with_nan2));
auto float64_array_without_nan = ArrayFromJSON(float64(), "[0, 1, 2]");
ArrayVector arrays2 = {int32_array, float64_array_without_nan};
ASSERT_OK_AND_ASSIGN(auto struct_without_nan, StructArray::Make(arrays2, fieldnames));
ArrayVector chunks4 = {struct_without_nan};
ASSERT_OK_AND_ASSIGN(auto chunked_array_without_nan2, ChunkedArray::Make(chunks4));
ASSERT_TRUE(chunked_array_without_nan2->Equals(chunked_array_without_nan2));
}

If the intention is to test this condition, the expression will always evaluate to false. This is because new arrow::ArrayData instances are created during chunk comparisons, as shown here:

*next_left = chunk_left->Slice(chunk_pos_left_, iteration_size);
*next_right = chunk_right->Slice(chunk_pos_right_, iteration_size);

1-Is my understanding correct?

2- Should I open a separate issue and add a TODO above the test?

@kou
Copy link
Member

kou commented Jun 28, 2025

1-Is my understanding correct?

Hmm. I think no. I think that it's a test for &chunked_array1 == &chunked_array2 not &array_data1 == &array_data2.

@andishgar
Copy link
Contributor Author

Hmm. I think no. I think that it's a test for &chunked_array1 == &chunked_array2 not &array_data1 == &array_data2.

Oops, sorry—I just realized there's another overload here that checks whether two arrow::ChunkedArray instances share the same memory. I mistakenly only looked at this version.

@andishgar
Copy link
Contributor Author

I'm OK with it but I like arrow::Table::Equals(const Table& other, const EqualOptions& opts = EqualOptions::Defaults()) not arrow::Table::ApproxEquals(). We need to add more options to EqualOptions for it.

Currently, arrow::Table::Equals does not accept arrow::EqualOptions , so no options can be used for floating-point comparisons. Should I open an issue for this?

@andishgar andishgar requested a review from kou June 29, 2025 14:19
@kou
Copy link
Member

kou commented Jun 30, 2025

Yes. Let's work on it as a separated task.

@kou kou merged commit f8cd17c into apache:main Jun 30, 2025
37 checks passed
@kou kou removed the awaiting merge Awaiting merge label Jun 30, 2025
@andishgar andishgar deleted the enable-use_atol_ branch June 30, 2025 03:14
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit f8cd17c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants