fix: ensure aggregate checks run on original input DataFrame #35

Merged
flitzpiepe93 merged 1 commit into main from fix/input-of-aggregate-check on May 19, 2025

Conversation

flitzpiepe93
Contributor

@flitzpiepe93 flitzpiepe93 commented May 19, 2025

📍 Context
Previously, aggregate checks such as schema-check failed unexpectedly because row-level validations added internal columns (_dq_errors, _dq_passed) to the DataFrame before schema validation ran.

🛠 What’s fixed
This MR ensures that all aggregate checks are executed on the original, unmodified input DataFrame. This prevents false failures caused by internal metadata columns.
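
The ordering problem and the fix can be sketched roughly as follows. This is a minimal illustration, not the project's code: plain Python lists of dicts stand in for the DataFrame, and `apply_null_check`, `schema_check`, `validate_buggy` and `validate_fixed` are hypothetical names chosen for this example. Only the internal column names `_dq_errors` and `_dq_passed` come from the description above.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]

# Internal metadata columns named in the PR description.
INTERNAL_COLUMNS = ("_dq_errors", "_dq_passed")


def apply_null_check(rows: List[Row], column: str) -> List[Row]:
    """Hypothetical row-level check: returns annotated copies, input untouched."""
    annotated = []
    for row in rows:
        errors = [f"{column} is null"] if row.get(column) is None else []
        annotated.append({**row, "_dq_errors": errors, "_dq_passed": not errors})
    return annotated


def schema_check(rows: List[Row], expected: Set[str]) -> bool:
    """Hypothetical aggregate check: every row has exactly the expected columns."""
    return all(set(row) == expected for row in rows)


def validate_buggy(rows: List[Row], column: str, expected: Set[str]) -> Tuple[List[Row], bool]:
    annotated = apply_null_check(rows, column)
    # Old ordering: the aggregate check also sees the internal columns.
    return annotated, schema_check(annotated, expected)


def validate_fixed(rows: List[Row], column: str, expected: Set[str]) -> Tuple[List[Row], bool]:
    annotated = apply_null_check(rows, column)
    # Fixed ordering: the aggregate check sees only the original input.
    return annotated, schema_check(rows, expected)


data = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
expected_columns = {"id", "name"}

_, buggy_ok = validate_buggy(data, "name", expected_columns)
annotated, fixed_ok = validate_fixed(data, "name", expected_columns)
```

In this sketch the schema is valid in both cases, yet `buggy_ok` comes out false because the annotated rows carry two extra columns, while `fixed_ok` is true and the row-level results in `annotated` are still available.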

🧪 Test coverage
Includes an end-to-end test verifying that a valid schema passes when combined with other checks like NullCheck.

🙏 Thanks
Thanks to @tongqqiu for reporting this! Your feedback helped fix this issue quickly.

@flitzpiepe93 flitzpiepe93 self-assigned this May 19, 2025
@flitzpiepe93 flitzpiepe93 added the bug Something isn't working label May 19, 2025

codecov bot commented May 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅


@flitzpiepe93 flitzpiepe93 merged commit 4dbd7a5 into main May 19, 2025
8 checks passed
@flitzpiepe93 flitzpiepe93 deleted the fix/input-of-aggregate-check branch May 19, 2025 17:17