fix: Improve Validation Output, Declarative Config Consistency, and Docs #29

Merged
merged 9 commits into from
May 12, 2025

flitzpiepe93
Contributor

Improve Validation Output, Declarative Config Consistency, and Docs

✅ Summary

This PR refines the validation output, makes declarative config keys consistent, and fixes several documentation issues:


🧪 Validation Improvements

  • _dq_errors output refined
    The fail record output now contains only actual validation failures per row — removing padded null values for passed checks.

  • ValidationSummary now has a readable __str__()
    A clean, human-friendly output for quick diagnostics or CLI logs:

    ✅ Validation Summary (2025-05-11 14:30:01)
    Total records:   3,475,226
    Passed records:  3,252,514
    Failed records:    222,712
    Warnings:                 0
    Pass rate:           94.00%
    
  • Declarative config parsing now enforces kebab-case
    Keys like smaller-column and greater-column are now properly supported — consistent with all other check parameters.
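As a rough illustration of the kebab-case rule above, the sketch below rejects any config key that is not kebab-case. It is not the project's parser: `validate_kebab_case_keys`, the regular expression, and the `check` and `check-id` entries are assumptions made for the example. Only `smaller-column` and `greater-column` come from this PR.

```python
import re

# Hypothetical helper for illustration only, not the project's implementation.
# Matches lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def validate_kebab_case_keys(config: dict) -> dict:
    """Return the config unchanged if every key is kebab-case, else raise."""
    bad_keys = [key for key in config if not KEBAB_CASE.fullmatch(key)]
    if bad_keys:
        raise ValueError(f"Config keys must be kebab-case: {bad_keys}")
    return config


# Example check definition; the check name and check-id are made up.
check = {
    "check": "column-less-than-check",
    "check-id": "min-below-max",
    "smaller-column": "min_price",  # values (column names) are not restricted
    "greater-column": "max_price",
}
validated = validate_kebab_case_keys(check)
```

Only the keys are checked, so column names such as `min_price` can keep whatever naming the dataset uses.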

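The readable summary described above could come from a `__str__` along the lines of the sketch below. The field names, the dataclass layout, and the column widths are assumptions for illustration and may differ from the actual `ValidationSummary` class. `pass_rate` is passed in rather than derived, so the sketch prints whatever value it is given.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ValidationSummary:
    # Field names are assumed for this sketch.
    total_records: int
    passed_records: int
    failed_records: int
    warning_records: int
    pass_rate: float  # fraction between 0 and 1
    timestamp: datetime

    def __str__(self) -> str:
        # Right-align the numbers so the columns line up in CLI logs.
        return (
            f"✅ Validation Summary ({self.timestamp:%Y-%m-%d %H:%M:%S})\n"
            f"Total records:  {self.total_records:>10,}\n"
            f"Passed records: {self.passed_records:>10,}\n"
            f"Failed records: {self.failed_records:>10,}\n"
            f"Warnings:       {self.warning_records:>10,}\n"
            f"Pass rate:      {self.pass_rate:>10.2%}"
        )


summary = ValidationSummary(
    total_records=3_475_226,
    passed_records=3_252_514,
    failed_records=222_712,
    warning_records=0,
    pass_rate=0.94,
    timestamp=datetime(2025, 5, 11, 14, 30, 1),
)
print(summary)
```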

📚 Documentation Enhancements

  • ✅ Fixed missing BatchDQEngine import in example
  • ✅ Renamed regex-match-check file for improved clarity
  • ✅ Added illustration to Integration Patterns section of Sphinx docs

🔗 Related Commits

  • fix(check-runner): _dq_errors contains only validation failures
  • fix(check-definition): enforce kebab-case for check definition
  • fix(validation-summary): add string representation of validation summary
  • docs(sphinx): add correct import to example
  • docs(sphinx): renamed filename of regex-match-check
  • docs(sphinx): add image to section Integration Patterns

Let me know if this should be split across changelog categories or released under a patch version.

flitzpiepe93 added the bug, documentation, and chore labels on May 12, 2025

codecov bot commented May 12, 2025

Codecov Report

Attention: Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
sparkdq/engine/batch/check_runner.py | 90.90% | 0 Missing and 1 partial ⚠️
