feat: add data freshness check #38

Merged 3 commits on May 21, 2025

@flitzpiepe93 (Contributor) commented May 21, 2025

🚀 New Feature: Freshness Check

This PR introduces a new aggregate-level data quality check that validates whether a dataset has been updated recently enough, based on a configurable time threshold.

✅ What’s Included

  • FreshnessCheck implementation (aggregate check)
  • Flexible configuration of time threshold via:
    • interval: positive integer
    • period: one of "year", "month", "week", "day", "hour", "minute", "second"
  • Declarative and programmatic config support via FreshnessCheckConfig
  • Unit tests covering:
    • Valid & outdated timestamps
    • Empty / null-only columns
    • Literal enforcement and config validation
  • Sphinx documentation with YAML and Python examples
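The configuration rules listed above (a positive integer `interval` and a `period` restricted to seven literal values) could be enforced roughly as follows. This is only a minimal sketch using the standard library: the class name `FreshnessConfigSketch` is hypothetical, and the real `FreshnessCheckConfig` in this PR may be built on a different validation mechanism.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# The seven period values named in the PR description.
Period = Literal["year", "month", "week", "day", "hour", "minute", "second"]


@dataclass(frozen=True)
class FreshnessConfigSketch:
    """Hypothetical stand-in for FreshnessCheckConfig (not the PR's class)."""

    check_id: str
    column: str
    interval: int
    period: Period

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.interval, bool) or not isinstance(self.interval, int):
            raise TypeError("interval must be an integer")
        if self.interval <= 0:
            raise ValueError("interval must be a positive integer")
        # Literal types are not checked at runtime, so compare by hand.
        if self.period not in get_args(Period):
            raise ValueError(f"period must be one of {get_args(Period)}")
```

With this sketch, `FreshnessConfigSketch("updated-recently", "last_updated", 2, "day")` is accepted, while `interval=0` or `period="decade"` raises a `ValueError`.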

🔍 Example Usage

FreshnessCheckConfig(
    check_id="updated-recently",
    column="last_updated",
    interval=2,
    period="day"
)

💡 Why This Matters

This check is inspired by the dbt freshness concept and helps detect:

  • Stale or delayed data ingestion
  • Broken or lagging ETL jobs
  • Unexpected pipeline pauses

It ensures that datasets meet minimum freshness requirements, allowing pipelines to fail fast if data is outdated.
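To make the freshness rule concrete, here is a rough sketch of the comparison in plain Python: the most recent timestamp in the column must not be older than `interval` × `period` before now. The function `is_fresh` is hypothetical and is not the PR's implementation. It makes three assumptions: "month" and "year" are approximated as 30 and 365 days, an empty or null-only column counts as not fresh, and all timestamps are timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Assumption: "month" and "year" are approximated with fixed lengths.
_SECONDS_PER_PERIOD = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "week": 7 * 86_400,
    "month": 30 * 86_400,
    "year": 365 * 86_400,
}


def is_fresh(
    timestamps: Iterable[Optional[datetime]],
    interval: int,
    period: str,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the newest timestamp lies within the allowed window."""
    now = now or datetime.now(timezone.utc)
    threshold = now - timedelta(seconds=interval * _SECONDS_PER_PERIOD[period])
    present = [ts for ts in timestamps if ts is not None]
    if not present:
        # Assumption: no usable timestamp means the data is not fresh.
        return False
    return max(present) >= threshold
```

For example, with `now` fixed at noon on May 21, 2025 (UTC), a column whose newest value is one day old passes a 2-day threshold, and one whose newest value is three days old fails it.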


Let me know if you'd like me to squash the commits or adjust the scope! ✅

@flitzpiepe93 self-assigned this on May 21, 2025
@flitzpiepe93 changed the title from "Feat/add data freshness check" to "feat: add data freshness check" on May 21, 2025
@flitzpiepe93 mentioned this pull request on May 21, 2025

codecov bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

@tongqqiu
Contributor

LGTM

@flitzpiepe93 merged commit f0d2318 into main on May 21, 2025
8 checks passed
@flitzpiepe93 deleted the feat/add-data-freshness-check branch on May 21, 2025 16:14