qckfx/driftfx

driftfx

🚦 Zero-false-positive drift detection for analytics pipelines.

Fast: Optimized with Cython - handles 10,000+ unique values in under 2 seconds.
Flexible: Works with both CSV and Parquet files.

⚠️ Early Stage Project: This project is in active development with frequent breaking changes. The API and data formats may change between versions. Please pin your dependencies if using in production.

Get Started

pip install driftfx

Python Usage

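import pandas as pd
import driftfx as dr

# Load the baseline and the new batch (file names are placeholders)
df_baseline = pd.read_csv("data.csv")
df_new = pd.read_parquet("new.parquet")

# Store a snapshot of the "name" column, then check the new batch against it
dr.snapshot(df_baseline, "baseline", cols=["name"])
result = dr.check(df_new, "baseline", cols=["name"])

if not result.is_clean():
    if result.renames:
        print("Renames / typos:", result.renames[:5])     # first 5 examples
    if result.brand_new:
        print("Brand-new names:", result.brand_new[:5])   # first 5 examples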
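
To make the distinction between "renames / typos" and "brand-new" values concrete, here is a minimal sketch of the idea. It is not driftfx's implementation: it swaps in the standard library's `difflib.get_close_matches` for the similarity test, and the function name `classify_new_values`, the `cutoff` value, and the sample data are all hypothetical.

```python
from difflib import get_close_matches


def classify_new_values(baseline, new_values, cutoff=0.8):
    """Split values missing from the baseline into likely renames and brand-new values.

    Illustrative only: the similarity test here is difflib's ratio,
    which is an assumption and not what driftfx itself uses.
    """
    known = set(baseline)
    candidates = sorted(known)
    renames, brand_new = [], []
    for value in new_values:
        if value in known:
            continue  # already in the baseline, so not drift
        # Look for the single closest baseline value above the cutoff
        match = get_close_matches(value, candidates, n=1, cutoff=cutoff)
        if match:
            renames.append((match[0], value))  # (baseline value, new spelling)
        else:
            brand_new.append(value)
    return renames, brand_new


baseline = ["Acme Corp", "Globex", "Initech"]
new_batch = ["Globex", "Acme Corp.", "Umbrella"]
renames, brand_new = classify_new_values(baseline, new_batch)
print("Renames / typos:", renames)
print("Brand-new names:", brand_new)
```

Here "Acme Corp." is close enough to a baseline value to count as a rename, while "Umbrella" has no near match and is reported as brand new. The cutoff controls how aggressive the rename matching is.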

CLI Usage

# snapshot baseline (CSV or Parquet)
$ driftfx snapshot --input data.csv --cols name --baseline baseline/
Snapshot complete

# check new batch
$ driftfx check --input new.parquet --cols name --baseline baseline/
[✖] name: 17 renames / 31 new codes
Drift detected: 48 anomalies

Performance

With Cython-optimized Levenshtein distance calculations:

| Operation | Time | Throughput | Dataset |
|---|---|---|---|
| Snapshot | 1.6s | 5,264 rows/s | 10,000 unique values |
| Check | 1.2s | 7,893 rows/s | 10,050 rows |

The Cython implementation provides an 8x speedup for snapshot operations compared to pure Python.
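
For reference, the kind of pure-Python Levenshtein distance that a Cython version would speed up might look like the sketch below. This is a textbook dynamic-programming implementation written for illustration. It is not taken from the driftfx source, and the function name `levenshtein` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string as the row so each row uses less memory
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

The nested loops do work proportional to the product of the two string lengths for every pair compared, which is why this inner loop is a natural candidate for compilation.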
