New Check: Column A < Column B Comparison (Numeric, Date, Timestamp) #14

Closed
flitzpiepe93 opened this issue May 4, 2025 · 1 comment
Labels: enhancement (New feature or request), feature request

@flitzpiepe93
Contributor

Description

We should introduce a new check that validates, row by row, whether the value in one column is strictly less than the value in another column of the same row. This is a common requirement for:

  • Numeric comparisons (e.g. start_price < end_price)
  • Timestamps (e.g. pickup_time < dropoff_time)
  • Dates (e.g. order_date < shipping_date)

This check would enhance the expressiveness of the validation framework and support key business rules.

Example Usage

ColumnLessThanCheckConfig(
    check_id="pickup-before-dropoff",
    left_column="tpep_pickup_datetime",
    right_column="tpep_dropoff_datetime"
)

This configuration should raise an error for rows where tpep_pickup_datetime >= tpep_dropoff_datetime.
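A minimal sketch of the per-row rule in plain Python, independent of any DataFrame engine: a row fails when both values are present and the left value is greater than or equal to the right. The function name `violates_less_than` is hypothetical, and returning False when either side is `None` is an assumption based on the null-handling point under Expected Behavior.

```python
from datetime import date, datetime
from typing import Any, Optional


def violates_less_than(left: Optional[Any], right: Optional[Any]) -> bool:
    """Hypothetical helper (not the framework's API): True if the row breaks left < right.

    Rows where either side is None are skipped, i.e. treated as passing.
    """
    if left is None or right is None:
        return False
    # Equal values also fail, because the check is strictly "less than".
    return left >= right


pickup = datetime(2025, 5, 4, 10, 30)
dropoff = datetime(2025, 5, 4, 10, 15)
print(violates_less_than(pickup, dropoff))  # True: pickup is after dropoff
print(violates_less_than(pickup, pickup))   # True: equal timestamps also fail
print(violates_less_than(pickup, None))     # False: null skips the comparison
print(violates_less_than(date(2025, 5, 1), date(2025, 5, 3)))  # False: order date precedes shipping date
```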

Expected Behavior

  • Should support columns with NumericType, DateType, or TimestampType
  • Should raise an error during config parsing if the column types are not compatible
  • Should handle null values gracefully (e.g. skip comparison if either column is null)
  • Should return failed rows with a clear reason
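The type-compatibility requirement could be sketched as below, assuming a schema that maps column names to type labels is available when the config is parsed. The labels (`"int"`, `"double"`, `"date"`, `"timestamp"`, ...) are hypothetical stand-ins for the NumericType, DateType and TimestampType categories, and requiring both columns to fall in the same category is an assumption; `check_comparable` and `IncompatibleColumnTypesError` are illustrative names only.

```python
from typing import Dict

# Hypothetical type labels grouped into the three categories the issue lists.
# These are illustrative labels, not the framework's or Spark's actual type names.
TYPE_CATEGORIES: Dict[str, str] = {
    "int": "numeric",
    "long": "numeric",
    "float": "numeric",
    "double": "numeric",
    "decimal": "numeric",
    "date": "date",
    "timestamp": "timestamp",
}


class IncompatibleColumnTypesError(ValueError):
    """Raised when the two columns cannot be compared with '<'."""


def check_comparable(schema: Dict[str, str], left_column: str, right_column: str) -> str:
    """Return the shared category, or raise if a column is unknown, unsupported or mismatched.

    Assumption: both columns must be in the same category
    (numeric vs numeric, date vs date, timestamp vs timestamp).
    """
    categories = []
    for column in (left_column, right_column):
        if column not in schema:
            raise IncompatibleColumnTypesError(f"unknown column: {column!r}")
        type_name = schema[column]
        if type_name not in TYPE_CATEGORIES:
            raise IncompatibleColumnTypesError(
                f"column {column!r} has unsupported type {type_name!r}"
            )
        categories.append(TYPE_CATEGORIES[type_name])
    if categories[0] != categories[1]:
        raise IncompatibleColumnTypesError(
            f"cannot compare {left_column!r} ({categories[0]}) "
            f"with {right_column!r} ({categories[1]})"
        )
    return categories[0]


schema = {
    "tpep_pickup_datetime": "timestamp",
    "tpep_dropoff_datetime": "timestamp",
    "start_price": "double",
    "order_date": "date",
}
print(check_comparable(schema, "tpep_pickup_datetime", "tpep_dropoff_datetime"))  # timestamp
```

Failing fast here, before any data is read, means a misconfigured check (for example, a price compared against a date) surfaces as a configuration error rather than as a confusing runtime failure.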

Proposed API

Config Class

class ColumnLessThanCheckConfig(BaseRowCheckConfig):
    check_id: str
    left_column: str
    right_column: str

    # Quoted annotation, since ColumnLessThanCheck is defined after this class
    def to_check(self) -> "ColumnLessThanCheck":
        return ColumnLessThanCheck(...)

Check Logic

class ColumnLessThanCheck(BaseRowCheck):
    ...
    def validate(self, df: DataFrame) -> DataFrame:
        # Add a column flagging rows where left_column >= right_column
        ...
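The check logic can be illustrated without Spark. The sketch below works on a list of dict rows rather than the framework's DataFrame, and returns the failing rows with an added reason field. `find_failed_rows` and the `failure_reason` key are hypothetical names chosen for illustration.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def find_failed_rows(rows: List[Row], left_column: str, right_column: str) -> List[Row]:
    """Return copies of rows where left_column >= right_column, each with a reason.

    Rows with a null (None) on either side are skipped, as the issue proposes.
    """
    failed: List[Row] = []
    for row in rows:
        # row.get returns None for a missing key, so an absent column is treated like a null.
        left, right = row.get(left_column), row.get(right_column)
        if left is None or right is None:
            continue  # skip the comparison when either value is missing
        if left >= right:
            reason = f"{left_column} ({left}) is not less than {right_column} ({right})"
            failed.append({**row, "failure_reason": reason})  # copy; input row is untouched
    return failed


trips = [
    {"tpep_pickup_datetime": datetime(2025, 5, 4, 9, 0),
     "tpep_dropoff_datetime": datetime(2025, 5, 4, 9, 20)},  # valid
    {"tpep_pickup_datetime": datetime(2025, 5, 4, 9, 45),
     "tpep_dropoff_datetime": datetime(2025, 5, 4, 9, 30)},  # pickup after dropoff
    {"tpep_pickup_datetime": datetime(2025, 5, 4, 10, 0),
     "tpep_dropoff_datetime": None},                         # skipped: null
]

for row in find_failed_rows(trips, "tpep_pickup_datetime", "tpep_dropoff_datetime"):
    print(row["failure_reason"])
```

In Spark SQL, a comparison involving a null evaluates to null, and `DataFrame.filter` keeps only rows where the condition is true, so filtering on `left >= right` would skip null rows without extra handling.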

Benefits

  • Adds a commonly used comparison check to the framework
  • Supports key temporal and numeric validations
  • Enhances expressiveness for rule-based data quality constraints
@flitzpiepe93
Contributor Author

This issue was resolved as part of pull request #24.