fix: Improve Validation Output, Declarative Config Consistency, and Docs #29

Merged: 9 commits, May 12, 2025
fix(check-definition): enforce kebab-case for check definition
flitzpiepe93 committed May 11, 2025
commit 0e63fb01b70778a96f5608c05c1309c43d7c94a6
2 changes: 1 addition & 1 deletion docs/source/custom_checks/implementation/aggregate.rst
@@ -75,7 +75,7 @@ Minimal Example
@register_check_config(check_name="my-custom-count-check")
class RowCountMinCheckConfig(BaseAggregateCheckConfig):
check_class = RowCountMinCheck
min_count: int = Field(..., description="Minimum number of rows expected")
min_count: int = Field(..., description="Minimum number of rows expected", alias="min-count")

@model_validator(mode="after")
def validate_min(self) -> "RowCountMinCheckConfig":
4 changes: 2 additions & 2 deletions docs/source/custom_checks/plugin_architecture.rst
@@ -33,8 +33,8 @@ Workflow
.. code-block:: python

config = [
{"check": "null-check", "check_id": "c1", "column": "age"},
{"check": "positive-value", "check_id": "c2", "column": "salary"}
{"check": "null-check", "check-id": "c1", "column": "age"},
{"check": "positive-value", "check-id": "c2", "column": "salary"}
]

2. For each entry, it uses the ``check`` to look up the matching `CheckConfig` class via the central **registry**.
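Reviewer note: the lookup described in step 2 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not SparkDQ's implementation: `CHECK_CONFIG_REGISTRY`, `register`, `build_configs`, and the two `...Sketch` classes are hypothetical names, and the kebab-to-snake key mapping stands in for whatever alias handling the real config classes use.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Type

# Hypothetical registry: maps a check name to its config class.
CHECK_CONFIG_REGISTRY: Dict[str, Type[Any]] = {}


def register(check_name: str):
    """Class decorator that stores the decorated class under ``check_name``."""

    def decorator(cls):
        CHECK_CONFIG_REGISTRY[check_name] = cls
        return cls

    return decorator


@register("null-check")
@dataclass
class NullCheckConfigSketch:
    check_id: str
    column: str


@register("positive-value")
@dataclass
class PositiveValueConfigSketch:
    check_id: str
    column: str


def build_configs(entries: List[Dict[str, Any]]) -> List[Any]:
    """Resolve each entry's ``check`` name and instantiate the matching class."""
    configs = []
    for entry in entries:
        params = dict(entry)  # copy so the caller's dict is not mutated
        check_name = params.pop("check")
        config_cls = CHECK_CONFIG_REGISTRY[check_name]  # KeyError if unknown
        # Map kebab-case keys to Python attribute names (assumption of this sketch).
        kwargs = {key.replace("-", "_"): value for key, value in params.items()}
        configs.append(config_cls(**kwargs))
    return configs


config = [
    {"check": "null-check", "check-id": "c1", "column": "age"},
    {"check": "positive-value", "check-id": "c2", "column": "salary"},
]
built = build_configs(config)
```

Keeping the registry as a plain dictionary means an unknown check name fails at lookup time, before any config object is constructed.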
@@ -33,7 +33,7 @@ Declarative Configuration

- check: is-contained-in-check
check-id: validate_status_and_country
allowed_values:
allowed-values:
status:
- ACTIVE
- INACTIVE
@@ -33,7 +33,7 @@ Declarative Configuration

- check: is-not-contained-in-check
check-id: block_test_status_and_invalid_countries
forbidden_values:
forbidden-values:
status:
- TEST
- DUMMY
@@ -31,8 +31,8 @@ Declarative Configuration

- check: row-count-between-check
check-id: expected_daily_batch_size
min_count: 1000
max_count: 5000
min-count: 1000
max-count: 5000
severity: error

Typical Use Cases
@@ -30,7 +30,7 @@ Declarative Configuration

- check: row-count-exact-check
check-id: validate_snapshot_size
expected_count: 500
expected-count: 500
severity: error

Typical Use Cases
@@ -30,7 +30,7 @@ Declarative Configuration

- check: row-count-max-check
check-id: prevent_oversize_batch
max_count: 100000
max-count: 100000
severity: error


@@ -30,7 +30,7 @@ Declarative Configuration

- check: row-count-min-check
check-id: minimum_required_records
min_count: 10000
min-count: 10000
severity: warning

Typical Use Cases
@@ -42,8 +42,8 @@ Declarative Configuration
check-id: allowed_record_date_range
columns:
- record_date
min_value: "2020-01-01"
max_value: "2023-12-31"
min-value: "2020-01-01"
max-value: "2023-12-31"
inclusive: [true, true]
severity: critical

2 changes: 1 addition & 1 deletion docs/source/getting_started/checks/date/date_max_check.rst
@@ -39,7 +39,7 @@ Declarative Configuration
check-id: maximum_allowed_record_date
columns:
- record_date
max_value: "2023-12-31"
max-value: "2023-12-31"
inclusive: true
severity: critical

2 changes: 1 addition & 1 deletion docs/source/getting_started/checks/date/date_min_check.rst
@@ -39,7 +39,7 @@ Declarative Configuration
check-id: minimum_allowed_record_date
columns:
- record_date
min_value: "2020-01-01"
min-value: "2020-01-01"
inclusive: true
severity: critical

@@ -42,8 +42,8 @@ Declarative Configuration
check-id: allowed_discount_range
columns:
- discount
min_value: 0.0
max_value: 100.0
min-value: 0.0
max-value: 100.0
inclusive: [true, true]
severity: critical

@@ -39,7 +39,7 @@ Declarative Configuration
check-id: maximum_allowed_discount
columns:
- discount
max_value: 100.0
max-value: 100.0
inclusive: true
severity: critical

@@ -40,7 +40,7 @@ Declarative Configuration
columns:
- price
- discount
min_value: 0.0
min-value: 0.0
inclusive: true
severity: critical

@@ -30,7 +30,7 @@ Declarative Configuration

- check: column-presence-check
check-id: enforce_required_columns
required_columns:
required-columns:
- id
- event_timestamp
- status
2 changes: 1 addition & 1 deletion docs/source/getting_started/checks/schema/schema_check.rst
@@ -50,7 +50,7 @@ Declarative Configuration

- check: schema-check
check-id: enforce_schema_contract
expected_schema:
expected-schema:
id: int
name: string
amount: decimal(10,2)
@@ -42,8 +42,8 @@ Declarative Configuration
check-id: allowed_event_time_range
columns:
- event_time
min_value: "2020-01-01 00:00:00"
max_value: "2023-12-31 23:59:59"
min-value: "2020-01-01 00:00:00"
max-value: "2023-12-31 23:59:59"
inclusive: [true, true]
severity: critical

@@ -39,7 +39,7 @@ Declarative Configuration
check-id: maximum_allowed_event_time
columns:
- event_time
max_value: "2023-12-31 23:59:59"
max-value: "2023-12-31 23:59:59"
inclusive: true
severity: critical

@@ -40,7 +40,7 @@ Declarative Configuration
check-id: minimum_allowed_event_time
columns:
- event_time
min_value: "2020-01-01 00:00:00"
min-value: "2020-01-01 00:00:00"
inclusive: true
severity: critical

4 changes: 2 additions & 2 deletions docs/source/getting_started/defining_checks.rst
@@ -121,8 +121,8 @@ definitions via dictionaries — for example loaded from YAML or JSON files.

- check: row-count-between-check
check-id: my-count-check
min_count: 100
max_count: 5000
min-count: 100
max-count: 5000

To load the configuration into SparkDQ, use the following code:

8 changes: 4 additions & 4 deletions sparkdq/checks/aggregate/count_checks/count_between_check.py
@@ -81,8 +81,8 @@ class RowCountBetweenCheckConfig(BaseAggregateCheckConfig):
"""

check_class = RowCountBetweenCheck
min_count: int = Field(..., description="Minimum number of rows expected")
max_count: int = Field(..., description="Maximum number of rows allowed")
min_count: int = Field(..., description="Minimum number of rows expected", alias="min-count")
max_count: int = Field(..., description="Maximum number of rows allowed", alias="max-count")

@model_validator(mode="after")
def validate_range(self) -> "RowCountBetweenCheckConfig":
@@ -101,10 +101,10 @@ def validate_range(self) -> "RowCountBetweenCheckConfig":
"""
if self.min_count < 0:
raise InvalidCheckConfigurationError(
f"min_count ({self.min_count}) must be greater than or equal to 0"
f"min-count ({self.min_count}) must be greater than or equal to 0"
)
if self.min_count > self.max_count:
raise InvalidCheckConfigurationError(
f"min_count ({self.min_count}) must not be greater than max_count ({self.max_count})"
f"min-count ({self.min_count}) must not be greater than max-count ({self.max_count})"
)
return self
4 changes: 2 additions & 2 deletions sparkdq/checks/aggregate/count_checks/count_exact_check.py
@@ -71,7 +71,7 @@ class RowCountExactCheckConfig(BaseAggregateCheckConfig):
"""

check_class = RowCountExactCheck
expected_count: int = Field(..., description="Exact number of rows required")
expected_count: int = Field(..., description="Exact number of rows required", alias="expected-count")

@model_validator(mode="after")
def validate_expected(self) -> "RowCountExactCheckConfig":
@@ -86,6 +86,6 @@ def validate_expected(self) -> "RowCountExactCheckConfig":
"""
if self.expected_count < 0:
raise InvalidCheckConfigurationError(
f"expected_count ({self.expected_count}) must be zero or positive"
f"expected-count ({self.expected_count}) must be zero or positive"
)
return self
4 changes: 2 additions & 2 deletions sparkdq/checks/aggregate/count_checks/count_max_check.py
@@ -71,7 +71,7 @@ class RowCountMaxCheckConfig(BaseAggregateCheckConfig):
"""

check_class = RowCountMaxCheck
max_count: int = Field(..., description="Maximum number of rows allowed")
max_count: int = Field(..., description="Maximum number of rows allowed", alias="max-count")

@model_validator(mode="after")
def validate_max(self) -> "RowCountMaxCheckConfig":
@@ -85,5 +85,5 @@ def validate_max(self) -> "RowCountMaxCheckConfig":
InvalidCheckConfigurationError: If ``max_count`` is not greater than 0.
"""
if self.max_count <= 0:
raise InvalidCheckConfigurationError(f"max_count ({self.max_count}) must be greater than 0")
raise InvalidCheckConfigurationError(f"max-count ({self.max_count}) must be greater than 0")
return self
4 changes: 2 additions & 2 deletions sparkdq/checks/aggregate/count_checks/count_min_check.py
@@ -71,7 +71,7 @@ class RowCountMinCheckConfig(BaseAggregateCheckConfig):
"""

check_class = RowCountMinCheck
min_count: int = Field(..., description="Minimum number of rows expected")
min_count: int = Field(..., description="Minimum number of rows expected", alias="min-count")

@model_validator(mode="after")
def validate_min(self) -> "RowCountMinCheckConfig":
@@ -85,5 +85,5 @@ def validate_min(self) -> "RowCountMinCheckConfig":
InvalidCheckConfigurationError: If ``min_count`` is negative.
"""
if self.min_count <= 0:
raise InvalidCheckConfigurationError(f"min_count ({self.min_count}) must be greater than 0")
raise InvalidCheckConfigurationError(f"min-count ({self.min_count}) must be greater than 0")
return self
@@ -73,5 +73,7 @@ class ColumnPresenceCheckConfig(BaseAggregateCheckConfig):
check_class = ColumnPresenceCheck

required_columns: list[str] = Field(
..., description="List of required column names that must be present in the DataFrame"
...,
description="List of required column names that must be present in the DataFrame",
alias="required-columns",
)
4 changes: 3 additions & 1 deletion sparkdq/checks/aggregate/schema_checks/schema_check.py
@@ -111,7 +111,9 @@ class SchemaCheckConfig(BaseAggregateCheckConfig):
check_class = SchemaCheck

expected_schema: dict[str, str] = Field(
..., description="Expected schema mapping of column names to Spark data types"
...,
description="Expected schema mapping of column names to Spark data types",
alias="expected-schema",
)
strict: bool = Field(
default=True,
@@ -109,8 +109,12 @@ class ColumnLessThanCheckConfig(BaseRowCheckConfig):

check_class = ColumnLessThanCheck

smaller_column: str = Field(..., description="The column expected to contain smaller (or equal) values.")
greater_column: str = Field(..., description="The column expected to contain greater values.")
smaller_column: str = Field(
..., description="The column expected to contain smaller (or equal) values.", alias="smaller-column"
)
greater_column: str = Field(
..., description="The column expected to contain greater values.", alias="greater-column"
)
inclusive: bool = Field(
False, description="If True, allows equality (<=). Otherwise requires strict inequality (<)."
)
@@ -59,7 +59,7 @@ class IsContainedInCheckConfig(BaseRowCheckConfig):
check_class = IsContainedInCheck

allowed_values: dict[str, list[object]] = Field(
..., description="Mapping of column names to lists of allowed values."
..., description="Mapping of column names to lists of allowed values.", alias="allowed-values"
)

@model_validator(mode="after")
@@ -57,7 +57,7 @@ class IsNotContainedInCheckConfig(BaseRowCheckConfig):
check_class = IsNotContainedInCheck

forbidden_values: dict[str, list[object]] = Field(
..., description="Mapping of column names to lists of forbidden values."
..., description="Mapping of column names to lists of forbidden values.", alias="forbidden-values"
)

@model_validator(mode="after")
4 changes: 2 additions & 2 deletions sparkdq/checks/row_level/date_checks/date_between_check.py
@@ -63,8 +63,8 @@ class DateBetweenCheckConfig(BaseRowCheckConfig):
check_class = DateBetweenCheck

columns: List[str] = Field(..., description="Date columns to validate")
min_value: str = Field(..., description="Minimum allowed date in YYYY-MM-DD format")
max_value: str = Field(..., description="Maximum allowed date in YYYY-MM-DD format")
min_value: str = Field(..., description="Minimum allowed date in YYYY-MM-DD format", alias="min-value")
max_value: str = Field(..., description="Maximum allowed date in YYYY-MM-DD format", alias="max-value")
inclusive: tuple[bool, bool] = Field(
(False, False), description="Tuple of two booleans controlling boundary inclusivity"
)
5 changes: 4 additions & 1 deletion sparkdq/checks/row_level/date_checks/date_max_check.py
@@ -63,5 +63,8 @@ class DateMaxCheckConfig(BaseRowCheckConfig):
check_class = DateMaxCheck

columns: List[str] = Field(..., description="Date columns to validate")
max_value: str = Field(..., description="Maximum allowed date in YYYY-MM-DD format")
max_value: str = Field(..., description="Maximum allowed date in YYYY-MM-DD format", alias="max-value")
inclusive: bool = Field(False, description="Whether the maximum date is inclusive")
model_config = {
"populate_by_name": True,
}
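Reviewer note: the effect of pairing `alias="max-value"` with `populate_by_name` can be checked in isolation. The sketch below assumes Pydantic v2; `DateMaxSketch` and `StrictAliasSketch` are hypothetical stand-ins rather than SparkDQ classes, and they omit `check_class`, `columns`, and the base class.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class DateMaxSketch(BaseModel):
    """Hypothetical stand-in for a check config; not SparkDQ's actual class."""

    # Accept both the alias ("max-value") and the field name ("max_value").
    model_config = ConfigDict(populate_by_name=True)

    max_value: str = Field(..., alias="max-value")
    inclusive: bool = Field(False)


class StrictAliasSketch(BaseModel):
    """Same field without populate_by_name: only the alias is accepted."""

    max_value: str = Field(..., alias="max-value")


# Declarative input (for example parsed from YAML) uses the kebab-case alias.
from_yaml = DateMaxSketch.model_validate({"max-value": "2023-12-31", "inclusive": True})

# Python callers can keep using the snake_case field name.
from_python = DateMaxSketch(max_value="2023-12-31", inclusive=True)

# Serializing with by_alias=True restores the kebab-case key.
dumped = from_yaml.model_dump(by_alias=True)

# Without populate_by_name, the snake_case key is not recognized as input,
# so the required field is reported as missing.
try:
    StrictAliasSketch.model_validate({"max_value": "2023-12-31"})
    strict_rejects_field_name = False
except ValidationError:
    strict_rejects_field_name = True
```

If the base config class already enables `populate_by_name`, repeating it per subclass as in this hunk would be redundant; the diff does not show the base class, so that remains an open question for the author.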
5 changes: 4 additions & 1 deletion sparkdq/checks/row_level/date_checks/date_min_check.py
@@ -58,5 +58,8 @@ class DateMinCheckConfig(BaseRowCheckConfig):

check_class = DateMinCheck
columns: List[str] = Field(..., description="The list of date columns to check for minimum date")
min_value: str = Field(..., description="The minimum allowed date (inclusive) in 'YYYY-MM-DD' format")
min_value: str = Field(..., description="Minimum allowed date in YYYY-MM-DD format", alias="min-value")
inclusive: bool = Field(False, description="Whether the minimum date is inclusive")
model_config = {
"populate_by_name": True,
}
@@ -44,10 +44,10 @@ class NumericBetweenCheckConfig(BaseRowCheckConfig):
description="Whether min and max values are inclusive (min_inclusive, max_inclusive)",
)
min_value: float | int | Decimal = Field(
..., description="The minimum allowed value (inclusive) for the specified columns"
..., description="The minimum allowed value (inclusive) for the specified columns", alias="min-value"
)
max_value: float | int | Decimal = Field(
..., description="The maximum allowed value (inclusive) for the specified columns"
..., description="The maximum allowed value (inclusive) for the specified columns", alias="max-value"
)

@model_validator(mode="after")
@@ -33,6 +33,6 @@ class NumericMaxCheckConfig(BaseRowCheckConfig):
check_class = NumericMaxCheck
columns: List[str] = Field(..., description="The list of numeric columns to check for maximum value")
max_value: float | int | Decimal = Field(
..., description="The maximum allowed value (inclusive) for the specified columns"
..., description="The maximum allowed value (inclusive) for the specified columns", alias="max-value"
)
inclusive: bool = Field(False, description="Whether the maximum value is inclusive")
@@ -33,6 +33,6 @@ class NumericMinCheckConfig(BaseRowCheckConfig):
check_class = NumericMinCheck
columns: List[str] = Field(..., description="The list of numeric columns to check for minimum value")
min_value: float | int | Decimal = Field(
..., description="The minimum allowed value (inclusive) for the specified columns"
..., description="The minimum allowed value (inclusive) for the specified columns", alias="min-value"
)
inclusive: bool = Field(False, description="Whether the minimum value is inclusive")
@@ -98,8 +98,10 @@ class StringLengthBetweenCheckConfig(BaseRowCheckConfig):
check_class = StringLengthBetweenCheck

column: str = Field(..., description="The column to validate.")
min_length: int = Field(..., description="Minimum allowed string length (must be > 0).")
max_length: int = Field(..., description="Maximum allowed string length.")
min_length: int = Field(
..., description="Minimum allowed string length (must be > 0).", alias="min-length"
)
max_length: int = Field(..., description="Maximum allowed string length.", alias="max-length")
inclusive: tuple[bool, bool] = Field((True, True), description="Inclusiveness for (min, max) bounds.")

@model_validator(mode="after")
2 changes: 1 addition & 1 deletion sparkdq/checks/row_level/string_checks/max_length_check.py
@@ -92,7 +92,7 @@ class StringMaxLengthCheckConfig(BaseRowCheckConfig):
check_class = StringMaxLengthCheck

column: str = Field(..., description="The column to validate for maximum string length.")
max_length: int = Field(..., description="Maximum allowed length of the string values.")
max_length: int = Field(..., description="Maximum allowed string length.", alias="max-length")
inclusive: bool = Field(
True, description="If True, allows equality (<=). If False, requires strictly less (<)."
)