Description
Is there an existing issue for this?
- I have searched the existing issues
Problem statement
The ability to skip checks, introduced with #608, instead of stopping check execution when a check is invalid because of missing columns is great. I love this idea. It simplifies our check process a lot, because it removes the need to pre-select which checks are applicable to a data frame.
But if a check is skipped, all rows of the data frame are added to the invalid data frame and a log entry is added to either the _errors or the _warnings column. This is not useful behaviour.
A skipped check should not be treated as invalidating a row. Rows should not be added to the invalid df just because checks were skipped.
In addition, these log entries are not easy to identify in the array in the _errors or _warnings column. You have to parse the message attribute and test whether the text starts with "Check evaluation skipped due to". There should be a structured way to identify these log entries.
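To illustrate the workaround described above, here is a minimal sketch in plain Python that identifies skipped entries by their message prefix. It assumes the entries have already been collected into dictionaries; the helper `is_skipped`, the check name `subject_not_null` and its message are hypothetical and not part of DQX.

```python
# Prefix taken from the log entries described in this issue.
SKIP_PREFIX = "Check evaluation skipped due to"


def is_skipped(entry: dict) -> bool:
    """Return True if the entry looks like a skipped-check entry (hypothetical helper)."""
    message = entry.get("message") or ""
    return message.startswith(SKIP_PREFIX)


# Example entries as they might appear in the _errors array of one row.
entries = [
    {
        "name": "dmo_ae_faae_fraction",
        "message": "Check evaluation skipped due to invalid check filter: 'AECATTT = 'FRACTION''",
    },
    # Hypothetical real finding, used only for illustration.
    {"name": "subject_not_null", "message": "Column 'SUBJECTNAME' value is null"},
]

# Keep only entries that describe real findings.
real_issues = [e for e in entries if not is_skipped(e)]
```

This works, but it breaks as soon as the message wording changes, which is why a structured marker would be preferable.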
Proposed Solution
Prio 1: Provide a configuration option to skip quietly: no entry is added to the _errors or _warnings column for skipped checks, so only rows with true issues are included in the invalid df. For example:
valid_df, invalid_df = dq_engine.apply_checks_by_metadata_and_split(test_df, checks, ref_dfs=ref_dfs, skip_quietly=True)
Prio 2 (in addition to or instead of Prio 1): Introduce a new output data frame that contains just the list of skipped checks, similar to the current log messages on the command line:
08:49:37 WARN [d.l.dqx.manager] Skipping check 'dmo_ae_faae_fraction' due to invalid check filter: 'AECATTT = 'FRACTION''
valid_df, invalid_df, skipped_checks_df = dq_engine.apply_checks_by_metadata_and_split(test_df, checks, ref_dfs=ref_dfs)
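As a rough idea of what such a skipped-checks output could contain, the following plain-Python sketch derives one record per skipped check from row-level entries. The function `collect_skipped_checks`, the row layout and the selected fields are assumptions for illustration only; they are not an existing DQX API.

```python
SKIP_PREFIX = "Check evaluation skipped due to"


def collect_skipped_checks(rows: list) -> list:
    """Build one summary record per skipped check (hypothetical helper)."""
    seen = {}
    for row in rows:
        for column in ("_errors", "_warnings"):
            # The result columns may be missing or None when there are no entries.
            for entry in row.get(column) or []:
                if (entry.get("message") or "").startswith(SKIP_PREFIX):
                    # Keep the first occurrence; the entry repeats on every row.
                    seen.setdefault(
                        entry["name"],
                        {
                            "name": entry["name"],
                            "message": entry["message"],
                            "filter": entry.get("filter"),
                        },
                    )
    return list(seen.values())


skip_entry = {
    "name": "dmo_ae_faae_fraction",
    "message": "Check evaluation skipped due to invalid check filter: 'AECATTT = 'FRACTION''",
    "filter": "AECATTT = 'FRACTION'",
}
rows = [
    {"SUBJECTNAME": "S1", "_errors": [skip_entry], "_warnings": None},
    {"SUBJECTNAME": "S2", "_errors": [skip_entry], "_warnings": []},
]

skipped_checks = collect_skipped_checks(rows)
```

Because a skipped check currently produces the same entry on every row, de-duplicating by check name is enough to get a compact list.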
Prio 3: Add a new attribute skipped = true to the log entry in the _errors or _warnings column, so that these entries can be identified clearly without checking whether the message attribute starts with "Check evaluation skipped due to". This should be done in any case, regardless of Prio 1 and Prio 2.
{
"name": "dmo_ae_faae_fraction",
"skipped": true,
"message": "Check evaluation skipped due to invalid check filter: 'AECATTT = 'FRACTION''",
"columns": ["SUBJECTNAME", "AEGRPID", "AECAT"],
"filter": "AECATTT = 'FRACTION'",
"function": "foreign_key",
"run_time": "2025-11-28T08:49:37.886Z",
"run_id": "c033c831-9b1e-44c3-9562-063bc0dac94c",
"user_metadata": {}
}
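With such a flag, consumers could separate real findings from skipped checks without any text parsing. The sketch below assumes the proposed `skipped` attribute exists and treats any non-skipped entry in either column as a real issue; the helper `has_real_issue`, the row layout and the `subject_not_null` entry are hypothetical.

```python
def has_real_issue(row: dict) -> bool:
    """Return True if the row has at least one entry that is not a skipped check."""
    for column in ("_errors", "_warnings"):
        for entry in row.get(column) or []:
            # Entries without the proposed flag are treated as real findings.
            if not entry.get("skipped", False):
                return True
    return False


skipped_entry = {
    "name": "dmo_ae_faae_fraction",
    "skipped": True,
    "message": "Check evaluation skipped due to invalid check filter: 'AECATTT = 'FRACTION''",
}
# Hypothetical real finding, used only for illustration.
real_entry = {"name": "subject_not_null", "message": "Column 'SUBJECTNAME' value is null"}

rows = [
    {"SUBJECTNAME": "S1", "_errors": [skipped_entry], "_warnings": []},
    {"SUBJECTNAME": None, "_errors": [skipped_entry, real_entry], "_warnings": []},
]

# Only rows with real findings would count as invalid.
truly_invalid = [row for row in rows if has_real_issue(row)]
```

Here only the second row remains in `truly_invalid`, which matches the behaviour requested in the problem statement.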
Additional Context
No response