Fix input/output message not redacted when guardrails_trace="enabled_full" #1072

leotac · 2025-10-22T21:19:42Z

Description

With guardrails_trace="enabled_full" or guardrails_trace="disabled", the input/output messages are not redacted, even when guardrail_redact_input or guardrail_redact_output is True.

See #1075 for details

This PR fixes the case with guardrails_trace="enabled_full".
The method _find_detected_and_blocked_policy failed to correctly identify the detected and blocked policies.

The issue is that with guardrails_trace="enabled_full", the response from Bedrock contains both triggered and non-triggered filters:

"trace": {
    "guardrail": {
        "inputAssessment": {
            "jrv9qlue4hag": {
                "contentPolicy": {
                    "filters": [
                        {
                            "action": "NONE",
                            "confidence": "NONE",
                            "detected": False,
                            "filterStrength": "HIGH",
                            "type": "SEXUAL",
                        },
                        {
                            "action": "BLOCKED",
                            "confidence": "LOW",
                            "detected": True,
                            "filterStrength": "HIGH",
                            "type": "VIOLENCE",
                        },
                        ...

The previous implementation of _find_detected_and_blocked_policy was buggy: it did not scan all dicts in a list, but returned False immediately after finding the first non-triggered filter.
This PR fixes that by returning False only if none of the filters is actually triggered. The main fix is adding the any() call; the PR also simplifies the implementation a little.
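
The difference can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `find_detected_and_blocked_buggy` and `find_detected_and_blocked` are hypothetical, the buggy variant is a guess at the failure mode described above, and the trace shape is assumed to match the excerpt in this description.

```python
from typing import Any


def find_detected_and_blocked_buggy(node: Any) -> bool:
    """Hypothetical reconstruction of the bug: gives up after the first child."""
    if isinstance(node, dict):
        if node.get("action") == "BLOCKED" and node.get("detected") is True:
            return True
        for value in node.values():
            if isinstance(value, (dict, list)):
                # Bug: returns the result of the first nested value only.
                return find_detected_and_blocked_buggy(value)
    elif isinstance(node, list):
        for item in node:
            # Bug: only the first list item is ever inspected.
            return find_detected_and_blocked_buggy(item)
    return False


def find_detected_and_blocked(node: Any) -> bool:
    """Sketch of the fix: any() scans every child before answering False."""
    if isinstance(node, dict):
        if node.get("action") == "BLOCKED" and node.get("detected") is True:
            return True
        return any(find_detected_and_blocked(value) for value in node.values())
    if isinstance(node, list):
        return any(find_detected_and_blocked(item) for item in node)
    return False


# Trace shaped like the "enabled_full" excerpt: a non-triggered filter
# comes before the triggered one.
trace = {
    "guardrail": {
        "inputAssessment": {
            "jrv9qlue4hag": {
                "contentPolicy": {
                    "filters": [
                        {"action": "NONE", "detected": False, "type": "SEXUAL"},
                        {"action": "BLOCKED", "detected": True, "type": "VIOLENCE"},
                    ]
                }
            }
        }
    }
}

buggy_result = find_detected_and_blocked_buggy(trace)
fixed_result = find_detected_and_blocked(trace)
```

With this input the buggy variant stops at the SEXUAL filter and reports nothing blocked, while the any()-based variant reaches the VIOLENCE filter.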

Note that with guardrails_trace="disabled", no guardrail metadata is received, so the current implementation cannot know whether the input/output message should be redacted.
That case can't be fixed easily. Using guardrails_trace="disabled" should probably be disallowed in BedrockModel init, or at least the user should be warned against it.
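
One possible shape for such a warning is sketched below. It is not part of this PR: `check_guardrail_config` is a hypothetical helper, and the config is assumed to be a plain mapping that uses the parameter names as spelled in this description.

```python
import warnings
from typing import Any, Mapping


def check_guardrail_config(config: Mapping[str, Any]) -> bool:
    """Return True if redaction can work with this config; warn and return False otherwise.

    Hypothetical helper: the key names follow this PR description and are assumptions.
    """
    wants_redaction = bool(
        config.get("guardrail_redact_input") or config.get("guardrail_redact_output")
    )
    if wants_redaction and config.get("guardrails_trace") == "disabled":
        # Without trace metadata there is nothing to tell us a guardrail fired.
        warnings.warn(
            'guardrails_trace="disabled" returns no guardrail metadata, '
            "so input/output redaction cannot be applied.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

Raising an error instead of warning would be the stricter option mentioned above; which one fits is a design decision for the maintainers.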

Related Issues

#1075

Documentation PR

No doc change is needed for this PR as it stands. However, if the guardrails_trace parameter stops being exposed or "disabled" is no longer supported, the documentation would probably need to be updated.

Type of Change

Bug fix

Testing

Ran unit tests & integration tests.

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

fix: detect guardrails with trace="enabled_full"

0f85014

Fix and simplify _find_detected_and_blocked_policy so that it correctly works even in case the guardrails assessments contains both detected and non-detected filters (as with guardrail_trace="enabled_full")

leotac requested a deployment to manual-approval October 22, 2025 21:20 — with GitHub Actions Waiting

leotac requested a deployment to manual-approval October 22, 2025 21:24 — with GitHub Actions Waiting

test: add bedrock int tests with different guardrail_trace levels

94a6d89

leotac force-pushed the fix/find-blocked-guardrail-with-full-trace branch from 9937a7f to 94a6d89 Compare October 22, 2025 21:29

leotac requested a deployment to manual-approval October 22, 2025 21:29 — with GitHub Actions Waiting

test: add xfail with guardrail_trace=disabled

47e135d

leotac requested a deployment to manual-approval October 23, 2025 07:22 — with GitHub Actions Waiting

leotac marked this pull request as ready for review October 23, 2025 07:36
