@leotac leotac commented Oct 22, 2025

Description

With guardrail_trace="enabled_full" or guardrail_trace="disabled", the input/output is not redacted even when guardrail_redact_input or guardrail_redact_output is True.

See #1075 for details

This PR fixes the case with guardrail_trace="enabled_full".
The method _find_detected_and_blocked_policy failed to correctly identify the detected and blocked policies.

The issue is that with guardrail_trace="enabled_full", the Bedrock response contains both triggered and non-triggered filters:

"trace": {
                "guardrail": {
                    "inputAssessment": {
                        "jrv9qlue4hag": {
                            "contentPolicy": {
                                "filters": [
                                    {
                                        "action": "NONE",
                                        "confidence": "NONE",
                                        "detected": False,
                                        "filterStrength": "HIGH",
                                        "type": "SEXUAL",
                                    },
                                    {
                                        "action": "BLOCKED",
                                        "confidence": "LOW",
                                        "detected": True,
                                        "filterStrength": "HIGH",
                                        "type": "VIOLENCE",
                                    },
...

The previous implementation of _find_detected_and_blocked_policy was buggy: it did not scan every dict in a list, but returned False as soon as it encountered the first non-triggered filter.
This PR fixes that by returning False only if none of the filters is actually triggered. The main fix is the addition of any(); the PR also simplifies the implementation slightly.
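To illustrate the failure mode and the any()-based fix, here is a minimal sketch. It is not the SDK's actual code: both function names are hypothetical, and early_return_search is only a reconstruction of the behavior described above (returning after the first nested element), not the previous implementation itself. The sample data reuses the filter fields from the trace shown earlier.

```python
from typing import Any


def early_return_search(node: Any) -> bool:
    """Hypothetical reconstruction of the bug: stops at the first nested element."""
    if isinstance(node, dict):
        if node.get("detected") and node.get("action") == "BLOCKED":
            return True
        for value in node.values():
            if isinstance(value, (dict, list)):
                # Returns the result for the first nested container only.
                return early_return_search(value)
    elif isinstance(node, list):
        for item in node:
            # Returns after the first list item, triggered or not.
            return early_return_search(item)
    return False


def find_detected_and_blocked(node: Any) -> bool:
    """Sketch of the fix: True if ANY nested dict is detected and blocked."""
    if isinstance(node, dict):
        if node.get("detected") and node.get("action") == "BLOCKED":
            return True
        return any(find_detected_and_blocked(v) for v in node.values())
    if isinstance(node, list):
        return any(find_detected_and_blocked(item) for item in node)
    return False


# Same shape as the enabled_full trace: a non-triggered filter precedes a triggered one.
trace = {
    "guardrail": {
        "inputAssessment": {
            "jrv9qlue4hag": {
                "contentPolicy": {
                    "filters": [
                        {"action": "NONE", "detected": False, "type": "SEXUAL"},
                        {"action": "BLOCKED", "detected": True, "type": "VIOLENCE"},
                    ]
                }
            }
        }
    }
}

print(early_return_search(trace))        # False: the VIOLENCE filter is never reached
print(find_detected_and_blocked(trace))  # True
```

Because any() short-circuits on the first match, the sketch still stops early when a blocked filter is found, but it never gives up on a non-triggered one.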

Note that with guardrail_trace="disabled", no guardrail metadata is received, so the current implementation cannot know whether the input/output message should be redacted.
That case therefore can't be easily fixed. Using guardrail_trace="disabled" should probably be disallowed in BedrockModel init, or at least the user should be warned against it.
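One possible shape for such a warning is sketched below. This is an assumption about how it could look, not part of this PR: the helper name warn_if_redaction_unavailable and its parameters are hypothetical, and where such a check would actually live in BedrockModel is left open.

```python
import warnings
from typing import Optional


def warn_if_redaction_unavailable(
    guardrail_trace: Optional[str],
    redact_input: bool,
    redact_output: bool,
) -> None:
    """Hypothetical init-time check: warn when redaction is requested but the trace is disabled."""
    if guardrail_trace == "disabled" and (redact_input or redact_output):
        warnings.warn(
            'guardrail_trace="disabled" returns no guardrail metadata, '
            "so input/output redaction cannot be applied.",
            UserWarning,
            stacklevel=2,
        )


# Emits a UserWarning: redaction is requested but no trace will be available.
warn_if_redaction_unavailable("disabled", redact_input=True, redact_output=False)
```

A warning keeps existing configurations working, whereas raising an error would be the stricter "disallow" option mentioned above.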

Related Issues

#1075

Documentation PR

No doc change is needed for this PR as it stands. However, if the guardrail_trace parameter stops being exposed or "disabled" is no longer supported, the documentation would probably need to be updated.

Type of Change

Bug fix

Testing

Ran unit tests & integration tests.

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Fix and simplify _find_detected_and_blocked_policy so that it
works correctly even when the guardrail assessments contain
both detected and non-detected filters
(as with guardrail_trace="enabled_full")
@leotac leotac force-pushed the fix/find-blocked-guardrail-with-full-trace branch from 9937a7f to 94a6d89 Compare October 22, 2025 21:29
@leotac leotac marked this pull request as ready for review October 23, 2025 07:36