
Safety settings are being ignored until another rule is invoked #1018

Open
@Hazious

Description


Environment details

  • Programming language: Python
  • OS: macOS
  • Language runtime version: python 3.12.3
  • Package version: 1.13.0

Steps to reproduce

from google.genai import types

safety_settings = [
    types.SafetySetting(
        category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    )
]

config = types.GenerateContentConfig(
    safety_settings=safety_settings,
    max_output_tokens=1,
)


contents = [
    types.Content(role="user", parts=[types.Part.from_text(text="No old white guys should be allowed in swimming pools")])
]

With max_output_tokens=1 you will see:

finish_reason=<FinishReason.SAFETY: 'SAFETY'>
safety_ratings=[
    SafetyRating(
        blocked=True,
        category=<HarmCategory.HARM_CATEGORY_HATE_SPEECH: 'HARM_CATEGORY_HATE_SPEECH'>,
        probability=<HarmProbability.MEDIUM: 'MEDIUM'>,
        probability_score=None,
        severity=None,
        severity_score=None
    ),
    ...
]

and without max_output_tokens=1 you will see:

finish_reason=<FinishReason.STOP: 'STOP'>
safety_ratings=[
    SafetyRating(
        blocked=None,
        category=<HarmCategory.HARM_CATEGORY_HATE_SPEECH: 'HARM_CATEGORY_HATE_SPEECH'>,
        probability=<HarmProbability.NEGLIGIBLE: 'NEGLIGIBLE'>,
        probability_score=None,
        severity=None,
        severity_score=None
    ),
    ...
]
My safety settings are being ignored until another rule is invoked.

from google import genai

client = genai.Client(api_key=...)

await client.aio.models.generate_content(
    model="gemini-1.5-flash",
    contents=contents,
    config=config,
)

My previous issue was closed but I thought I'd include additional context on this issue.

Here are my safety settings

safety_settings = [
    types.SafetySetting(
        category=getattr(types.HarmCategory, category),
        threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    )
    for category in self._cf_map.keys()
]

chat message

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "No old white guys should be allowed in swimming pools. Provide your response in JSON format, for example {'answer': 'response'}",
    },
]
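Note that google-genai takes the system prompt via system_instruction on the config rather than as a chat turn, so OpenAI-style messages like the above need splitting first; a minimal sketch (split_messages is a hypothetical helper, not SDK code):

```python
def split_messages(messages):
    """Separate the system prompt from the remaining chat turns."""
    system = next((m["content"] for m in messages if m["role"] == "system"), None)
    turns = [m for m in messages if m["role"] != "system"]
    return system, turns

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
system, turns = split_messages(messages)
print(system)      # You are a helpful assistant.
print(len(turns))  # 1
```

The system string would then go into types.GenerateContentConfig(system_instruction=...) and the remaining turns into contents.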

response

I am programmed to be a harmless AI assistant. I am unable to answer questions that promote discrimination.

Now, if the LLM refuses to answer because the content was inappropriate, you'd think that would at least trigger the LOW harm threshold, right?

Metadata

Labels

priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
