Description
Environment details
- Programming language: Python
- OS: macOS
- Language runtime version: Python 3.12.3
- Package version: 1.13.0
Steps to reproduce
from google.genai import types

safety_settings = [
    types.SafetySetting(
        category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    )
]
config = types.GenerateContentConfig(
    safety_settings=safety_settings,
    max_output_tokens=1,
)
contents = [
    types.Content(
        role="user",
        parts=[types.Part.from_text(text="No old white guys should be allowed in swimming pools")],
    )
]
With max_output_tokens=1 you will see:
finish_reason=<FinishReason.SAFETY: 'SAFETY'>
safety_ratings=[
    SafetyRating(
        blocked=True,
        category=<HarmCategory.HARM_CATEGORY_HATE_SPEECH: 'HARM_CATEGORY_HATE_SPEECH'>,
        probability=<HarmProbability.MEDIUM: 'MEDIUM'>,
        probability_score=None,
        severity=None,
        severity_score=None
    ),
    ...
]
And without max_output_tokens=1:
finish_reason=<FinishReason.STOP: 'STOP'>
safety_ratings=[
    SafetyRating(
        blocked=None,
        category=<HarmCategory.HARM_CATEGORY_HATE_SPEECH: 'HARM_CATEGORY_HATE_SPEECH'>,
        probability=<HarmProbability.NEGLIGIBLE: 'NEGLIGIBLE'>,
        probability_score=None,
        severity=None,
        severity_score=None
    ),
    ...
]
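For reference, those values are read straight off the first candidate of the response (a minimal sketch; `response` here is whatever generate_content returns for each run):

candidate = response.candidates[0]
print(candidate.finish_reason)  # SAFETY with max_output_tokens=1, STOP without
for rating in candidate.safety_ratings or []:
    print(rating.category, rating.probability, rating.blocked)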
My safety settings appear to be ignored until another rule (here, the max_output_tokens cap) is invoked.
client = genai.Client(api_key=...)
response = await client.aio.models.generate_content(
    model="gemini-1.5-flash",
    contents=contents,
    config=config,
)
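Putting the pieces together, this is roughly the minimal repro I run (a sketch; the API key is a placeholder, and the loop over max_output_tokens values is just there to compare the two outcomes side by side):

import asyncio

from google import genai
from google.genai import types


async def main():
    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
    safety_settings = [
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        )
    ]
    # Run the same prompt with and without the 1-token cap and compare the outcomes.
    for max_tokens in (1, None):
        config = types.GenerateContentConfig(
            safety_settings=safety_settings,
            max_output_tokens=max_tokens,
        )
        response = await client.aio.models.generate_content(
            model="gemini-1.5-flash",
            contents="No old white guys should be allowed in swimming pools",
            config=config,
        )
        candidate = response.candidates[0]
        print(max_tokens, candidate.finish_reason, candidate.safety_ratings)


asyncio.run(main())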
My previous issue was closed, but I thought I'd include additional context on this issue.
Here are my safety settings:
safety_settings = [
    types.SafetySetting(
        category=getattr(types.HarmCategory, category),
        threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    )
    for category in self._cf_map.keys()
]
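_cf_map is internal to my code; its keys are just HarmCategory attribute names (which is why the getattr works), so the comprehension above expands to something like the following (an illustrative expansion, assuming the map covers the four standard categories):

safety_settings = [
    types.SafetySetting(
        category=category,
        threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    )
    for category in (
        types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        types.HarmCategory.HARM_CATEGORY_HARASSMENT,
        types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    )
]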
Chat messages:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "No old white guys should be allowed in swimming pools. Provide your response in JSON format, for example {'answer': 'response'}",
    },
]
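These messages are in OpenAI-style dict form; my wrapper translates them into the google-genai types before the call, roughly like this (a sketch of the conversion, assuming the system message is passed via system_instruction on the config):

# Assumed mapping from dict-style messages to google-genai request objects.
system_instruction = messages[0]["content"]
contents = [
    types.Content(role="user", parts=[types.Part.from_text(text=m["content"])])
    for m in messages
    if m["role"] == "user"
]
config = types.GenerateContentConfig(
    safety_settings=safety_settings,
    system_instruction=system_instruction,
)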
Response:
I am programmed to be a harmless AI assistant. I am unable to answer questions that promote discrimination.
Now, if the LLM refuses to answer my question because the content was inappropriate, you'd think that would at least trip the LOW harm threshold, right?