Invalid JSON error in clarify question endpoints due to Markdown formatting #479


Closed
Chapin2018 opened this issue Mar 28, 2025 · 6 comments
@Chapin2018

When using the default model, clarification requests frequently fail with the following error:

ERROR: Error getting clarifications: 1 validation error for ClarificationOutput
Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n  "clarifica... []\n    }\n  ]\n}\n```'

Root Cause:
The error occurs because the response contains a JSON string wrapped in a Markdown code block, which is not stripped before validation against ClarificationOutput.

Affected Endpoints:

  • POST /oracle/clarify_question
  • POST /query-data/clarify
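
One possible mitigation for the root cause above: strip the Markdown fence before the string reaches Pydantic validation. The helper below is purely illustrative (`strip_markdown_fences` is not a function in the defog codebase); it shows the general shape of such a pre-parse step using only the standard library.

```python
import json
import re

# Matches a ```json ... ``` (or bare ``` ... ```) fenced block spanning
# the whole response, capturing only the inner payload.
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n\s*```\s*$", re.DOTALL)

def strip_markdown_fences(raw: str) -> str:
    """Return the inner text if raw is a fenced code block, else raw unchanged."""
    match = FENCE_RE.match(raw)
    return match.group(1) if match else raw

# Input resembling the failing response from the error message above
raw = '```json\n{\n  "clarifications": []\n}\n```'
data = json.loads(strip_markdown_fences(raw))
print(data)  # {'clarifications': []}
```

The same cleaned string could be passed to `ClarificationOutput.model_validate_json` instead of `json.loads`.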
@rishsriv
Member

Hi there, thanks for reporting! Could you confirm that this is while using the default model (gpt-4o) with the base OpenAI API URL?

I can't seem to replicate this, and gpt-4o generally has excellent adherence to response formats. I suspect that there might be an issue with the underlying models/APIs being used.

@rishsriv
Member

We have also added exception handling in 2cc03cb, which should help with this. Would love to get more detail on this so we can understand the issue and implement a more robust solution!

@Chapin2018
Author

Chapin2018 commented Mar 29, 2025

Hi there, hope you're doing well!
I'm currently using an OpenAI-compatible API (which might be a bit less robust). I've also slightly adjusted the flow to call chat_openai_async directly (bypassing the chat_async wrapper for now while testing this).
The error still happens.

File "/backend/query_data_routes.py", line 627, in get_question_type_route
    res = await classify_question_type(question)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/backend/utils_clarification.py", line 136, in classify_question_type
    response = await chat_openai_async(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/backend/defog/llm/utils.py", line 1161, in chat_openai_async
    response = await client_openai.beta.chat.completions.parse(**request_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/resources/beta/chat/completions.py", line 437, in parse
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1564, in _request
    return await self._process_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1661, in _process_response
    return await api_response.parse()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/resources/beta/chat/completions.py", line 431, in parser
    return _parse_chat_completion(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 110, in parse_chat_completion
    "parsed": maybe_parse_content(
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 161, in maybe_parse_content
    return _parse_content(response_format, message.content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 221, in _parse_content
    return cast(ResponseFormatT, model_parse_json(response_format, content))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_compat.py", line 169, in model_parse_json
    return model.model_validate_json(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 656, in model_validate_json
    return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for QuestionType
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value="The user's question seem... for data directly.\n\n", input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
ERROR: Error getting clarifications: 1 validation error for ClarificationOutput
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n  "clarifica... []\n    }\n  ]\n}\n```', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
ERROR: Error getting clarifications: 1 validation error for ClarificationOutput
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n    "clarifi...       }\n    ]\n}\n```', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/json_invalid

My understanding is that by passing response_format=QuestionType (where QuestionType is a Pydantic model) to chat_openai_async, the code expects the LLM API to return a clean JSON string matching that model. Instead, the call fails: the API returned plain text rather than JSON.

Maybe the OpenAI-compatible API doesn't fully support forcing JSON output via the Pydantic response_format feature in the same way the official OpenAI API does? Or maybe the system prompt (CLASSIFY_QUESTION_SYSTEM_PROMPT) needs adjustment to force JSON output?
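
One hedged workaround along those lines, assuming the gateway at least honors OpenAI's older JSON mode even if it ignores Pydantic structured outputs: fall back to `response_format={"type": "json_object"}`, reinforce the format in the system prompt, and validate with Pydantic afterwards. The helper below is a sketch, not code from the defog repo; the parameter names match the public OpenAI chat API.

```python
# Sketch only: shows the request shape, not an actual network call.
# Assumption: the OpenAI-compatible gateway supports JSON mode
# ({"type": "json_object"}) even if it ignores structured outputs.

def build_json_mode_request(model: str, system_prompt: str, question: str) -> dict:
    """Build chat-completion params that ask for raw JSON instead of
    relying on beta.chat.completions.parse() structured outputs."""
    return {
        "model": model,
        "messages": [
            # Reinforce the format in the prompt too, since some gateways
            # follow instructions but silently drop response_format.
            {
                "role": "system",
                "content": system_prompt
                + "\nRespond with raw JSON only, no Markdown code fences.",
            },
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
    }

params = build_json_mode_request(
    "gpt-4o", "Classify the question.", "How many users signed up last week?"
)
```

These params could then be passed to a plain `client_openai.chat.completions.create(**params)` call, with `QuestionType.model_validate_json` applied to the returned message content.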

@Chapin2018
Author

Hi, this needs an upgrade to the latest openai-python:
openai/openai-python#1763

@Chapin2018
Author

Still having problems after upgrading openai-python. I will try https://github.com/instructor-ai/instructor next.

@Chapin2018
Author

instructor is working well; it makes the multi-agent flow more reliable.
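
For anyone landing here: much of the reliability a library like instructor adds comes from retrying with the validation error fed back to the model. A rough, illustrative sketch of that loop (not instructor's actual code), with a stubbed `complete` function standing in for the real API call:

```python
import json

def parse_with_retries(complete, prompt: str, max_retries: int = 2) -> dict:
    """Call `complete` (any prompt -> str function), validate the reply as
    JSON, and on failure retry with the error appended to the prompt.
    This loosely mirrors the retry-on-validation-error pattern."""
    last_error = None
    for _ in range(max_retries + 1):
        if last_error is None:
            reply = complete(prompt)
        else:
            reply = complete(
                f"{prompt}\nYour last reply was invalid: {last_error}. "
                "Return valid JSON only."
            )
        try:
            return json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = str(exc)
    raise ValueError(f"No valid JSON after {max_retries + 1} attempts: {last_error}")

# Stub that misbehaves once (fenced output), then complies.
replies = iter(['```json\n{"ok": true}\n```', '{"ok": true}'])
result = parse_with_retries(lambda p: next(replies), "Classify this question.")
print(result)  # {'ok': True}
```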
