Invalid JSON error in clarify question endpoints due to Markdown formatting #479


Closed
Chapin2018 opened this issue Mar 28, 2025 · 6 comments
@Chapin2018

When using the default model, clarification requests frequently fail with the following error:

ERROR: Error getting clarifications: 1 validation error for ClarificationOutput
Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n  "clarifica... []\n    }\n  ]\n}\n```'

Root Cause:
The error occurs because the response contains a JSON string wrapped in a Markdown code block, which is not stripped before validation against ClarificationOutput.

Affected Endpoints:

  • POST /oracle/clarify_question
  • POST /query-data/clarify
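
One possible mitigation for the root cause above: strip the Markdown fence before the string reaches Pydantic validation. The helper below is purely illustrative (`strip_markdown_fences` is not a function in the defog codebase); it shows the general shape of such a pre-parse step using only the standard library.

```python
import json
import re

# Matches a ```json ... ``` (or bare ``` ... ```) fenced block spanning
# the whole response, capturing only the inner payload.
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n\s*```\s*$", re.DOTALL)

def strip_markdown_fences(raw: str) -> str:
    """Return the inner text if raw is a fenced code block, else raw unchanged."""
    match = FENCE_RE.match(raw)
    return match.group(1) if match else raw

# Input resembling the failing response from the error message above
raw = '```json\n{\n  "clarifications": []\n}\n```'
data = json.loads(strip_markdown_fences(raw))
print(data)  # {'clarifications': []}
```

The same cleaned string could be passed to `ClarificationOutput.model_validate_json` instead of `json.loads`.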
@rishsriv
Member

Hi there, thanks for reporting! Could you confirm that this is while using the default model (gpt-4o) with the base OpenAI API URL?

I can't seem to replicate this, and gpt-4o generally has excellent adherence to response formats. I suspect that there might be an issue with the underlying models/APIs being used.

@rishsriv
Member

We have also added exception handling in 2cc03cb, which should help with this. Would love to get more detail on this so we can understand the issue and implement a more robust solution!

@Chapin2018
Author

Chapin2018 commented Mar 29, 2025

Hi there, hope you're doing well!
I'm currently using an OpenAI-compatible API (which might be a bit less robust). I've also slightly adjusted the flow to call chat_openai_async directly (bypassing the chat_async wrapper for now while testing this).
The error still happens.

File "/backend/query_data_routes.py", line 627, in get_question_type_route
    res = await classify_question_type(question)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/backend/utils_clarification.py", line 136, in classify_question_type
    response = await chat_openai_async(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/backend/defog/llm/utils.py", line 1161, in chat_openai_async
    response = await client_openai.beta.chat.completions.parse(**request_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/resources/beta/chat/completions.py", line 437, in parse
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1564, in _request
    return await self._process_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1661, in _process_response
    return await api_response.parse()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/resources/beta/chat/completions.py", line 431, in parser
    return _parse_chat_completion(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 110, in parse_chat_completion
    "parsed": maybe_parse_content(
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 161, in maybe_parse_content
    return _parse_content(response_format, message.content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 221, in _parse_content
    return cast(ResponseFormatT, model_parse_json(response_format, content))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_compat.py", line 169, in model_parse_json
    return model.model_validate_json(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 656, in model_validate_json
    return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for QuestionType
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value="The user's question seem... for data directly.\n\n", input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
ERROR: Error getting clarifications: 1 validation error for ClarificationOutput
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n  "clarifica... []\n    }\n  ]\n}\n```', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
ERROR: Error getting clarifications: 1 validation error for ClarificationOutput
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n    "clarifi...       }\n    ]\n}\n```', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/json_invalid

My understanding is that by passing response_format=QuestionType (where QuestionType is a Pydantic model) to chat_openai_async, the code expects the LLM API to return a clean JSON string matching that model. Instead, the call fails: the API returned plain text rather than JSON.

Maybe the OpenAI-compatible API doesn't fully support forcing JSON output via the Pydantic response_format feature in the same way the official OpenAI API does? Or maybe the system prompt (CLASSIFY_QUESTION_SYSTEM_PROMPT) needs adjustment to force JSON output?
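
One hedged workaround along those lines, assuming the gateway at least honors OpenAI's older JSON mode even if it ignores Pydantic structured outputs: fall back to `response_format={"type": "json_object"}`, reinforce the format in the system prompt, and validate with Pydantic afterwards. The helper below is a sketch, not code from the defog repo; the parameter names match the public OpenAI chat API.

```python
# Sketch only: shows the request shape, not an actual network call.
# Assumption: the OpenAI-compatible gateway supports JSON mode
# ({"type": "json_object"}) even if it ignores structured outputs.

def build_json_mode_request(model: str, system_prompt: str, question: str) -> dict:
    """Build chat-completion params that ask for raw JSON instead of
    relying on beta.chat.completions.parse() structured outputs."""
    return {
        "model": model,
        "messages": [
            # Reinforce the format in the prompt too, since some gateways
            # follow instructions but silently drop response_format.
            {
                "role": "system",
                "content": system_prompt
                + "\nRespond with raw JSON only, no Markdown code fences.",
            },
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
    }

params = build_json_mode_request(
    "gpt-4o", "Classify the question.", "How many users signed up last week?"
)
```

These params could then be passed to a plain `client_openai.chat.completions.create(**params)` call, with `QuestionType.model_validate_json` applied to the returned message content.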

@Chapin2018
Author

Hi, this needs an upgrade to the latest openai-python:
openai/openai-python#1763

@Chapin2018
Author

Still having problems after upgrading openai-python. I will try https://github.com/instructor-ai/instructor next.

@Chapin2018
Author

instructor is working well; it makes the multi-agent flow more reliable.
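
For anyone landing here: much of the reliability a library like instructor adds comes from retrying with the validation error fed back to the model. A rough, illustrative sketch of that loop (not instructor's actual code), with a stubbed `complete` function standing in for the real API call:

```python
import json

def parse_with_retries(complete, prompt: str, max_retries: int = 2) -> dict:
    """Call `complete` (any prompt -> str function), validate the reply as
    JSON, and on failure retry with the error appended to the prompt.
    This loosely mirrors the retry-on-validation-error pattern."""
    last_error = None
    for _ in range(max_retries + 1):
        if last_error is None:
            reply = complete(prompt)
        else:
            reply = complete(
                f"{prompt}\nYour last reply was invalid: {last_error}. "
                "Return valid JSON only."
            )
        try:
            return json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = str(exc)
    raise ValueError(f"No valid JSON after {max_retries + 1} attempts: {last_error}")

# Stub that misbehaves once (fenced output), then complies.
replies = iter(['```json\n{"ok": true}\n```', '{"ok": true}'])
result = parse_with_retries(lambda p: next(replies), "Classify this question.")
print(result)  # {'ok': True}
```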
