Extra details returned by the model are not tracked by Usage class #1579


Closed

ThachNgocTran opened this issue Apr 24, 2025 · 5 comments

@ThachNgocTran

Description

After an LLM finishes a call (request), it also returns some statistics, for example the number of prompt tokens and completion tokens. These are tracked by the Usage class (pydantic-ai/pydantic_ai_slim/pydantic_ai/usage.py (link)).

According to the class's documentation (link), as of 04.2025, the details attribute should contain "any extra details returned by the model." But this is not the case with the latest version of Pydantic AI (version 0.1.3).

I use llama-server (part of llama.cpp) as the backend to host an LLM (in the form of a GGUF file). Using tcpflow (link) to capture the communication between server and client, I can see the last message sent from the server as follows:

{
    "choices":[
        {
            "finish_reason":"stop",
            "index":0,
            "delta":{
                
            }
        }
    ],
    "created":1745457407,
    "id":"chatcmpl-G7Hmg3VGIPYO6hFk6nw7b4VtC4vqyliz",
    "model":"Qwen2.5-7B-Instruct-1M-q4_k_m-Finetuned",
    "system_fingerprint":"b5127-e959d32b",
    "object":"chat.completion.chunk",
    "usage":{
        "completion_tokens":32,
        "prompt_tokens":52,
        "total_tokens":84
    },
    "timings":{
        "prompt_n":17,
        "prompt_ms":2470.602,
        "prompt_per_token_ms":145.3295294117647,
        "prompt_per_second":6.880914044431277,
        "predicted_n":32,
        "predicted_ms":6924.775,
        "predicted_per_token_ms":216.39921875,
        "predicted_per_second":4.621088771837353
    }
}

The completion_tokens and prompt_tokens are correctly captured by the Usage class (as response_tokens and request_tokens respectively). But everything about the processing time, e.g. prompt_ms and prompt_per_token_ms, is missing from the details field of the Usage class. Unless I am mistaken, that field should contain any extra details returned by the model.

Expectation

The details field of the Usage class should contain any extra details returned by the model, e.g. the timings block or prompt_per_token_ms.
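
For illustration, this is roughly what I expect to be able to do after a run (the token counts and timing values come from the captured response above; today the timing fields do not appear in details):

usage = result.usage()
print(usage.request_tokens, usage.response_tokens)  # 52, 32 -- captured correctly today
# Expected (but currently missing): the extra fields surfaced in details
print(usage.details)  # e.g. {"prompt_ms": 2470.602, "predicted_per_second": 4.62, ...}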

Example Code

import os

from httpx import AsyncClient
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
...
agent = Agent(
    OpenAIModel(
        "model_name",
        provider=OpenAIProvider(
            api_key=os.environ["LLM_API_KEY"],
            base_url=f"{LLM_URL}:8081/v1",
            http_client=AsyncClient(headers={"Connection": "close"}),
        ),
    ),
    retries=3,
    deps_type=str,
)
...
async with agent.run_stream(
    latest_user_message,
    message_history=message_history,
    deps=system_prompt,
) as result:
    async for chunk in result.stream_text(delta=True):
        writer(chunk)

print(result.usage())

Python, Pydantic AI & LLM client version

+ Windows 11, WSL2, Ubuntu 24.04
+ Pydantic AI v0.1.3
+ Python v3.12.7
+ llama-cli 5117
+ Langchain Core 0.3.49
@DouweM
Contributor

DouweM commented Apr 25, 2025

@ThachNgocTran I agree it would be useful to store these timings and let you access them, but this is not what Usage is meant for: it's specifically for measuring token consumption, and usage across multiple LLM calls is summed, which wouldn't be appropriate for timings. The details field does indeed exist to store "any extra details returned by the model", but this doesn't happen automatically: the model class still has to explicitly build it from the fields on the API response, and which fields it uses varies from one model to another.

Since the official OpenAI API doesn't have the timings field and I don't think the other big 2 (Anthropic, Gemini) do either, we haven't considered officially supporting storing timings yet.

Would it be an option to subclass OpenAIModel to pull these fields from the response manually?
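
A rough sketch of that idea, for illustration only: the override point (_process_response) and the model_extra access are assumptions about pydantic-ai's and the OpenAI SDK's internals, so check them against the source of the installed versions before relying on this.

from pydantic_ai.models.openai import OpenAIModel


class TimingAwareOpenAIModel(OpenAIModel):
    # Sketch only: the hook name below is an assumption; look for the method in your
    # installed OpenAIModel that turns the OpenAI SDK response into a ModelResponse/Usage.
    def _process_response(self, response):
        # llama-server attaches a non-standard `timings` object to each completion.
        # The OpenAI SDK keeps unknown fields, so they are usually reachable via model_extra.
        extra = getattr(response, 'model_extra', None) or {}
        self.last_timings = extra.get('timings')  # stash it somewhere you control
        # Note: streamed runs go through a separate streaming hook, which is also version-dependent.
        return super()._process_response(response)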

@DouweM DouweM self-assigned this Apr 25, 2025
@ThachNgocTran
Author

@DouweM Thank you for your response. 🙏

Because different models return different sets of "extra details" (there is no unified standard), it'd be best to extend the OpenAIModel class when needed. Here is the OpenAIModel class. I can try to find a place to populate the timings into the Usage class.

I'm not super familiar with the source code. Where do you think is the best place to override a function that extracts the extra details of the LLM response and fills in the details field of Usage? Then I can take it from there. 🤗

@DouweM
Contributor

DouweM commented Apr 28, 2025

@ThachNgocTran Unfortunately Usage doesn't look like a good place for this, because it expects all values to be ints and automatically sums them. That wouldn't be appropriate for e.g. the prompt_per_token_ms timing, which is the result of calculating prompt_ms/prompt_n. If we sum these values across requests, we wouldn't get the actual average speed, but the nonsensical sum of all speeds.
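
A quick numeric illustration (the first request's values come from the capture above; the second request is invented for the example):

# Request 1: 2470.602 ms for 17 prompt tokens -> ~145.3 ms per token
# Request 2: 1000.0 ms for 20 prompt tokens   ->   50.0 ms per token (hypothetical)
summed_speeds = 145.3 + 50.0                      # 195.3 "ms per token" -- what summing would give; meaningless
actual_average = (2470.602 + 1000.0) / (17 + 20)  # ~93.8 ms per token -- the real aggregate speed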

In #1549, we're storing additional (int) details from the Anthropic API, but it wouldn't make sense to do that on the OpenAIModel because the official OpenAI API doesn't return timings.

I'm afraid that until one of the official APIs returns timings, we can't support this natively.

Note that if you use our observability platform https://pydantic.dev/logfire with logfire.instrument_httpx(capture_all=True) in your code, the full HTTP response, including these timings, will end up in Logfire and can be queried using SQL. Would that be sufficient?
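
For reference, that setup looks roughly like this (it assumes a Logfire project is already configured; capture_all=True is the option mentioned above):

import logfire

logfire.configure()  # assumes Logfire credentials / a project are already set up
logfire.instrument_httpx(capture_all=True)  # record full request/response bodies, including llama-server's `timings`

# ... build the Agent and call run_stream() as usual; the raw responses can then be queried in Logfire with SQL.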

@ThachNgocTran
Author

@DouweM To sum up, the best strategy for my use-case:

This way, I don't need to modify any code on the pydantic-ai side.

What do you think? 😊

@DouweM
Contributor

DouweM commented Apr 30, 2025

@ThachNgocTran That's a good route to take for now. In #1238, we're looking at adding a vendor_metadata dict that can contain additional values -- once that lands, we'd welcome a PR to add these properties for OpenAI!

@DouweM DouweM closed this as not planned Apr 30, 2025