Extra details returned by the model are not tracked by Usage class #1579


Closed

ThachNgocTran opened this issue Apr 24, 2025 · 5 comments

@ThachNgocTran

Description

After an LLM finishes a call (request), it also returns some statistics, for example the number of prompt tokens and completion tokens. These are tracked by the Usage class (pydantic-ai/pydantic_ai_slim/pydantic_ai/usage.py (link)).

According to the class's documentation (link), as of 04.2025, the details attribute should contain "any extra details returned by the model." But this is not the case with the latest version of Pydantic AI (version 0.1.3).

I use llama-server (part of llama.cpp) as the backend to host an LLM (in the form of a GGUF file). Using tcpflow (link) to capture the communication between server and client, I can see the last message sent from the server as follows:

{
    "choices":[
        {
            "finish_reason":"stop",
            "index":0,
            "delta":{
                
            }
        }
    ],
    "created":1745457407,
    "id":"chatcmpl-G7Hmg3VGIPYO6hFk6nw7b4VtC4vqyliz",
    "model":"Qwen2.5-7B-Instruct-1M-q4_k_m-Finetuned",
    "system_fingerprint":"b5127-e959d32b",
    "object":"chat.completion.chunk",
    "usage":{
        "completion_tokens":32,
        "prompt_tokens":52,
        "total_tokens":84
    },
    "timings":{
        "prompt_n":17,
        "prompt_ms":2470.602,
        "prompt_per_token_ms":145.3295294117647,
        "prompt_per_second":6.880914044431277,
        "predicted_n":32,
        "predicted_ms":6924.775,
        "predicted_per_token_ms":216.39921875,
        "predicted_per_second":4.621088771837353
    }
}

The completion_tokens and prompt_tokens are correctly captured by the Usage class (as response_tokens and request_tokens respectively). But everything about the processing time, e.g. prompt_ms and prompt_per_token_ms, is missing from the details field of the Usage class. Unless I am mistaken, that field should contain any extra details returned by the model.

Expectation

The details field of the Usage class should contain any extra details returned by the model, e.g. the timings block or prompt_per_token_ms.
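
For illustration, this is roughly what I expect to be able to do after a run (the token counts and timing values come from the captured response above; today the timing fields do not appear in details):

usage = result.usage()
print(usage.request_tokens, usage.response_tokens)  # 52, 32 -- captured correctly today
# Expected (but currently missing): the extra fields surfaced in details
print(usage.details)  # e.g. {"prompt_ms": 2470.602, "predicted_per_second": 4.62, ...}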

Example Code

import os

from httpx import AsyncClient
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
...
agent = Agent(
    OpenAIModel(
        "model_name",
        provider=OpenAIProvider(
            api_key=os.environ["LLM_API_KEY"],
            base_url=f"{LLM_URL}:8081/v1",
            http_client=AsyncClient(headers={"Connection": "close"}),
        ),
    ),
    retries=3,
    deps_type=str,
)
...
async with agent.run_stream(
    latest_user_message,
    message_history=message_history,
    deps=system_prompt,
) as result:
    async for chunk in result.stream_text(delta=True):
        writer(chunk)

print(result.usage())

Python, Pydantic AI & LLM client version

+ Windows 11, WSL2, Ubuntu 24.04
+ Pydantic AI v0.1.3
+ Python v3.12.7
+ llama-cli 5117
+ Langchain Core 0.3.49
@DouweM
Contributor

DouweM commented Apr 25, 2025

@ThachNgocTran I agree it would be useful to store these timings and let you access them, but this is not what Usage is meant for: it's specifically for measuring token consumption, and usage across multiple LLM calls is summed, which wouldn't be appropriate for timings. The details field does indeed exist to store "any extra details returned by the model", but this doesn't happen automatically: the model class still has to explicitly build it from the fields on the API response, and which fields it uses varies from one model to another.

Since the official OpenAI API doesn't have the timings field and I don't think the other big 2 (Anthropic, Gemini) do either, we haven't considered officially supporting storing timings yet.

Would it be an option to subclass OpenAIModel to pull these fields from the response manually?
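
A rough sketch of that idea, for illustration only: the override point (_process_response) and the model_extra access are assumptions about pydantic-ai's and the OpenAI SDK's internals, so check them against the source of the installed versions before relying on this.

from pydantic_ai.models.openai import OpenAIModel


class TimingAwareOpenAIModel(OpenAIModel):
    # Sketch only: the hook name below is an assumption; look for the method in your
    # installed OpenAIModel that turns the OpenAI SDK response into a ModelResponse/Usage.
    def _process_response(self, response):
        # llama-server attaches a non-standard `timings` object to each completion.
        # The OpenAI SDK keeps unknown fields, so they are usually reachable via model_extra.
        extra = getattr(response, 'model_extra', None) or {}
        self.last_timings = extra.get('timings')  # stash it somewhere you control
        # Note: streamed runs go through a separate streaming hook, which is also version-dependent.
        return super()._process_response(response)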

@DouweM DouweM self-assigned this Apr 25, 2025
@ThachNgocTran
Author

@DouweM Thank you for your response. 🙏

Because different models return different sets of "extra details" (there is no unified standard), it'd be best to extend the OpenAIModel class when needed. Here is the OpenAIModel class. I can try to find a place to populate the timings into the Usage class.

I'm not super familiar with the source code. Where do you think is the best place to override a function that extracts the extra details of the LLM response and fills in the details field of Usage? Then I can take it from there. 🤗

@DouweM
Contributor

DouweM commented Apr 28, 2025

@ThachNgocTran Unfortunately Usage doesn't look like a good place for this, because it expects all values to be ints and automatically sums them. That wouldn't be appropriate for e.g. the prompt_per_token_ms timing, which is the result of calculating prompt_ms/prompt_n. If we sum these values across requests, we wouldn't get the actual average speed, but the nonsensical sum of all speeds.
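
A quick numeric illustration (the first request's values come from the capture above; the second request is invented for the example):

# Request 1: 2470.602 ms for 17 prompt tokens -> ~145.3 ms per token
# Request 2: 1000.0 ms for 20 prompt tokens   ->   50.0 ms per token (hypothetical)
summed_speeds = 145.3 + 50.0                      # 195.3 "ms per token" -- what summing would give; meaningless
actual_average = (2470.602 + 1000.0) / (17 + 20)  # ~93.8 ms per token -- the real aggregate speed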

In #1549, we're storing additional (int) details from the Anthropic API, but it wouldn't make sense to do that on the OpenAIModel because the official OpenAI API doesn't return timings.

I'm afraid that until one of the official APIs returns timings, we can't support this natively.

Note that if you use our observability platform https://pydantic.dev/logfire with logfire.instrument_httpx(capture_all=True) in your code, the full HTTP response, including these timings, will end up in Logfire and can be queried using SQL. Would that be sufficient?
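
For reference, that setup looks roughly like this (it assumes a Logfire project is already configured; capture_all=True is the option mentioned above):

import logfire

logfire.configure()  # assumes Logfire credentials / a project are already set up
logfire.instrument_httpx(capture_all=True)  # record full request/response bodies, including llama-server's `timings`

# ... build the Agent and call run_stream() as usual; the raw responses can then be queried in Logfire with SQL.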

@ThachNgocTran
Author

@DouweM To sum up, the best strategy for my use-case:

This way, I don't need to modify any code on the pydantic-ai side.

What do you think? 😊

@DouweM
Contributor

DouweM commented Apr 30, 2025

@ThachNgocTran That's a good route to take for now. In #1238, we're looking at adding a vendor_metadata dict that can contain additional values -- once that lands, we'd welcome a PR to add these properties for OpenAI!

@DouweM DouweM closed this as not planned Apr 30, 2025