In my case, an agent can handle a request without calling a tool, though the result is usually better when it does use one. I also can't force the agent to use a tool, because depending on the request it may not need any. So I would like to evaluate the workflow (which agents are involved in the multi-agent system, and which tools are called, in what order) to understand how well the system is designed.
Currently, Pydantic Evals doesn't seem to track this information. Is there any way to do this? And is this approach recommended?
Additional Context
No response
Hi @Kludex, I think this is a bit complicated: there might be multiple flows that can achieve an expected result.
So I think I would need the trace/message history in the EvaluatorContext; I could then use that data in a custom evaluator.
With the current implementation, I'll probably include the message history in the agent's output alongside its actual output.
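To illustrate the idea, here is a minimal sketch of that workaround: the agent's output is assumed to carry its message history as a list of plain dicts (a simplified stand-in for the real Pydantic AI message types), and a small evaluator checks whether the observed tool-call order matches an expected sequence. The part shape (`"kind"`, `"tool_name"`) and the names `extract_tool_sequence` and `ToolOrderEvaluator` are hypothetical, not part of the Pydantic Evals API.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical message shape: each part is a dict, and tool invocations
# carry {"kind": "tool-call", "tool_name": ...}. Adapt this to the real
# message types your agent framework emits.

def extract_tool_sequence(history: list[dict[str, Any]]) -> list[str]:
    """Return tool names in the order they were called."""
    return [p["tool_name"] for p in history if p.get("kind") == "tool-call"]

@dataclass
class ToolOrderEvaluator:
    """Sketch of a custom evaluator: scores 1.0 when the observed
    tool-call order matches the expected one. An empty expected list
    means tool use is optional, so any history passes."""
    expected: list[str]

    def evaluate(self, history: list[dict[str, Any]]) -> float:
        if not self.expected:
            return 1.0
        return 1.0 if extract_tool_sequence(history) == self.expected else 0.0

# Usage: feed the history captured alongside the agent's actual output.
history = [
    {"kind": "user-prompt"},
    {"kind": "tool-call", "tool_name": "search"},
    {"kind": "tool-call", "tool_name": "summarize"},
    {"kind": "text"},
]
print(extract_tool_sequence(history))                          # ['search', 'summarize']
print(ToolOrderEvaluator(["search", "summarize"]).evaluate(history))  # 1.0
```

Since multiple flows can achieve the same result, a real evaluator would likely accept a set of valid sequences, or score partial matches, rather than a single exact order.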