Project 26 696DS
Popular repositories
- AgentBench (public, forked from THUDM/AgentBench)
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Python
- da-code (public, forked from yiyihum/da-code)
[EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
Python
- agent-as-a-judge (public, forked from metauto-ai/agent-as-a-judge)
⚖️ The First Coding Agent-as-a-Judge
Python