@Project-26-696DS

Project 26 696DS

For Spring 2025

Popular repositories

  1. AgentBench (Public)

    Forked from THUDM/AgentBench

    A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

    Python

  2. da-code (Public)

    Forked from yiyihum/da-code

    [EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

    Python

  3. agent-as-a-judge (Public)

    Forked from metauto-ai/agent-as-a-judge

    ⚖️ The First Coding Agent-as-a-Judge

    Python


People

This organization has no public members.
