QiangYu qiang-yu

AgentBench Public
Forked from THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python Apache License 2.0 Updated Mar 7, 2025
agentdojo Public
Forked from ethz-spylab/agentdojo

A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

Python MIT License Updated Oct 13, 2025
AgentDojoOld Public

A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

Python MIT License Updated Oct 3, 2025
Binder_Python Public

Jupyter Notebook MIT License Updated Feb 19, 2023
InjecAgent Public
Forked from uiuc-kang-lab/InjecAgent

Python MIT License Updated Feb 17, 2025
R-Judge Public
Forked from Lordog/R-Judge

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)

Python Updated Feb 26, 2025
sragent Public
Forked from openai/openai-agents-python

A lightweight, powerful framework for multi-agent workflows

Python MIT License Updated Sep 8, 2025
ToolEmu Public
Forked from ryoungj/ToolEmu

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

Python Apache License 2.0 Updated Mar 3, 2025