AgentBench (Public)
Forked from THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Python · Apache License 2.0 · Updated Mar 7, 2025

agentdojo (Public)
Forked from ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Python · MIT License · Updated Oct 13, 2025

AgentDojoOld (Public)
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Python · MIT License · Updated Oct 3, 2025

R-Judge (Public)
Forked from Lordog/R-Judge
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)
Python · Updated Feb 26, 2025

sragent (Public)
Forked from openai/openai-agents-python
A lightweight, powerful framework for multi-agent workflows
Python · MIT License · Updated Sep 8, 2025

ToolEmu (Public)
Forked from ryoungj/ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Python · Apache License 2.0 · Updated Mar 3, 2025