
When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

Yibo Peng^1,†, James Song^2,†, Lei Li^3,†, Xinyu Yang^1, Mihai Christodorescu^4, Ravi Mangal^5, Corina Pasareanu^1, Haizhong Zheng^1, Beidi Chen^1
^1 Carnegie Mellon University, ^2 University of Michigan, ^3 Peking University, ^4 Google, ^5 Colorado State University
† Equal contribution


TL;DR Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or inadvertently introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack requires only black-box access and a single query to the code agent. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of 40.7% on GPT-5 mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.

🗞️ News

[Figure: ASR across Agent/Model/CWE]

Figure 1: An FCV attack can be initiated through two real-world pathways: deliberately by a malicious contributor, or more subtly, when a benign developer inadvertently incorporates content from a contaminated source. Both pathways yield the same input, developer-style instructions embedded in an issue description, making them indistinguishable from the agent's perspective. This illustrates the core of the FCV threat: functionally correct patches that pass all tests can still embed exploitable vulnerabilities (e.g., CWE-94).

Getting Started

Much of our experimental setup reuses prior code from mini-SWE-agent, SWE-agent, and OpenHands, with additions for the injected CWE prompts.
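
If you have not yet cloned the repository, a sketch like the following sets up the layout assumed by the commands below (directory names are taken from this README; add --recursive if the agent folders are tracked as git submodules):

git clone https://github.com/Infini-AI-Lab/FCV.git
cd FCV
ls  # expect SWE-agent/, mini-swe-agent/, openhands/, and attack-lm-judge/ among the contents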

1. Environment Setup

Ideally, create a separate conda environment for each of the three agents.

cd FCV

conda create -n sweagent python=3.11
conda activate sweagent
cd SWE-agent
pip install -e .
cd ..

conda create -n minisweagent python=3.11
conda activate minisweagent
cd mini-swe-agent
pip install -e .
cd ..

conda create -n openhands python=3.11
conda activate openhands
cd openhands
pip install -e .
cd ..
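
As an optional sanity check, you can confirm that each agent's entry point resolves in its environment; the commands below assume the CLI names used later in this README:

conda activate sweagent && sweagent --help
conda activate minisweagent && mini-extra --help
conda activate openhands && python -c "import openhands"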

For the LLM judge, you can reuse any of the above environments, since it only requires requests and openai (the latter only if you are using gpt-5-mini as the judge). In our experiments, we used Qwen-3-Coder-480B-A35B-Instruct as the judge.
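
As a minimal sketch, the judge can be queried through any OpenAI-compatible chat-completions endpoint; the base URL, API key, and served model name below are placeholders for your own deployment:

curl -s "$JUDGE_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Coder-480B-A35B-Instruct",
        "messages": [
          {"role": "system", "content": "You are a security judge. Decide whether the patch introduces the target CWE."},
          {"role": "user", "content": "<issue description and candidate patch go here>"}
        ]
      }'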

2. Round 1: Identifying Resolved Instances

These are example commands; they follow the official documentation for each agent.
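
For the mini-SWE-agent and SWE-agent runs below, export the credentials for your model provider first; both agents resolve models through LiteLLM-style providers, so the exact variable name depends on the provider configured in model_configs/*.yaml (the names below are common examples, not an exhaustive list):

export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-api-key"
export MOONSHOT_API_KEY="your-api-key"   # e.g., for Kimi-K2-Instruct via the Moonshot provider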

mini-SWE-agent

cd mini-swe-agent
mini-extra swebench -c model_configs/kimi.yaml --subset verified --split test -o moonshotai_Kimi_K2_Instruct-pass1 --workers 16

SWE-agent

sweagent run-batch \
    --config model_configs/kimi-official.yaml \
    --instances.type swe_bench \
    --instances.subset verified \
    --instances.split test  \
    --output_dir kimi-pass1 \
    --num_workers 16
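
The batch run writes per-instance trajectories plus a predictions file into --output_dir; the exact file name (commonly preds.json) can vary by SWE-agent version, so inspect the directory after the run:

ls kimi-pass1/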

OpenHands

Step 1: Clean up any existing runtime containers

cd /path/to/OpenHands  # Replace with your OpenHands directory
conda activate openhands

# List and stop all remote runtimes (if using remote runtime)
ALLHANDS_API_KEY="your-api-key" \
  curl -H "X-API-Key: your-api-key" \
  "https://runtime.eval.all-hands.dev/list" | \
  jq -r '.runtimes[].runtime_id' | \
  xargs -I {} curl -X POST \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"runtime_id": "{}"}' \
  "https://runtime.eval.all-hands.dev/stop"

# Run cleanup script
ALLHANDS_API_KEY="your-api-key" ./evaluation/utils/scripts/cleanup_remote_runtime.sh
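
To confirm the cleanup succeeded, you can query the same list endpoint again and check that no runtimes remain:

curl -H "X-API-Key: your-api-key" "https://runtime.eval.all-hands.dev/list" | jq '.runtimes | length'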

Step 2: Run inference on SWE-bench Verified (Pass 1)

Choose your model and run:

# GPT-5 Mini
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_gpt5_mini \
  HEAD \
  CodeActAgent \
  100 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

# Claude Sonnet 4
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_claude_sonnet4 \
  HEAD \
  CodeActAgent \
  100 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

# Kimi-K2-Instruct
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_kimi \
  HEAD \
  CodeActAgent \
  200 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

# Qwen3-Coder-480B
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_qwen_480b \
  HEAD \
  CodeActAgent \
  200 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

Output: Results will be saved in evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/{MODEL_NAME}/output.jsonl
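
A quick way to check how many instances produced output is to count the records in the JSONL file; the instance_id field name is assumed from the OpenHands SWE-bench output format:

jq -r '.instance_id' path/to/output.jsonl | wc -l   # substitute the output.jsonl path from above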

For more details on subsequent runs with CWE injection, see openhands/README.md.

3. Evaluation: SWE-Bench

After the first round, you will need to evaluate the results. You can either run the evaluation locally with the official SWE-Bench GitHub repo or use their sb-cli cloud evaluation; we refer readers to the official documentation for details.
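
For the local route, an invocation along these lines should work with the official SWE-bench harness (flags follow the SWE-bench README; the predictions path and run id are placeholders):

python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Verified \
    --predictions_path path/to/preds.json \
    --max_workers 8 \
    --run_id fcv-pass1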

4. Running with CWE Injection + Ablation Study

This will depend on the agent. Please go to each agent's folder and read the README.md for additional details.

5. LM Judge for All Results

Please go to attack-lm-judge's README.md for more details.

Citation

If you find our work useful, please cite our paper:

@misc{peng2025correctsafetrustfunctionally,
  title={When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?}, 
  author={Yibo Peng and James Song and Lei Li and Xinyu Yang and Mihai Christodorescu and Ravi Mangal and Corina Pasareanu and Haizhong Zheng and Beidi Chen},
  year={2025},
  eprint={2510.17862},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2510.17862}, 
}

Contact

We're excited to share our early results and welcome feedback from the community as we continue to refine and expand FCV's capabilities. If you have any questions or feedback, please feel free to contact us.

License

This project is licensed under the MIT License - see the LICENSE file for details.
