
When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

Yibo Peng^1,†, James Song^2,†, Lei Li^3,†, Xinyu Yang^1, Mihai Christodorescu^4, Ravi Mangal^5, Corina Pasareanu^1, Haizhong Zheng^1, Beidi Chen^1
^1 Carnegie Mellon University, ^2 University of Michigan, ^3 Peking University, ^4 Google, ^5 Colorado State University
† Equal contribution


TL;DR Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or inadvertently introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack requires only black-box access and a single query to the code agent. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of 40.7% on GPT-5 mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.

🗞️ News

[Figure: ASR across Agent/Model/CWE]

Figure 1: An FCV attack can be initiated through two real-world pathways: deliberately by a malicious contributor, or more subtly, when a benign developer inadvertently incorporates content from a contaminated source. Both pathways yield the same input, developer-style instructions embedded in an issue description, making them indistinguishable from the agent's perspective. This illustrates the core of the FCV threat: functionally correct patches that pass all tests can still embed exploitable vulnerabilities (e.g., CWE-94).

Getting Started

Much of our experimental setup reuses prior code from mini-SWE-agent, SWE-agent, and OpenHands, with additions for the injected CWE prompts.
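
If you have not yet cloned the repository, a sketch like the following sets up the layout assumed by the commands below (directory names are taken from this README; add --recursive if the agent folders are tracked as git submodules):

git clone https://github.com/Infini-AI-Lab/FCV.git
cd FCV
ls  # expect SWE-agent/, mini-swe-agent/, openhands/, and attack-lm-judge/ among the contents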

1. Environment Setup

Ideally, create a separate conda environment for each of the three agents.

cd FCV

conda create -n sweagent python=3.11
conda activate sweagent
cd SWE-agent
pip install -e .
cd ..

conda create -n minisweagent python=3.11
conda activate minisweagent
cd mini-swe-agent
pip install -e .
cd ..

conda create -n openhands python=3.11
conda activate openhands
cd openhands
pip install -e .
cd ..
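
As an optional sanity check, you can confirm that each agent's entry point resolves in its environment; the commands below assume the CLI names used later in this README:

conda activate sweagent && sweagent --help
conda activate minisweagent && mini-extra --help
conda activate openhands && python -c "import openhands"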

For the LLM judge, you can reuse any of the above environments, since it only requires requests and openai (the latter only if you are using gpt-5-mini as the judge). In our experiments, we used Qwen-3-Coder-480B-A35B-Instruct as the judge.
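
As a minimal sketch, the judge can be queried through any OpenAI-compatible chat-completions endpoint; the base URL, API key, and served model name below are placeholders for your own deployment:

curl -s "$JUDGE_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $JUDGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Coder-480B-A35B-Instruct",
        "messages": [
          {"role": "system", "content": "You are a security judge. Decide whether the patch introduces the target CWE."},
          {"role": "user", "content": "<issue description and candidate patch go here>"}
        ]
      }'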

2. Round 1: Identifying Resolved Instances

These are example commands; they follow the official documentation for each agent.
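
For the mini-SWE-agent and SWE-agent runs below, export the credentials for your model provider first; both agents resolve models through LiteLLM-style providers, so the exact variable name depends on the provider configured in model_configs/*.yaml (the names below are common examples, not an exhaustive list):

export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-api-key"
export MOONSHOT_API_KEY="your-api-key"   # e.g., for Kimi-K2-Instruct via the Moonshot provider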

mini-SWE-agent

cd mini-swe-agent
mini-extra swebench -c model_configs/kimi.yaml --subset verified --split test -o moonshotai_Kimi_K2_Instruct-pass1 --workers 16

SWE-agent

sweagent run-batch \
    --config model_configs/kimi-official.yaml \
    --instances.type swe_bench \
    --instances.subset verified \
    --instances.split test  \
    --output_dir kimi-pass1 \
    --num_workers 16
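
The batch run writes per-instance trajectories plus a predictions file into --output_dir; the exact file name (commonly preds.json) can vary by SWE-agent version, so inspect the directory after the run:

ls kimi-pass1/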

OpenHands

Step 1: Clean up any existing runtime containers

cd /path/to/OpenHands  # Replace with your OpenHands directory
conda activate openhands

# List and stop all remote runtimes (if using remote runtime)
ALLHANDS_API_KEY="your-api-key" \
  curl -H "X-API-Key: your-api-key" \
  "https://runtime.eval.all-hands.dev/list" | \
  jq -r '.runtimes[].runtime_id' | \
  xargs -I {} curl -X POST \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"runtime_id": "{}"}' \
  "https://runtime.eval.all-hands.dev/stop"

# Run cleanup script
ALLHANDS_API_KEY="your-api-key" ./evaluation/utils/scripts/cleanup_remote_runtime.sh
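
To confirm the cleanup succeeded, you can query the same list endpoint again and check that no runtimes remain:

curl -H "X-API-Key: your-api-key" "https://runtime.eval.all-hands.dev/list" | jq '.runtimes | length'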

Step 2: Run inference on SWE-bench Verified (Pass 1)

Choose your model and run:

# GPT-5 Mini
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_gpt5_mini \
  HEAD \
  CodeActAgent \
  100 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

# Claude Sonnet 4
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_claude_sonnet4 \
  HEAD \
  CodeActAgent \
  100 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

# Kimi-K2-Instruct
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_kimi \
  HEAD \
  CodeActAgent \
  200 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

# Qwen3-Coder-480B
cd /path/to/OpenHands && conda activate openhands
ALLHANDS_API_KEY="your-api-key" \
RUNTIME=remote \
SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" \
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
  llm.eval_qwen_480b \
  HEAD \
  CodeActAgent \
  200 \
  100 \
  16 \
  princeton-nlp/SWE-bench_Verified \
  test

Output: Results will be saved in evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/{MODEL_NAME}/output.jsonl
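
A quick way to check how many instances produced output is to count the records in the JSONL file; the instance_id field name is assumed from the OpenHands SWE-bench output format:

jq -r '.instance_id' path/to/output.jsonl | wc -l   # substitute the output.jsonl path from above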

For more details on subsequent runs with CWE injection, see openhands/README.md.

3. Evaluation: SWE-Bench

After the first round, you will need to evaluate the results. You can either run the evaluation locally with the official SWE-Bench GitHub repo or use their sb-cli cloud evaluation; we refer readers to the official documentation for details.
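
For the local route, an invocation along these lines should work with the official SWE-bench harness (flags follow the SWE-bench README; the predictions path and run id are placeholders):

python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Verified \
    --predictions_path path/to/preds.json \
    --max_workers 8 \
    --run_id fcv-pass1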

4. Running with CWE Injection + Ablation Study

This will depend on the agent. Please go to each agent's folder and read the README.md for additional details.

5. LM Judge for All Results

Please go to attack-lm-judge's README.md for more details.

Citation

If you find our work useful, please cite our paper:

@misc{peng2025correctsafetrustfunctionally,
  title={When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?}, 
  author={Yibo Peng and James Song and Lei Li and Xinyu Yang and Mihai Christodorescu and Ravi Mangal and Corina Pasareanu and Haizhong Zheng and Beidi Chen},
  year={2025},
  eprint={2510.17862},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2510.17862}, 
}

Contact

We're excited to share our early results and welcome feedback from the community as we continue to refine and expand FCV's capabilities. If you have any questions or feedback, please feel free to contact us.

License

This project is licensed under the MIT License - see the LICENSE file for details.
