Martin Hirzel (hirzel)

http://hirzels.com/martin/

Stars

wala / graph4code

GraphGen4Code: a toolkit for creating code knowledge graphs based on WALA code analysis and extraction of documentation and forum content.

Jupyter Notebook 321 42 Updated Nov 19, 2024

wala / ML

Java 28 18 Updated Nov 4, 2025

cuga-project / cuga-agent

CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…

Python 182 18 Updated Nov 7, 2025

gbdrt / mu-ppl

A micro Python based probabilistic programming language

Jupyter Notebook 5 Updated Nov 3, 2025

codetlingua / codetlingua

Python 18 5 Updated Apr 15, 2024

eth-sri / type-constrained-code-generation

Reproduction Package for the paper "Type-Constrained Code Generation with Language Models" [PLDI 2025]

Python 77 3 Updated Jun 11, 2025

Ingkarat / PoTo

PoTo: A Hybrid Andersen's Points-to Analysis for Python

Python 3 Updated Jun 29, 2025

IBM / Issue-Test-Localizer

Issue-Test-Localizer: an approach for localizing tests from issue descriptions.

Python 2 1 Updated Sep 18, 2025

Asaf-Yehudai / LLM-Agent-Evaluation-Survey

Top papers related to LLM-based agent evaluation

86 12 Updated Oct 21, 2025

nuprl / MultiPL-E

A multi-programming language benchmark for LLMs

Python 279 51 Updated Aug 9, 2025

ibm-granite / granite-io

Python framework which enables you to transform how a user calls or infers an IBM Granite model and how the output from the model is returned to the user.

Python 48 27 Updated Nov 7, 2025

SWE-bench / experiments

Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

Shell 219 266 Updated Oct 21, 2025

IBM / TDD-Bench-Verified

TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)

Python 25 3 Updated Sep 18, 2025

plasma-umass / ChatDBG

ChatDBG - AI-assisted debugging. Uses AI to answer 'why'

Python 1,045 79 Updated Nov 5, 2025

rjust / defects4j

A Database of Real Faults and an Experimental Infrastructure to Enable Controlled Experiments in Software Engineering Research

Perl 892 350 Updated Oct 11, 2025

BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Python 30,796 4,625 Updated Nov 8, 2025