Organizations

@IBM

CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…

Python 182 18 Updated Nov 6, 2025

A micro Python-based probabilistic programming language

Jupyter Notebook 5 Updated Nov 3, 2025
Python 18 5 Updated Apr 15, 2024

Reproduction Package for the paper "Type-Constrained Code Generation with Language Models" [PLDI 2025]

Python 77 3 Updated Jun 11, 2025

PoTo: A Hybrid Andersen's Points-to Analysis for Python

Python 3 Updated Jun 29, 2025

This repository is for Issue-Test-Localizer, an approach for localizing tests from issue descriptions.

Python 2 1 Updated Sep 18, 2025

Top papers related to LLM-based agent evaluation

86 12 Updated Oct 21, 2025

A multi-programming language benchmark for LLMs

Python 279 51 Updated Aug 9, 2025

Python framework which enables you to transform how a user calls or infers an IBM Granite model and how the output from the model is returned to the user.

Python 48 27 Updated Nov 6, 2025

Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

Shell 219 266 Updated Oct 21, 2025

TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)

Python 25 3 Updated Sep 18, 2025

ChatDBG - AI-assisted debugging. Uses AI to answer 'why'

Python 1,044 79 Updated Nov 5, 2025

A Database of Real Faults and an Experimental Infrastructure to Enable Controlled Experiments in Software Engineering Research

Perl 893 350 Updated Oct 11, 2025

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Python 30,760 4,613 Updated Nov 7, 2025

Agentless🐱: an agentless approach to automatically solve software development problems

Python 1,948 213 Updated Dec 22, 2024

Prompt Declaration Language (PDL) is a declarative prompt programming language.

Python 244 42 Updated Nov 7, 2025

Static Python call graph generator

Python 358 72 Updated Nov 26, 2023

Build production-ready AI agents in both Python and Typescript.

Python 2,939 387 Updated Nov 6, 2025

The official Python SDK for Codellm-Devkit

Python 116 28 Updated Sep 5, 2025

SWE-bench: Can Language Models Resolve Real-world GitHub Issues?

Python 3,760 678 Updated Oct 11, 2025

KubeStellar - a flexible solution for multi-cluster configuration management for edge, multi-cloud, and hybrid cloud

Go 581 203 Updated Nov 6, 2025

A language for constraint-guided and efficient LLM programming.

Python 4,076 214 Updated May 22, 2025

tempeh is a framework to TEst Machine learning PErformance exHaustively which includes tracking memory usage and run time.

Python 18 5 Updated Jan 3, 2022

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

Python 7,054 1,320 Updated Aug 26, 2025

A library of sklearn compatible categorical variable encoders

Python 2,467 404 Updated Nov 2, 2025

Home of CodeT5: Open Code LLMs for Code Understanding and Generation

Python 3,080 485 Updated Jan 20, 2024

AutoML debugging and remediation tool called MARO: ML Automated Remediation Oracle

Python 2 Updated Jun 21, 2022

A library for reading and writing ARFF files in Python

Python 101 50 Updated May 31, 2023

Scalpel: The Python Static Analysis Framework

Python 320 44 Updated Mar 28, 2024