Stars
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…
A micro Python based probabilistic programming language
Reproduction Package for the paper "Type-Constrained Code Generation with Language Models" [PLDI 2025]
PoTo: A Hybrid Andersen's Points-to Analysis for Python
This repository is for Issue-Test-Localizer, an approach for localizing tests from issue descriptions.
Top papers related to LLM-based agent evaluation
Python framework that enables you to transform how a user calls or runs inference on an IBM Granite model and how the output from the model is returned to the user.
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)
ChatDBG - AI-assisted debugging. Uses AI to answer 'why'
A Database of Real Faults and an Experimental Infrastructure to Enable Controlled Experiments in Software Engineering Research
Python SDK and Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
Agentless🐱: an agentless approach to automatically solve software development problems
Prompt Declaration Language (PDL) is a declarative prompt programming language.
Build production-ready AI agents in both Python and Typescript.
The official Python SDK for Codellm-Devkit
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
KubeStellar - a flexible solution for multi-cluster configuration management for edge, multi-cloud, and hybrid cloud
A language for constraint-guided and efficient LLM programming.
tempeh is a framework to TEst Machine learning PErformance exHaustively, which includes tracking memory usage and run time.
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
A library of sklearn-compatible categorical variable encoders
Home of CodeT5: Open Code LLMs for Code Understanding and Generation
AutoML debugging and remediation tool called MARO: ML Automated Remediation Oracle
A library for reading and writing ARFF files in Python