Stars
Data and sample evaluation codes for Multimodal Rewardbench 2
The official code of "VisCoder2: Building Multi-Language Visualization Coding Agents"
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrate with arxiv and paper with code to provide better code/research plans 🧰 OpenAI, Anthropic, Gemini, Ollam…
Supercharge Your LLM with the Fastest KV Cache Layer
This repo contains code and data for the ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
The first Interleaved framework for textual reasoning within the visual generation process
This is the GitHub repository for the open-source benchmark AdvancedIF, see LAMA L1387358RCRO
Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
Project Page for "Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following"
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
PhD/MBA-level human-annotated rubrics dataset across Physics, Chemistry, Finance and Consulting
PromSketch: Approximation-First Timeseries Query at Scale
Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling
Fully Open Framework for Democratized Multimodal Training
VLAC: A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"
Distributed Compiler based on Triton for Parallel Systems


