Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.

Jupyter Notebook · 3,244 stars · 253 forks · Updated Jun 12, 2025

cloneofsimo / minRF

Minimal implementation of scalable rectified flow transformers, based on SD3's approach

Jupyter Notebook · 593 stars · 52 forks · Updated Jul 1, 2024

huggingface / nanoVLM

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python · 3,641 stars · 329 forks · Updated Jun 30, 2025

LecturaLabsAI / professor-ai-feynman

A nascent multi-agent tool for learning anything the Feynman way (Microsoft AI Agent Hackathon submission)

Python · 2 stars · Updated May 21, 2025

maitrix-org / Voila

Python · 419 stars · 40 forks · Updated May 6, 2025

huggingface / picotron

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python · 1,565 stars · 107 forks · Updated Jun 2, 2025

going-doer / Paper2Code

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Python · 2,811 stars · 417 forks · Updated May 16, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python · 17,301 stars · 1,422 forks · Updated Jun 28, 2025

policy-gradient / GRPO-Zero

Implementing DeepSeek R1's GRPO algorithm from scratch

Python · 1,457 stars · 67 forks · Updated Apr 18, 2025

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python · 41,128 stars · 5,325 forks · Updated Aug 16, 2024

test-time-training / ttt-video-dit

Official PyTorch implementation of One-Minute Video Generation with Test-Time Training

Python · 1,728 stars · 135 forks · Updated Jun 5, 2025

sail-sg / understand-r1-zero

Understanding R1-Zero-Like Training: A Critical Perspective

Python · 1,009 stars · 49 forks · Updated Jul 1, 2025

HKUNLP / Dream

Dream 7B, a large diffusion language model

Python · 799 stars · 39 forks · Updated Jun 18, 2025

yhy258 / EIDL_DRMI

Python · 18 stars · 3 forks · Updated Aug 13, 2024

manycore-research / SpatialLM

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python · 3,453 stars · 258 forks · Updated Jun 24, 2025

computerhistory / AlexNet-Source-Code

This package contains the original 2012 AlexNet code.

Cuda · 2,667 stars · 348 forks · Updated Mar 12, 2025

om-ai-lab / VLM-R1

Solve Visual Understanding with Reinforced VLMs

Python · 5,243 stars · 321 forks · Updated Jun 26, 2025

KempnerInstitute / traveling-waves-integrate

Repository for "Traveling Waves Integrate Spatial Information Through Time"

Jupyter Notebook · 53 stars · 5 forks · Updated Mar 7, 2025

TIGER-AI-Lab / TheoremExplainAgent

Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025 oral]

Python · 1,321 stars · 164 forks · Updated Jun 25, 2025

Starred topics

Firebase

Docker

React

JavaScript

Java

iOS

Gulp

Django

Deep learning

Database
