michalwols

Mike michalwols

193 followers · 1.9k following

New York
michal.io

Starred repositories

allenai / olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Python 13,063 939 Updated Jun 28, 2025

r-three / common-pile

Code for collecting, processing, and preparing datasets for the Common Pile

Python 174 15 Updated Jun 16, 2025

sayakpaul / nanoDiT

Just another reasonably minimal repo for class-conditional training of pixel-space diffusion transformers.

Python 106 11 Updated May 29, 2025

edsonroteia / cav-mae-sync

[CVPR25] Official Implementation of CAV-MAE Sync

Python 20 2 Updated Jun 18, 2025

resemble-ai / chatterbox

SoTA open-source TTS

Python 8,848 994 Updated Jun 13, 2025

Gen-Verse / MMaDA

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,145 54 Updated Jun 13, 2025

Junhua-Liao / LR-ASD

The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)

Python 39 8 Updated Mar 23, 2025

Junhua-Liao / Light-ASD

The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)

Python 143 16 Updated Mar 23, 2025

LINs-lab / UCGM

[Preprint] UCGM: Unified Continuous Generative Models

Python 156 7 Updated May 27, 2025

xjchenGit / MTDVocaLiST

Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024).

Python 24 1 Updated Apr 3, 2024

yifan123 / flow_grpo

An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 796 29 Updated Jun 16, 2025

antonibigata / keysync

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Jupyter Notebook 325 32 Updated Jun 18, 2025

zhuang2002 / Cobra

[SIGGRAPH 2025] Official code of the paper "Cobra: Efficient Line Art COlorization with BRoAder References"

Python 194 13 Updated Apr 17, 2025

KaiyueSun98 / T2I-Personalization-with-AR

43 1 Updated Apr 20, 2025

Thorin215 / FocusedAD

Repo of FocusedAD

Python 12 3 Updated Apr 18, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,881 260 Updated Jun 21, 2025

stanford-futuredata / colbert-serve

Python 18 Updated May 30, 2025

SandAI-org / MAGI-1

MAGI-1: Autoregressive Video Generation at Scale

Python 3,325 193 Updated Jun 17, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 17,222 1,409 Updated Jun 28, 2025

facebookresearch / perception_models

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,344 76 Updated May 28, 2025

NVlabs / T-Stitch

[ICLR 2025] Official PyTorch implementation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stitching"

Jupyter Notebook 103 2 Updated Feb 26, 2024

Tencent-Hunyuan / InstantCharacter

Python 1,003 88 Updated May 14, 2025

lllyasviel / FramePack

Let's make video diffusion practical!

Python 14,744 1,326 Updated Jun 27, 2025

LINs-lab / ERW

[Preprint] Efficient Generative Model Training via Embedded Representation Warmup

Python 30 2 Updated Apr 16, 2025

wdrink / SimpleAR

PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"

Python 374 20 Updated Jun 20, 2025

Hoar012 / TDC-Video

Official implementation of TDC.

Python 7 Updated Jun 28, 2025

FoundationVision / Liquid

Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python 594 34 Updated Apr 8, 2025

embeddings-benchmark / miebpaper

Jupyter Notebook 2 Updated Feb 28, 2025

yandex-research / tabm

(ICLR 2025) TabM: Advancing Tabular Deep Learning With Parameter-Efficient Ensembling

Python 414 35 Updated Jun 27, 2025

THU-MIG / lsnet

LSNet: See Large, Focus Small [CVPR 2025]

Python 177 8 Updated Apr 1, 2025

retail-data

state-of-the-art