michalwols

Starred repositories

Toolkit for linearizing PDFs for LLM datasets/training

Python 13,063 939 Updated Jun 28, 2025

Code for collecting, processing, and preparing datasets for the Common Pile

Python 174 15 Updated Jun 16, 2025

Just another reasonably minimal repo for class-conditional training of pixel-space diffusion transformers.

Python 106 11 Updated May 29, 2025

[CVPR25] Official Implementation of CAV-MAE Sync

Python 20 2 Updated Jun 18, 2025

SoTA open-source TTS

Python 8,848 994 Updated Jun 13, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,145 54 Updated Jun 13, 2025

The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)

Python 39 8 Updated Mar 23, 2025

The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)

Python 143 16 Updated Mar 23, 2025

[Preprint] UCGM: Unified Continuous Generative Models

Python 156 7 Updated May 27, 2025

Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024).

Python 24 1 Updated Apr 3, 2024

An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 796 29 Updated Jun 16, 2025

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Jupyter Notebook 325 32 Updated Jun 18, 2025

[SIGGRAPH 2025] Official code of the paper "Cobra: Efficient Line Art COlorization with BRoAder References"

Python 194 13 Updated Apr 17, 2025

Repo of FocusedAD

Python 12 3 Updated Apr 18, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,881 260 Updated Jun 21, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,325 193 Updated Jun 17, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 17,222 1,409 Updated Jun 28, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,344 76 Updated May 28, 2025

[ICLR 2025] Official PyTorch implementation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stitching"

Jupyter Notebook 103 2 Updated Feb 26, 2024

Let's make video diffusion practical!

Python 14,744 1,326 Updated Jun 27, 2025

[Preprint] Efficient Generative Model Training via Embedded Representation Warmup

Python 30 2 Updated Apr 16, 2025

PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"

Python 374 20 Updated Jun 20, 2025

Official implementation of TDC.

Python 7 Updated Jun 28, 2025

Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python 594 34 Updated Apr 8, 2025
Jupyter Notebook 2 Updated Feb 28, 2025

(ICLR 2025) TabM: Advancing Tabular Deep Learning With Parameter-Efficient Ensembling

Python 414 35 Updated Jun 27, 2025

LSNet: See Large, Focus Small [CVPR 2025]

Python 177 8 Updated Apr 1, 2025