An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 50,427 4,276 Updated Jul 3, 2025
Python 1 Updated Jun 3, 2025

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods

Python 1,652 379 Updated Jul 3, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 2,870 218 Updated Jun 30, 2025

Awesome-LLM: a curated list of Large Language Models

24,080 2,031 Updated May 9, 2025

ConceptAttention: A method for interpreting multi-modal diffusion transformers.

Jupyter Notebook 284 12 Updated Apr 14, 2025

Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Python 192 7 Updated Apr 19, 2025

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook 8,292 516 Updated May 18, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,466 2,642 Updated Jun 3, 2025

[WACV'25 Oral] Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Python 445 18 Updated Dec 16, 2024

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,728 331 Updated Jan 4, 2024

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch

Python 3,263 265 Updated Sep 6, 2023

SALMONN family: A suite of advanced multi-modal LLMs

1,273 101 Updated Jun 20, 2025

A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.

Python 166 20 Updated May 29, 2024

Official repo for Images that Sound: diffusion-generated spectrograms that can be viewed as images and played as sound

Python 242 13 Updated Feb 4, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,020 243 Updated Jul 2, 2025

Text-to-Audio/Music Generation

Python 2,457 200 Updated Sep 29, 2024

Code for BLT research paper

Python 1,720 149 Updated May 22, 2025

Cache-Augmented Generation: A Simple, Efficient Alternative to RAG

Python 1,329 192 Updated May 26, 2025

Medical o1, Towards medical complex reasoning with LLMs

Python 1,148 117 Updated Jan 20, 2025

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 783 86 Updated Apr 1, 2025

Large Concept Models: Language modeling in a sentence representation space

Python 2,239 201 Updated Jan 29, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 11,327 824 Updated May 15, 2025

GPT4-4V Histopathology In-Context-Learning

Python 24 2 Updated May 12, 2024

Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.

Go 145,378 12,279 Updated Jul 2, 2025

Large Language Model Text Generation Inference

Python 10,280 1,206 Updated Jul 2, 2025

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Python 1,476 158 Updated May 10, 2025

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,497 147 Updated Jun 24, 2025

(ICLR'25) PaPaGei: Open Foundation Models for Optical Physiological Signals

Python 93 18 Updated May 31, 2025