Alibaba Cloud
- Hangzhou, China
- https://gujiaqivadin.github.io/
- https://orcid.org/0000-0002-4644-6046
Starred repositories
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"
Geologic models from Llama 4 language model + GemPy!
Ola: Pushing the Frontiers of Omni-Modal Language Model
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Collections of Papers and Projects for Multimodal Reasoning.
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
Explore the Multimodal “Aha Moment” on 2B Model
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
This repository provides a valuable reference for researchers in the multimodal field. Start your exploration of RL-based reasoning MLLMs here!
Chrome / Edge extension to turn arXiv papers into Markdown code in one click.
MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.
Fully open reproduction of DeepSeek-R1
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
Google AI Studio Starter Apps
This repository contains the code for the paper [HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting](https://gujiaqivadin.github.io/hybridgs/).
Anonymous Github is a proxy server to support anonymous browsing of Github repositories for open-science code and data.
lilygoli / SpotLessSplats
Forked from nerfstudio-project/gsplat
Code for SpotLessSplats: Ignoring Distractors in 3D Gaussian Splatting, built on the gsplat codebase.
Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine
MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面).
A Unified Toolkit for Deep Learning-Based Table Extraction
A High-efficiency Open-source Toolkit for the Table-to-LaTeX Task