Stars
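PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the original layout, supporting Google/DeepL/Ollama/OpenAI and other services, available as CLI/GUI/Docker
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the original layout, supporting Google/DeepL/Ollama/OpenAI and other services, available as CLI/GUI/MCP/Docker/Zotero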
[NeurIPS 2025] DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
Using RASA post-training to remove positional bias from pretrained encoders like DINOv3
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
Reference PyTorch implementation and models for DINOv3
Using Diffusion Models to Segment/Reconstruct Organs from Medical Images [AAAI Most Influential Paper]
A CUDA tutorial for learning CUDA programming from scratch
CUDA Python: Performance meets Productivity
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
kingbri1 / flash-attention
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples
[CVPR 2025 Oral] SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
[AAAI 2025] Official PyTorch implementation of "ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation"
[AAAI 2025] Official Implementation of "Auto-Regressive Moving Diffusion Models for Time Series Forecasting"
SAFIRE: Segment Any Forged Image REgion (AAAI 2025). Official Repo.
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
Deep image alignment for UAV-taken visible and infrared image pairs using a two-branch CNN pipeline and a registration block.
Codebase and pretrained models for ECCV'18 Unified Perceptual Parsing
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing


