HITerStudy

1 follower · 3 following

Stars

PaddlePaddle / ERNIE

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

Python 7,527 1,441 Updated Nov 11, 2025

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,487 58 Updated Jun 14, 2025

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 17,616 2,194 Updated Dec 25, 2024

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 20,146 1,505 Updated Oct 25, 2025

Gorilla-Lab-SCUT / PaDT

The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"

Python 198 9 Updated Oct 31, 2025

IDEA-Research / Rex-Omni

Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)

Jupyter Notebook 723 49 Updated Nov 10, 2025

ai-paperwithcode / UniConvNet

The official code for UniConvNet (ICCV 2025)

Python 32 2 Updated Aug 13, 2025

ma-xu / Rewrite-the-Stars

[CVPR 2024] Rewrite the Stars

Python 425 21 Updated May 7, 2024

tue-mps / eomt

[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).

Jupyter Notebook 471 40 Updated Oct 27, 2025

facebookresearch / dinov3

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,213 556 Updated Nov 3, 2025

iMoonLab / yolov13

Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".

Python 896 98 Updated Aug 1, 2025

zcxcf / EA-ViT

[ICCV 2025] EA-ViT: Efficient Adaptation for Elastic Vision Transformer

Python 23 1 Updated Jul 28, 2025

RethinkFun / DeepLearning

Python 272 33 Updated Aug 3, 2025

wghr123 / MFGDiffusion

Python 5 Updated Feb 27, 2025

explosion / spacy-models

💫 Models for the spaCy Natural Language Processing (NLP) library

Python 1,813 313 Updated May 27, 2025

davidmrau / mixture-of-experts

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Python 1,203 110 Updated Apr 19, 2024

RobertCsordas / moe_layer

sigma-MoE layer

Python 20 2 Updated Jan 5, 2024

inlmouse / P2HCT

P^2HCT: Plug-and-Play Hierarchical C2F Transformer for Multi-Scale Feature Fusion

Python 17 Updated May 19, 2025

dibyaghosh / annotation_bootstrapping

Python 11 1 Updated Jun 20, 2025

360CVGroup / FG-CLIP

New generation of CLIP with fine-grained discrimination capability (ICML 2025)

Python 448 24 Updated Oct 27, 2025

LeiyiHU / mona

The official implementation of [CVPR 2025] "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".

Python 378 18 Updated Jun 23, 2025

AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 5,990 567 Updated Feb 26, 2025

THU-MIG / yoloe

YOLOE: Real-Time Seeing Anything [ICCV 2025]

Python 1,889 180 Updated Jun 26, 2025

YifanXu74 / MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Python 339 17 Updated Feb 23, 2024

roboflow / rf-detr

RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning.

Python 4,172 465 Updated Nov 5, 2025

Liuziyu77 / Visual-RFT

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,247 100 Updated Oct 29, 2025

deepseek-ai / DeepSeek-R1

91,466 11,783 Updated Jun 27, 2025

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 16,210 1,294 Updated Nov 10, 2025

EvolvingLMMs-Lab / open-r1-multimodal

A fork to add multimodal model training to open-r1

Python 1,416 70 Updated Feb 8, 2025

om-ai-lab / VLM-R1

Solve Visual Understanding with Reinforced VLMs

Python 5,683 366 Updated Oct 21, 2025