The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

Python 7,527 1,441 Updated Nov 11, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,487 58 Updated Jun 14, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 17,616 2,194 Updated Dec 25, 2024

Contexts Optical Compression

Python 20,146 1,505 Updated Oct 25, 2025

The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"

Python 198 9 Updated Oct 31, 2025

Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)

Jupyter Notebook 723 49 Updated Nov 10, 2025

Official code for UniConvNet (ICCV 2025)

Python 32 2 Updated Aug 13, 2025

[CVPR 2024] Rewrite the Stars

Python 425 21 Updated May 7, 2024

[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).

Jupyter Notebook 471 40 Updated Oct 27, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,213 556 Updated Nov 3, 2025

Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".

Python 896 98 Updated Aug 1, 2025

[ICCV 2025] EA-ViT: Efficient Adaptation for Elastic Vision Transformer

Python 23 1 Updated Jul 28, 2025
Python 272 33 Updated Aug 3, 2025
Python 5 Updated Feb 27, 2025

💫 Models for the spaCy Natural Language Processing (NLP) library

Python 1,813 313 Updated May 27, 2025

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Python 1,203 110 Updated Apr 19, 2024

sigma-MoE layer

Python 20 2 Updated Jan 5, 2024

P^2HCT: Plug-and-Play Hierarchical C2F Transformer for Multi-Scale Feature Fusion

Python 17 Updated May 19, 2025

A new generation of CLIP with fine-grained discrimination capability (ICML 2025)

Python 448 24 Updated Oct 27, 2025

The official implementation of [CVPR 2025] "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".

Python 378 18 Updated Jun 23, 2025

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 5,990 567 Updated Feb 26, 2025

YOLOE: Real-Time Seeing Anything [ICCV 2025]

Python 1,889 180 Updated Jun 26, 2025

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Python 339 17 Updated Feb 23, 2024

RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning.

Python 4,172 465 Updated Nov 5, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,247 100 Updated Oct 29, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 16,210 1,294 Updated Nov 10, 2025

A fork to add multimodal model training to open-r1

Python 1,416 70 Updated Feb 8, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,683 366 Updated Oct 21, 2025