
axera
- Shenzhen, Guangdong, China
Starred repositories
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
QRCode (from WeChat) implemented in ncnn ⚡ QR code detection & decoding ⚡ ncnn ⚡
Qwen2.5-Omni-3B on Axera
Easily train a good VC model with voice data <= 10 mins!
One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
Demo for Qwen2.5-VL-3B-Instruct on Axera device.
RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Demo for InternVL3-2B on Axera device.
Multilingual Voice Understanding Model
Demo for Janus-Pro-1B on Axera device.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
MixFormerV2 ONNX C++, MixFormerV2 TensorRT C++ and Python versions
Janus-Series: Unified Multimodal Understanding and Generation Models
[NeurIPS 2023] MixFormerV2: Efficient Fully Transformer Tracking
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
YOLOv12: Attention-Centric Real-Time Object Detectors
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation