Stars
Youtu-Embedding is an industry-leading, general-purpose text representation model developed by Tencent Youtu Lab.
A lightweight LMM-based Document Parsing Model
[Pytorch] The repo contains the code for "FORGE: Forming Semantic Identifiers for Generative Retrieval in Industrial Datasets"
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Benchmarking Recommendation Abilities for Large Language Models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Open Agent Coding CLI, Koding with GLM, Qwen, Kimi, DeepSeek, etc. (You are welcome to use Kode to submit PRs.)
RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
Toolkit for linearizing PDFs for LLM datasets/training
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
[KDD 2025] Quadratic Neural Networks for Click-through Rate Prediction
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
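A small 0.2B Chinese dialogue model (ChatLM-Chinese-0.2B). Open-sources all code for the full pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, and RLHF optimization. Supports SFT fine-tuning for downstream tasks, with a fine-tuning example for triple information extraction.
Uses the pipelines feature of open-webui to call ragflow agents from within open-webui, enabling knowledge-base-driven intelligent conversation with a polished interface.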
Pipelines: Versatile, UI-Agnostic OpenAI-Compatible Plugin Framework
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Production-ready platform for agentic workflow development.
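🚀🚀 [LLM] Train a small 26M-parameter GPT entirely from scratch in just 2 hours! 🌏
“AI-Compass” guides the community in navigating the ocean of AI technology. Whether you are a beginner or an advanced developer, you can find paths here to every major area of AI. It aims to help developers systematically understand AI's core concepts, mainstream technologies, and emerging trends, and to master the full process from theory to deployment through practice.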
Source code for Twitter's Recommendation Algorithm
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
A privacy-first, open-source platform for knowledge management and collaboration. Download link: http://github.com/logseq/logseq/releases. roadmap: http://trello.com/b/8txSM12G/roadmap
verl: Volcano Engine Reinforcement Learning for LLMs
A decoder-only LLM-based generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner
This is the repository for “ROMA: Recommendation-Oriented Language Model Adaptation Using Multi-Modal Multi-Domain Item Sequences” (KDD 2025)
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥



