Current large video-language models face efficiency issues due to processing massive visual tokens. Existing fixed-ratio token compression ignores the varying semantic density across video clips. This leads to inadequate representation of information-rich clips, which receive too few tokens, and to unnecessary computation on static or content-poor ones. To address this, we propose LangDC, a Language-aware Dynamic Token Compressor. LangDC leverages a lightweight language model to describe video clips, converting them into soft caption tokens that serve as visual representations. Trained with our proposed semantic density-aware supervision, LangDC aims to 1) cover the key visual cues necessary for downstream task reasoning and 2) dynamically adjust compression ratios based on scene richness, as reflected by description length.
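To make the core idea concrete, here is a minimal PyTorch sketch of caption-length-driven compression. This is our illustrative assumption, not the released implementation: the class name, dimensions, and the attention-pooling design are hypothetical.

```python
import torch
import torch.nn as nn


class DynamicTokenCompressor(nn.Module):
    """Hypothetical sketch: a clip's token budget scales with the length of
    the caption a lightweight language expert produces for it, so richer
    scenes keep more tokens."""

    def __init__(self, vis_dim=1024, llm_dim=4096, max_tokens=64):
        super().__init__()
        self.max_tokens = max_tokens
        # Learnable query bank; a clip uses only its first k queries.
        self.queries = nn.Parameter(torch.randn(max_tokens, vis_dim))
        self.attn = nn.MultiheadAttention(vis_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, clip_feats, caption_len):
        # clip_feats: (1, N, vis_dim) patch features of one video clip.
        # caption_len: length of the expert's description, used as a proxy
        # for semantic density (longer caption -> larger token budget).
        k = max(1, min(self.max_tokens, caption_len))
        q = self.queries[:k].unsqueeze(0)              # (1, k, vis_dim)
        soft_tokens, _ = self.attn(q, clip_feats, clip_feats)
        return self.proj(soft_tokens)                  # (1, k, llm_dim)


# Example: a busy clip (long caption) keeps more tokens than a static one.
compressor = DynamicTokenCompressor()
busy = compressor(torch.randn(1, 256, 1024), caption_len=48)    # -> (1, 48, 4096)
static = compressor(torch.randn(1, 256, 1024), caption_len=6)   # -> (1, 6, 4096)
```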
- We propose LangDC, a novel language-aware token compression strategy. Using soft language tokens for visual representation, it adaptively adjusts compression ratios, improving token utilization over fixed-ratio techniques.
- We propose semantic density-aware supervision for the token compressor (see the sketch after this list). By explicitly providing reconstruction targets for token compression, we derive a more compact feature set that is aware of information richness while preserving key visual cues.
- Experimental results demonstrate that our method reduces FLOPs by 49% relative to the strong baseline VideoGPT+ while maintaining competitive performance. Qualitative results further show that compression adapts to the semantic density of each video clip.
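The supervision in the second point can be pictured with a short sketch. The loss form below is our assumption (the paper specifies reconstruction targets, not this exact objective); the function name and cosine alignment are hypothetical.

```python
import torch
import torch.nn.functional as F


def density_aware_supervision(soft_tokens, caption_embeds):
    """Hypothetical sketch: align each compressed token with the embedding of
    the expert caption token at the same position, so the compressed set must
    carry the visual cues the caption verbalizes.

    soft_tokens:    (k, d) compressed visual tokens for one clip.
    caption_embeds: (k, d) embeddings of the expert caption, used as the
                    reconstruction target; its length k set the token budget.
    """
    # Cosine alignment loss: 1 - similarity, averaged over the k positions.
    return (1 - F.cosine_similarity(soft_tokens, caption_embeds, dim=-1)).mean()


# Example: a 12-token caption yields a 12-token budget and 12 targets.
loss = density_aware_supervision(torch.randn(12, 4096), torch.randn(12, 4096))
```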
We recommend setting up a conda environment for the project:
```bash
conda create --name=langdc python=3.11
conda activate langdc

git clone https://github.com/NIneeeeeem/LangDC.git
cd LangDC

pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
pip install transformers==4.41.0
pip install -r requirements.txt

export PYTHONPATH="./:$PYTHONPATH"
```
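To quickly confirm the pinned packages resolved correctly, you can run an optional check like the following (not part of the official instructions):

```bash
python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"
```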
Additionally, install FlashAttention for training:

```bash
pip install ninja
git clone https://github.com/HazyResearch/flash-attention.git
cd flash-attention
python setup.py install
```

We provide instructions to reproduce LangDC results on VideoMME, MVBench, LongVideoBench, VSIBench, and four open-ended QA benchmarks. Please follow the instructions at eval/README.md.
To reproduce the results in Table 1 of the Motivation section, please refer to this repository.
If you're using LangDC in your research or applications, please give us a star ⭐ to support us and cite using this BibTeX:
```bibtex
@misc{wang2025seeing,
      title={Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors},
      author={Xiangchen Wang and Jinrui Zhang and Teng Wang and Haigang Zhang and Feng Zheng},
      year={2025},
      eprint={2509.00969},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

- Video-ChatGPT+: A pioneering attempt at video-based conversation models.
- LLaVA: Our codebase is built upon LLaVA and Video-ChatGPT+.

