A work-in-progress toolkit for Large Language Model (LLM) profiling and inference acceleration.
- llama.cpp profiling
- GPU profiling metrics
Clone llama.cpp. We tested with llama.cpp version b1752; you can check out the same version via `git checkout b1752`.
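For example, assuming the upstream `ggerganov/llama.cpp` repository:

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b1752
```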
Next, set up the profiler inside your llama.cpp clone:

- Copy the files in the xllm profiler into your local clone of llama.cpp.
- In the Makefile, change the CUDA path to point to your local CUDA installation.
- Compile and install the libkineto dependency, then update the Makefile to point to it (see the build sketch after this list). If you are running on TAMU HPRC, please feel free to use our customized kineto.
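A minimal sketch of building libkineto from source, assuming the upstream `pytorch/kineto` repository and its standard CMake build (the install prefix is illustrative):

```sh
git clone --recursive https://github.com/pytorch/kineto.git
cd kineto/libkineto
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr/local ..  # illustrative prefix; match your Makefile
make -j"$(nproc)"
sudo make install
```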
To start profiling, simply run `kineto_profiler` under `llama.cpp/profiler` with the same arguments as `./main` in llama.cpp.
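For example (the model path and generation flags below are illustrative; `kineto_profiler` accepts whatever arguments `./main` accepts):

```sh
cd llama.cpp/profiler
./kineto_profiler -m ../models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf \
    -p "Explain the attention mechanism." -n 128
```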
We leverage Holistic Trace Analysis (HTA) to provide insights into llama.cpp LLM inference. We provide Jupyter notebooks with initial trace analyses to play with.
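A minimal sketch of loading a collected trace with HTA outside the notebooks; the `traces/` directory is a placeholder for wherever `kineto_profiler` wrote its output:

```python
from hta.trace_analysis import TraceAnalysis

# Point HTA at the directory containing the collected Kineto traces
# ("traces/" is a placeholder path).
analyzer = TraceAnalysis(trace_dir="traces/")

# Breakdown of GPU time into compute, non-compute, and idle time.
temporal_df = analyzer.get_temporal_breakdown()

# Per-kernel breakdown: which CUDA kernels dominate the run.
kernel_type_df, kernel_df = analyzer.get_gpu_kernel_breakdown()
print(kernel_type_df.head())
```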
Using the CUDA and CUPTI APIs, we can collect all available GPU profiling metrics with `tensor_usage_collector`.
Run `make` under `xllm/profiling/profiler` (please update the Makefile to the CUDA version matching the one displayed by `nvidia-smi`, especially when you have multiple CUDA installations), then run `tensor_usage_collector`. All available metrics will be collected into `tensor_usage_results.csv` in the same directory.
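The full sequence (the CUDA path in the comment is illustrative):

```sh
cd xllm/profiling/profiler
# In the Makefile, point CUDA at the toolkit matching `nvidia-smi`,
# e.g. /usr/local/cuda-12.2 (illustrative path).
make
./tensor_usage_collector   # writes tensor_usage_results.csv next to the binary
```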
Results can be analyzed with the Jupyter notebooks under this folder.
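As a quick sanity check outside the notebooks, the CSV can be inspected with pandas; the exact columns depend on which metrics your GPU exposes:

```python
import pandas as pd

# Load the collected metrics; column names depend on the
# CUPTI metrics available on your GPU.
df = pd.read_csv("tensor_usage_results.csv")
print(df.head())
print(df.describe())
```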
- Mixtral-8x7B-Instruct: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1