
llama : print size and type of overridden tensors #13364

Merged
1 commit merged into master on May 8, 2025
Conversation

@slaren (Member) commented May 7, 2025

Small QoL improvement. Example output:

tensor blk.10.exp_probs_b.bias (0 MiB f32) buffer type overridden to CPU
tensor blk.10.ffn_gate_exps.weight (700 MiB iq1_s) buffer type overridden to CPU
tensor blk.10.ffn_down_exps.weight (700 MiB iq1_s) buffer type overridden to CPU
tensor blk.10.ffn_up_exps.weight (700 MiB iq1_s) buffer type overridden to CPU
tensor blk.10.ffn_gate_shexp.weight (9 MiB q5_K) buffer type overridden to CPU
tensor blk.10.ffn_down_shexp.weight (11 MiB q6_K) buffer type overridden to CPU
tensor blk.10.ffn_up_shexp.weight (9 MiB q5_K) buffer type overridden to CPU

@slaren slaren requested a review from Copilot May 7, 2025 19:18

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the debugging output for overridden tensor buffer types by including the tensor’s memory size (in MiB) and data type in the log message. It also corrects the spelling of "overridden" in the log output.

  • Enhanced debug logging for better visibility into tensor properties.
  • Fixed minor spelling error in the log messages.
Comments suppressed due to low confidence (1)

src/llama-model.cpp:1655

  • The updated debug log message improves clarity by including additional tensor details. Consider verifying that the integer division used for calculating the tensor size meets your precision requirements, or switch to floating point arithmetic if a more precise value is desired.
LLAMA_LOG_DEBUG("tensor %s (%zu MiB %s) buffer type overridden to %s\n", tensor_name.c_str(), ggml_nbytes(t_meta) / 1024 / 1024, ggml_type_name(t_meta->type), ggml_backend_buft_name(buft));

@slaren slaren merged commit f061021 into master May 8, 2025
46 checks passed
@slaren slaren deleted the sl/ot-tensor-size branch May 8, 2025 11:15
@ddh0 (Contributor) commented May 9, 2025

yayy thank you!
