Restore n_tensor check.
QingtaoLi1 committed Nov 5, 2024
commit f64c7680550ebf4dd0453013524cc1054b311d17
convert_hf_to_gguf.py (6 additions, 0 deletions)

@@ -367,6 +367,12 @@ def prepare_tensors(self):
                     break
 
             for new_name, data_torch in (self._modify_tensors(data_torch, name, bid)):
+                # Some GPTQ models have empty bias tensors which are not in the model architecture.
+                # These tensors will cause the tensor number check to fail, so we have to skip them.
+                if new_name.endswith(".bias") and np.all(LazyTorchTensor.to_eager(data_torch).numpy() == 0):
compilade (Collaborator) suggested a change:

-                if new_name.endswith(".bias") and np.all(LazyTorchTensor.to_eager(data_torch).numpy() == 0):
+                if new_name.endswith(".bias") and torch.all(data_torch == 0):

Lazy tensors should automatically become eager whenever an operation returns something other than a tensor (here, a bool from torch.all).
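For intuition, here is a toy sketch of that lazy-evaluation principle; it is an assumed simplification, not the real LazyTorchTensor from gguf-py:

import torch

class ToyLazyTensor:
    def __init__(self, thunk):
        self._thunk = thunk              # deferred computation of a tensor

    def __eq__(self, other):
        # A tensor-valued op can stay deferred by returning another wrapper.
        return ToyLazyTensor(lambda: self._thunk() == other)

    def __bool__(self):
        # A non-tensor result (a plain bool) forces the computation to run.
        return bool(self._thunk().all())

bias = ToyLazyTensor(lambda: torch.zeros(8))
if bias == 0:                            # `==` stays lazy; `if` forces __bool__
    print("all-zero bias tensor, would be skipped")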

QingtaoLi1 (Author) replied:

@compilade Actually, this is a patch for some models we found where the structure of the model file doesn't match its declared architecture. If I remember correctly, it was some Qwen2 models.

+                    logger.info(f"Skipping empty bias tensor: {new_name}")
+                    continue
+
                 data = data_torch.squeeze().numpy()
 
                 # if data ends up empty, it means data_torch was a scalar tensor -> restore
src/llama.cpp (1 addition, 3 deletions)

@@ -4783,9 +4783,7 @@ struct llama_model_loader {

     void done_getting_tensors() const {
         if (n_created != n_tensors) {
-            // Zero bias in some HuggingFace models will cause n_tensors mismatch
-            // Consider removing zero bias in convert_hf_to_gguf.py?
-            // throw std::runtime_error(format("%s: wrong number of tensors; expected %d, got %d", __func__, n_tensors, n_created));
+            throw std::runtime_error(format("%s: wrong number of tensors; expected %d, got %d", __func__, n_tensors, n_created));
         }
     }
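Taken together, the two changes restore one invariant: the converter no longer writes the phantom all-zero biases, so the tensor count recorded in the GGUF matches what the loader creates, and the hard error can come back. A rough sketch of that accounting (hypothetical tensor names, not code from either file):

import torch

# Converter side: drop all-zero bias tensors before writing.
ARCH_TENSORS = {"blk.0.attn_q.weight"}          # what the loader will create

exported = {                                    # what the GPTQ export contains
    "blk.0.attn_q.weight": torch.randn(4, 4),
    "blk.0.attn_q.bias":   torch.zeros(4),      # phantom, all-zero
}

written = [name for name, t in exported.items()
           if not (name.endswith(".bias") and torch.all(t == 0))]

# Loader side: the restored strict check, which now passes.
assert len(written) == len(ARCH_TENSORS), "wrong number of tensors"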
