This repository was archived by the owner on Jun 24, 2024. It is now read-only.
GPTQ quantization #78
Description
The GGML quantization strategy works, but it results in a measurable loss in quality. To address this, upstream is investigating the GPTQ algorithm, which chooses the quantized weights so as to minimize that loss: ggml-org/llama.cpp#9
It's possible that this already works: according to ggml-org/llama.cpp#9 (comment), a GPTQ-quantized model may load correctly if its tensors are treated as q4_1.
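For intuition on why that mapping could work, here is a minimal sketch. It assumes the q4_1 layout ggml used at the time (blocks of 32 weights, each stored as a 4-bit value plus a per-block scale `d` and minimum `m`, dequantized as `q * d + m`) and a GPTQ-style per-group `(scale, zero)` pair that dequantizes as `scale * (q - zero)`. The struct and function names are illustrative only, not part of this crate's API, and the nibble packing order is an assumption.

```rust
/// One q4_1-style block: 32 weights packed as 4-bit values plus scale and min.
/// (Field layout and nibble order are illustrative, not ggml's exact ABI.)
struct BlockQ4_1 {
    d: f32,       // per-block scale
    m: f32,       // per-block minimum
    qs: [u8; 16], // 32 x 4-bit quantized values, two per byte
}

/// Dequantize a q4_1-style block back to f32: x = q * d + m.
fn dequantize_q4_1(block: &BlockQ4_1) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (i, byte) in block.qs.iter().enumerate() {
        let lo = (byte & 0x0F) as f32;
        let hi = (byte >> 4) as f32;
        out[2 * i] = lo * block.d + block.m;
        out[2 * i + 1] = hi * block.d + block.m;
    }
    out
}

/// Map a GPTQ-style (scale, zero) pair onto q4_1's (d, m):
/// scale * (q - zero) == q * scale + (-scale * zero), so d = scale, m = -scale * zero.
fn gptq_group_to_q4_1(scale: f32, zero: f32) -> (f32, f32) {
    (scale, -scale * zero)
}

fn main() {
    // Hypothetical GPTQ group parameters: scale 0.05, zero point 8.
    let (d, m) = gptq_group_to_q4_1(0.05, 8.0);
    // Quants alternate 0 and 8 (0x80 per byte), just to exercise the path.
    let block = BlockQ4_1 { d, m, qs: [0x80; 16] };
    let weights = dequantize_q4_1(&block);
    println!("d = {d}, m = {m}, first two weights = {:?}", &weights[..2]);
}
```

The caveat is group size: GPTQ commonly uses groups of 128 (or per-column) parameters, while q4_1 blocks hold 32, so a direct reinterpretation only lines up if the GPTQ group boundaries are compatible with (or re-expanded to) 32-weight blocks.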