This repository was archived by the owner on Jun 24, 2024. It is now read-only.

GPTQ quantization #78

Open
@philpax

Description

The GGML quantization strategy works, but it results in a measurable loss in quality. To address this, upstream is investigating the GPTQ algorithm, which quantizes in a way that reduces that loss: ggml-org/llama.cpp#9

It's possible that this already works: quantize a model with GPTQ and load it as q4_1, per ggml-org/llama.cpp#9 (comment).
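For context, a minimal sketch of the round-to-nearest affine scheme that q4_1 uses (per-block min and scale, 4-bit codes) — this is illustrative pseudocode of the idea, not the actual ggml implementation, and the block size and function names are assumptions:

```python
import random

def quantize_q4_1(block):
    # Per-block affine quantization to 4 bits: x ~ lo + q * scale.
    # Round-to-nearest is the source of the quality loss that
    # GPTQ's error-compensating updates are meant to reduce.
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15.0 or 1.0  # avoid div-by-zero on constant blocks
    q = [min(15, max(0, round((x - lo) / scale))) for x in block]
    return q, scale, lo

def dequantize_q4_1(q, scale, lo):
    # Reconstruct approximate floats from the 4-bit codes.
    return [lo + v * scale for v in q]

random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(32)]  # q4_1 uses 32-wide blocks
q, scale, lo = quantize_q4_1(block)
recon = dequantize_q4_1(q, scale, lo)
max_err = max(abs(a - b) for a, b in zip(block, recon))
```

The per-element error of round-to-nearest is bounded by half the quantization step (`scale / 2`); GPTQ instead chooses the codes so that the error is compensated across the remaining weights of each layer, which is why a GPTQ-produced model can still be stored in the same q4_1 layout.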
