Really enjoyed your clear explanation of weight quantization 🥰
But I have a question about how the perplexity comparison using `calculate_perplexity` is set up.
In the article, perplexity is calculated using each model's own generated output:
```python
ppl = calculate_perplexity(model, original_text)        # Model evaluates its OWN output
ppl_abs = calculate_perplexity(model_abs, absmax_text)  # Quantized model evaluates its OWN output
ppl_zp = calculate_perplexity(model_zp, absmax_text)    # Zero-point model evaluates ANOTHER model's output
```
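For reference, here is a minimal sketch of what such a perplexity helper typically looks like, assuming a Hugging Face causal LM (the article's exact tokenizer/device handling may differ):

```python
import torch
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def calculate_perplexity(model, text):
    # Tokenize the text and move the tensors to the model's device
    encodings = tokenizer(text, return_tensors="pt").to(device)
    input_ids = encodings.input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy (negative log-likelihood) as .loss
        outputs = model(input_ids, labels=input_ids)
    # Perplexity is the exponential of the mean negative log-likelihood
    return torch.exp(outputs.loss)
```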
For more comparable results, should we instead evaluate all models on:
- The same input prompt ("I have a dream"), or
- A standard validation dataset?
e.g.:
```python
reference_text = "I have a dream"  # or text from a standard validation set
ppl_orig = calculate_perplexity(model, reference_text)
ppl_abs = calculate_perplexity(model_abs, reference_text)
ppl_zp = calculate_perplexity(model_zp, reference_text)
```
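And for the second option, a rough sketch of scoring all three models on the same held-out text. WikiText-2 is just an illustrative choice here, and `model`, `model_abs`, `model_zp` are assumed to be the variables from the article's notebook:

```python
from datasets import load_dataset

# Load a standard validation corpus (illustrative choice)
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
# Join a few test lines; keep the string short so it stays under
# GPT-2's 1,024-token context window without extra chunking logic
reference_text = " ".join(wikitext["text"][:32])[:2000]

# model, model_abs, model_zp: hypothetical names taken from the article
for name, m in [("original", model), ("absmax", model_abs), ("zeropoint", model_zp)]:
    print(f"{name}: ppl = {calculate_perplexity(m, reference_text).item():.2f}")
```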