About the EXL3 mention

#1
by turboderp - opened

It's not really an apples-to-apples comparison to list perplexities from two different tests in the same table this way. Perplexity is a very sparse test to begin with, and LCPP scores every token against a warm context, which is a completely different methodology (it excludes the higher-perplexity tokens in the first half of each chunk).

I wrote up a quick test which should copy the exact tokenization, striding, and eval logic from lcpp's perplexity eval. Usage, from the repo base directory:

```sh
python eval/ppl.py -m /path/to/exl3_model --gguf --ctx-size 2048
```
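For reference, the evaluation scheme being copied can be sketched roughly as below. This is a hedged illustration, not the actual `eval/ppl.py` code: `logprobs_fn` is a hypothetical stand-in for a model call that returns per-token log-probabilities, and the exact chunking details in llama.cpp may differ. The key point it shows is that each non-overlapping chunk is evaluated from a cold start, but only the second half of the chunk, where the context is "warm", contributes to the score.

```python
import math

def warm_context_ppl(logprobs_fn, tokens, ctx_size=2048):
    """Sketch of a llama.cpp-style perplexity eval (assumed details):
    split the token stream into non-overlapping chunks of ctx_size,
    evaluate each chunk from scratch, and accumulate negative
    log-likelihoods only for the second half of each chunk, where
    the model already has substantial context."""
    nll, count = 0.0, 0
    for start in range(0, len(tokens) - ctx_size + 1, ctx_size):
        chunk = tokens[start:start + ctx_size]
        # lp[i-1] = log P(chunk[i] | chunk[:i]) for i = 1 .. ctx_size-1
        lp = logprobs_fn(chunk)
        # skip the first half: those predictions see little context
        for i in range(ctx_size // 2, ctx_size):
            nll -= lp[i - 1]
            count += 1
    return math.exp(nll / count)
```

With a model that assigns uniform probability over a vocabulary of size V, this returns a perplexity of exactly V, which is a quick sanity check on the bookkeeping.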

The result I get here for the 3bpw model is 6.8840 +/- 0.04628. It's worth pointing out that even though the file size is ~13 GB, this is only because EXL3 doesn't quantize embeddings. The quantized weights are 3.01 bits including overhead from scales and so on, and the output layer is 6.00 bits. Embeddings account for about 2.4 GB of the total file size (dim 5120, vocab size 248320, FP16.)
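The embedding figure checks out from the numbers given above, assuming FP16 (2 bytes per weight):

```python
# Unquantized FP16 embedding table, per the dimensions stated above
dim, vocab, bytes_per_weight = 5120, 248320, 2
size_bytes = dim * vocab * bytes_per_weight
size_gib = size_bytes / 1024**3  # ≈ 2.37 GiB, matching the "~2.4 GB" figure
```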
