About the EXL3 mention

#1
by turboderp - opened

It's not really an apples-to-apples comparison to list perplexities from two different tests in the same table this way. Perplexity is a very sparse test to begin with, and LCPP scores every token against a warm context, which is a completely different methodology (it excludes the higher-perplexity tokens in the first half of each chunk).

I wrote up a quick test which should copy the exact tokenization, striding, and eval logic from lcpp's perplexity eval. Usage, from the repo base directory:

```sh
python eval/ppl.py -m /path/to/exl3_model --gguf --ctx-size 2048
```
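For reference, the evaluation scheme being copied can be sketched roughly as below. This is a hedged illustration, not the actual `eval/ppl.py` code: `logprobs_fn` is a hypothetical stand-in for a model call that returns per-token log-probabilities, and the exact chunking details in llama.cpp may differ. The key point it shows is that each non-overlapping chunk is evaluated from a cold start, but only the second half of the chunk, where the context is "warm", contributes to the score.

```python
import math

def warm_context_ppl(logprobs_fn, tokens, ctx_size=2048):
    """Sketch of a llama.cpp-style perplexity eval (assumed details):
    split the token stream into non-overlapping chunks of ctx_size,
    evaluate each chunk from scratch, and accumulate negative
    log-likelihoods only for the second half of each chunk, where
    the model already has substantial context."""
    nll, count = 0.0, 0
    for start in range(0, len(tokens) - ctx_size + 1, ctx_size):
        chunk = tokens[start:start + ctx_size]
        # lp[i-1] = log P(chunk[i] | chunk[:i]) for i = 1 .. ctx_size-1
        lp = logprobs_fn(chunk)
        # skip the first half: those predictions see little context
        for i in range(ctx_size // 2, ctx_size):
            nll -= lp[i - 1]
            count += 1
    return math.exp(nll / count)
```

With a model that assigns uniform probability over a vocabulary of size V, this returns a perplexity of exactly V, which is a quick sanity check on the bookkeeping.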

The result I get here for the 3bpw model is 6.8840 +/- 0.04628. It's worth pointing out that even though the file size is ~13 GB, this is only because EXL3 doesn't quantize embeddings. The quantized weights are 3.01 bits including overhead from scales and so on, and the output layer is 6.00 bits. Embeddings account for about 2.4 GB of the total file size (dim 5120, vocab size 248320, FP16.)
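The embedding figure checks out from the numbers given above, assuming FP16 (2 bytes per weight):

```python
# Unquantized FP16 embedding table, per the dimensions stated above
dim, vocab, bytes_per_weight = 5120, 248320, 2
size_bytes = dim * vocab * bytes_per_weight
size_gib = size_bytes / 1024**3  # ≈ 2.37 GiB, matching the "~2.4 GB" figure
```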
