Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -10,40 +10,138 @@ Quantized with **EOQ (Entropy-Optimal Quantization)**: absmax Q6 + rANS entropy
|
|
| 10 |
|
| 11 |
## Benchmark (RTX PRO 6000 Blackwell 96GB VRAM)
|
| 12 |
|
| 13 |
-
| Format | Size | PPL (WikiText-2) |
|
| 14 |
-
|--------|------|------------------|
|
| 15 |
-
| FP16 | 8412 MB | 7.58 |
|
| 16 |
-
| GGUF Q4_K_M | 2709 MB | ~ref |
|
| 17 |
-
| **EOQ Q6** | **2944 MB** | **7.76** |
|
| 18 |
|
| 19 |
EOQ Q6 is **8.7% larger** than GGUF Q4_K_M.
|
| 20 |
PPL degradation vs FP16: +0.18 points.
|
| 21 |
|
| 22 |
## Inference Speed
|
| 23 |
|
| 24 |
EOQ models are stored as dequantized FP16 safetensors.
|
| 25 |
Inference speed is **identical to FP16** (no quantized kernels).
|
| 26 |
|
| 27 |
-
|
| 28 |
-
since GGUF uses optimized INT4 kernels in llama.cpp that reduce
|
| 29 |
-
memory bandwidth. EOQ advantage is **smaller file size** at
|
| 30 |
-
comparable quality, not speed.
|
| 31 |
-
|
| 32 |
-
Measured: 54.4 tok/s (same as FP16 baseline: 53.6 tok/s)
|
| 33 |
|
| 34 |
## Usage
|
| 35 |
|
| 36 |
-
|
| 37 |
|
| 38 |
## What is EOQ?
|
| 39 |
|
| 40 |
EOQ combines block-wise absmax quantization with rANS entropy coding.
|
| 41 |
-
|
| 42 |
-
rANS removes this redundancy losslessly, saving 10-18%.
|
| 43 |
-
|
| 44 |
-
The result: simpler quantization (absmax) that matches complex
|
| 45 |
-
GGUF K-quants in quality-per-byte, at smaller file size.
|
| 46 |
|
| 47 |
## GitHub
|
| 48 |
|
| 49 |
-
https://github.com/caiovicentino/eoq-quantization
|
| 10 |
|
| 11 |
## Benchmark (RTX PRO 6000 Blackwell 96GB VRAM)
|
| 12 |
|
| 13 |
+
| Format | Size | PPL (WikiText-2) | tok/s |
|
| 14 |
+
|--------|------|------------------|-------|
|
| 15 |
+
| FP16 | 8412 MB | 7.58 | 54.0 |
|
| 16 |
+
| GGUF Q4_K_M | 2709 MB | ~ref | ~ref |
|
| 17 |
+
| **EOQ Q6** | **2944 MB** | **7.76** | **54.3** |
|
| 18 |
|
| 19 |
EOQ Q6 is **8.7% larger** than GGUF Q4_K_M.
|
| 20 |
PPL degradation vs FP16: +0.18 points.
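
The size and perplexity comparisons above can be re-derived from the table values. The following is a minimal sketch of that arithmetic; the variable names and the `pct_larger` helper are illustrative and not part of the EOQ tooling.

```python
# Values copied from the benchmark table above (sizes in MB, PPL on WikiText-2).
fp16_mb, gguf_mb, eoq_mb = 8412, 2709, 2944
fp16_ppl, eoq_ppl = 7.58, 7.76


def pct_larger(a, b):
    """Return how much larger a is than b, in percent (negative means smaller)."""
    return (a / b - 1.0) * 100.0


# Relative file size of EOQ Q6 against the two other formats.
eoq_vs_gguf = pct_larger(eoq_mb, gguf_mb)
eoq_vs_fp16 = pct_larger(eoq_mb, fp16_mb)

# Absolute perplexity change against the FP16 baseline.
ppl_delta = eoq_ppl - fp16_ppl

print(f"EOQ Q6 vs GGUF Q4_K_M size: {eoq_vs_gguf:+.1f}%")
print(f"EOQ Q6 vs FP16 size:        {eoq_vs_fp16:+.1f}%")
print(f"EOQ Q6 vs FP16 PPL:         {ppl_delta:+.2f}")
```

A positive percentage means the first format is larger than the second, so EOQ Q6 comes out somewhat larger than GGUF Q4_K_M and much smaller than FP16.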
|
| 21 |
|
| 22 |
+
## Cross-Model Validation
|
| 23 |
+
|
| 24 |
+
| Model | FP16 PPL | EOQ Q5 Size | EOQ Q5 PPL | Delta |
|
| 25 |
+
|-------|----------|-------------|------------|-------|
|
| 26 |
+
| Qwen2.5-0.5B | 10.87 | 279 MB | 11.69 | +0.83 |
|
| 27 |
+
| Qwen2.5-3B | 6.54 | 1,724 MB | 6.77 | +0.23 |
|
| 28 |
+
| Qwen3.5-4B | 7.58 | 2,398 MB | 7.77 | +0.18 |
|
| 29 |
+
| Qwen3.5-27B | 5.65 | 15,353 MB | 5.94 | +0.31 |
|
| 30 |
+
| Qwen3.5-35B-A3B | 5.19 | 19,680 MB | 5.39 | +0.21 |
|
| 31 |
+
|
| 32 |
## Inference Speed
|
| 33 |
|
| 34 |
EOQ models are stored as dequantized FP16 safetensors.
|
| 35 |
Inference speed is **identical to FP16** (no quantized kernels).
|
| 36 |
+
EOQ advantage is **smaller file size** at comparable quality, not speed.
|
| 37 |
|
| 38 |
+
Measured: 54.3 tok/s (same as FP16: 54.0 tok/s)
|
| 39 |
|
| 40 |
## Usage
|
| 41 |
|
| 42 |
+
Version: ImageMagick 7.1.2-13 Q16-HDRI aarch64 23522 https://imagemagick.org
|
| 43 |
+
Copyright: (C) 1999 ImageMagick Studio LLC
|
| 44 |
+
License: https://imagemagick.org/license/
|
| 45 |
+
Features: Cipher DPC HDRI Modules
|
| 46 |
+
Delegates (built-in): bzlib heic jng jpeg lcms ltdl lzma png tiff webp xml zlib zstd
|
| 47 |
+
Compiler: clang (17.0.0)
|
| 48 |
+
Usage: import [options ...] [ file ]
|
| 49 |
+
|
| 50 |
+
Image Settings:
|
| 51 |
+
-adjoin join images into a single multi-image file
|
| 52 |
+
-border include window border in the output image
|
| 53 |
+
-channel type apply option to select image channels
|
| 54 |
+
-colorspace type alternate image colorspace
|
| 55 |
+
-comment string annotate image with comment
|
| 56 |
+
-compress type type of pixel compression when writing the image
|
| 57 |
+
-define format:option
|
| 58 |
+
define one or more image format options
|
| 59 |
+
-density geometry horizontal and vertical density of the image
|
| 60 |
+
-depth value image depth
|
| 61 |
+
-descend obtain image by descending window hierarchy
|
| 62 |
+
-display server X server to contact
|
| 63 |
+
-dispose method layer disposal method
|
| 64 |
+
-dither method apply error diffusion to image
|
| 65 |
+
-delay value display the next image after pausing
|
| 66 |
+
-encipher filename convert plain pixels to cipher pixels
|
| 67 |
+
-endian type endianness (MSB or LSB) of the image
|
| 68 |
+
-encoding type text encoding type
|
| 69 |
+
-filter type use this filter when resizing an image
|
| 70 |
+
-format "string" output formatted image characteristics
|
| 71 |
+
-frame include window manager frame
|
| 72 |
+
-gravity direction which direction to gravitate towards
|
| 73 |
+
-identify identify the format and characteristics of the image
|
| 74 |
+
-interlace type None, Line, Plane, or Partition
|
| 75 |
+
-interpolate method pixel color interpolation method
|
| 76 |
+
-label string assign a label to an image
|
| 77 |
+
-limit type value Area, Disk, Map, or Memory resource limit
|
| 78 |
+
-monitor monitor progress
|
| 79 |
+
-page geometry size and location of an image canvas
|
| 80 |
+
-pause seconds seconds delay between snapshots
|
| 81 |
+
-pointsize value font point size
|
| 82 |
+
-quality value JPEG/MIFF/PNG compression level
|
| 83 |
+
-quiet suppress all warning messages
|
| 84 |
+
-regard-warnings pay attention to warning messages
|
| 85 |
+
-repage geometry size and location of an image canvas
|
| 86 |
+
-respect-parentheses settings remain in effect until parenthesis boundary
|
| 87 |
+
-sampling-factor geometry
|
| 88 |
+
horizontal and vertical sampling factor
|
| 89 |
+
-scene value image scene number
|
| 90 |
+
-screen select image from root window
|
| 91 |
+
-seed value seed a new sequence of pseudo-random numbers
|
| 92 |
+
-set property value set an image property
|
| 93 |
+
-silent operate silently, i.e. don't ring any bells
|
| 94 |
+
-snaps value number of screen snapshots
|
| 95 |
+
-support factor resize support: > 1.0 is blurry, < 1.0 is sharp
|
| 96 |
+
-synchronize synchronize image to storage device
|
| 97 |
+
-taint declare the image as modified
|
| 98 |
+
-transparent-color color
|
| 99 |
+
transparent color
|
| 100 |
+
-treedepth value color tree depth
|
| 101 |
+
-verbose print detailed information about the image
|
| 102 |
+
-virtual-pixel method
|
| 103 |
+
Constant, Edge, Mirror, or Tile
|
| 104 |
+
-window id select window with this id or name
|
| 105 |
+
root selects whole screen
|
| 106 |
+
|
| 107 |
+
Image Operators:
|
| 108 |
+
-annotate geometry text
|
| 109 |
+
annotate the image with text
|
| 110 |
+
-colors value preferred number of colors in the image
|
| 111 |
+
-crop geometry preferred size and location of the cropped image
|
| 112 |
+
-encipher filename convert plain pixels to cipher pixels
|
| 113 |
+
-extent geometry set the image size
|
| 114 |
+
-geometry geometry preferred size or location of the image
|
| 115 |
+
-help print program options
|
| 116 |
+
-monochrome transform image to black and white
|
| 117 |
+
-negate replace every pixel with its complementary color
|
| 118 |
+
-quantize colorspace reduce colors in this colorspace
|
| 119 |
+
-resize geometry resize the image
|
| 120 |
+
-rotate degrees apply Paeth rotation to the image
|
| 121 |
+
-strip strip image of all profiles and comments
|
| 122 |
+
-thumbnail geometry create a thumbnail of the image
|
| 123 |
+
-transparent color make this color transparent within the image
|
| 124 |
+
-trim trim image edges
|
| 125 |
+
-type type image type
|
| 126 |
+
|
| 127 |
+
Miscellaneous Options:
|
| 128 |
+
-debug events display copious debugging information
|
| 129 |
+
-help print program options
|
| 130 |
+
-list type print a list of supported option arguments
|
| 131 |
+
-log format format of debugging information
|
| 132 |
+
-version print version information
|
| 133 |
+
|
| 134 |
+
By default, 'file' is written in the MIFF image format. To
|
| 135 |
+
specify a particular image format, precede the filename with an image
|
| 136 |
+
format name and a colon (i.e. ps:image) or specify the image type as
|
| 137 |
+
the filename suffix (i.e. image.ps). Specify 'file' as '-' for
|
| 138 |
+
standard input or output.
|
| 139 |
|
| 140 |
## What is EOQ?
|
| 141 |
|
| 142 |
EOQ combines block-wise absmax quantization with rANS entropy coding.
|
| 143 |
+
Simple quantization that matches complex GGUF K-quants in quality-per-byte.
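
To make the two ingredients concrete, here is a minimal sketch of block-wise absmax quantization followed by an entropy estimate of the resulting codes. It is not the EOQ implementation: the block size of 128, the synthetic Gaussian weights, and all function names are assumptions chosen for illustration. The code does not run rANS either; it computes the Shannon entropy of the codes, which is only the lower bound an entropy coder such as rANS can approach.

```python
import math
import random
from collections import Counter


def absmax_quantize(block, bits=6):
    """Quantize one block of floats to signed integers, scaled by the block's absolute maximum."""
    qmax = 2 ** (bits - 1) - 1  # 31 for a symmetric 6-bit code
    absmax = max(abs(x) for x in block)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floats."""
    return [c * scale for c in codes]


def entropy_bits(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in for a weight tensor
block_size = 128  # assumed block size, not taken from the EOQ source

all_codes = []
worst_ratio = 0.0  # largest reconstruction error, in units of the block scale
for start in range(0, len(weights), block_size):
    block = weights[start:start + block_size]
    codes, scale = absmax_quantize(block, bits=6)
    restored = dequantize(codes, scale)
    worst_ratio = max(worst_ratio, max(abs(a - b) / scale for a, b in zip(block, restored)))
    all_codes.extend(codes)

h = entropy_bits(all_codes)
print(f"fixed-width cost: 6.00 bits/weight, empirical entropy: {h:.2f} bits/weight")
print(f"worst rounding error: {worst_ratio:.3f} x scale")
```

Because the codes are not uniformly distributed, their entropy falls below the fixed 6 bits per weight. That gap is the redundancy a lossless entropy coder can remove without changing the dequantized values.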
|
| 144 |
|
| 145 |
## GitHub
|
| 146 |
|
| 147 |
+
https://github.com/caiovicentino/eoq-quantization
|