Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Paper: arXiv 2405.19332
Install with winget (Windows):

```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf

# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf
```

Use a pre-built binary:

```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf

# Run inference directly in the terminal:
./llama-cli -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf
```

Build from source:

```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf

# Run inference directly in the terminal:
./build/bin/llama-cli -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf
```

Run with Docker Model Runner:

```shell
docker model run hf.co/RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf
```
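Once `llama-server` is running, it exposes an OpenAI-compatible HTTP API, by default on port 8080. A minimal sketch of building a chat-completion request with the Python standard library (the URL, port, and model name are assumptions; adjust them to your setup):

```python
import json
import urllib.request

# OpenAI-style chat-completion payload for the local llama-server.
# The "model" field is informational here; llama-server serves the
# model it was launched with.
payload = {
    "model": "SELM-Llama-3-8B-Instruct-iter-1",
    "messages": [
        {"role": "user", "content": "Briefly explain what GGUF is."}
    ],
    "max_tokens": 128,
}

# Port 8080 is llama-server's default; change if you passed --port.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

The same request works against any OpenAI-compatible client library by pointing its base URL at the local server.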
SELM-Llama-3-8B-Instruct-iter-1 - GGUF

Quantization made by Richard Erkhov.
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained on synthetic data derived from the HuggingFaceH4/ultrafeedback_binarized dataset.
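SELM is trained with iterative preference optimization over such preference pairs. For orientation, here is a toy sketch of the standard DPO loss on a single (chosen, rejected) pair; note this shows plain DPO only, SELM's actual objective adds an exploration term on top of it, and all names below are illustrative:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Arguments are log-probabilities of the full chosen/rejected
    responses under the trained policy (pi_*) and the frozen
    reference model (ref_*).
    """
    # Implicit reward margin between chosen and rejected responses.
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)), written in a numerically stable form.
    return math.log1p(math.exp(-logits))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, and grows when the rejected response is preferred.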
| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|---|---|---|
| SELM-Llama-3-8B-Instruct-iter-3 | 33.47 | 8.29 |
| SELM-Llama-3-8B-Instruct-iter-2 | 35.65 | 8.09 |
| SELM-Llama-3-8B-Instruct-iter-1 | 32.02 | 7.92 |
| Meta-Llama-3-8B-Instruct | 24.31 | 7.93 |
Quantized versions are available at the following bit widths:

- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
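The bit width roughly determines the file size and memory footprint of a quantized model. A back-of-envelope estimate counting weight bytes only (real GGUF files also carry per-block scales and metadata, so they run somewhat larger):

```python
def approx_gguf_size_gb(n_params_billion, bits):
    """Rough weight-only size estimate: parameters x bits per weight.

    Ignores per-block scales and file metadata, so actual GGUF files
    are somewhat larger than this figure.
    """
    bytes_total = n_params_billion * 1e9 * bits / 8
    return bytes_total / 1e9

# For an 8B-parameter model: 4-bit -> ~4 GB, 8-bit -> ~8 GB of weights.
```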
Install from brew (the `brew install` line is inferred from the section title; see the llama.cpp README for the current formula):

```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf

# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/ZhangShenao_-_SELM-Llama-3-8B-Instruct-iter-1-gguf
```