
Libraries

How to use ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# GPT-NeoX is a causal language model, so AutoModelForCausalLM is the matching auto class.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential")
model = AutoModelForCausalLM.from_pretrained("ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential")

Local Apps

vLLM

How to use ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
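
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost port 8000, and the helper names `build_chat_request` and `send` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential"


def build_chat_request(content, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("What is the capital of France?")
# reply = send(request)  # uncomment once the server is running
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.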

Use Docker

docker model run hf.co/ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential

SGLang

How to use ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential with Docker Model Runner:
```
docker model run hf.co/ldilov/stablelm-tuned-alpha-7b-4bit-128g-descact-sym-true-sequential
```


Model Card: stablelm-tuned-alpha-7b-4bit-128g

Description

The stablelm-tuned-alpha-7b-4bit-128g model is a quantized version of the stablelm-tuned-alpha-7b language model. It is based on the GPTNeoX architecture and was quantized with the AutoGPTQ framework. The base model was fine-tuned for generating conversational responses.

Quantization reduces the model's memory footprint and improves inference efficiency, at some cost in precision. The model uses 4-bit quantization with a group size of 128, so each group of 128 weights shares its own quantization parameters. The dampening factor (damp_percent) is set to 0.01; in GPTQ, this is the fraction of the mean Hessian diagonal added to the diagonal to keep the quantization solve numerically stable.
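
To make "4-bit with a group size of 128" concrete, here is a minimal sketch of symmetric group-wise quantization in plain Python. It uses simple round-to-nearest on a signed integer grid, which is an illustrative simplification: GPTQ additionally compensates for rounding error using second-order information, and AutoGPTQ's storage layout differs. The function names are hypothetical.

```python
import math
from typing import List, Tuple

BITS = 4          # from the model card
GROUP_SIZE = 128  # from the model card


def quantize_group(weights: List[float], bits: int = BITS) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group of weights."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE):
    """Split a row into groups; each group gets its own scale."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


# Toy data: 256 weights form two groups of 128.
row = [0.1 * math.sin(i) for i in range(256)]
groups = quantize_row(row)
restored = [w for codes, scale in groups for w in dequantize_group(codes, scale)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Because each group has its own scale, one unusually large weight only coarsens the grid for the 128 weights in its group rather than for the whole row.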

Model Details

Model Name: stablelm-tuned-alpha-7b-4bit-128g
Base Model: stablelm-tuned-alpha-7b
Quantization Configuration:
- Bits: 4
- Group Size: 128
- Damp Percent: 0.01
- Activation-Order Quantization (desc_act): Enabled (columns are quantized in order of decreasing activation magnitude)
- Symmetric Quantization (sym): Enabled
- True Sequential Quantization (true_sequential): Enabled
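
The settings listed above can be written down as a configuration object. This is a sketch, assuming the `BaseQuantizeConfig` class from the `auto_gptq` package, which accepts fields with these names; the card does not show the script that was actually used, so treat this as a reconstruction rather than the author's code. The import is guarded so the snippet also runs where AutoGPTQ is not installed.

```python
# Values copied from the "Quantization Configuration" list in this card.
quantize_kwargs = {
    "bits": 4,
    "group_size": 128,
    "damp_percent": 0.01,
    "desc_act": True,
    "sym": True,
    "true_sequential": True,
}

try:
    # Assumption: AutoGPTQ is installed and exposes BaseQuantizeConfig.
    from auto_gptq import BaseQuantizeConfig
except Exception:
    BaseQuantizeConfig = None

# Build the config object only when the library is available.
quantize_config = (
    BaseQuantizeConfig(**quantize_kwargs) if BaseQuantizeConfig is not None else None
)
```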

Usage

The stablelm-tuned-alpha-7b-4bit-128g model can be used for a variety of conversational tasks such as chatbots, question answering systems, and dialogue generation. It can generate human-like responses based on given system prompts, contexts, and input texts.

To use the model, provide a system prompt, context, and input text in the following format:

Input: {system_prompt}\n{context}: <|USER|>{text}<|ASSISTANT|>

Label: {response}

Example:

system_prompt = """# StableLM Tuned (Alpha version)
- StableLM is a helpful and chatty open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
"""

context = "It's not right to think black people deserve to be hit"
text = "You're right, it isn't funny. Finding enjoyment in other people's pains isn't funny."
response = "I am glad that you agree. Joking about abusing black people can quickly get you marked as a racist."

prompt = f"{system_prompt}\n{context}: <|USER|>{text}<|ASSISTANT|>"
label = f"{response}"

Make sure to tokenize the inputs with the original model's tokenizer before passing them to the model, and follow the original model's template for the system prompt and user prompt format.
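
The input format above can be wrapped in a small helper so that the special tokens are always placed consistently. This is a minimal sketch; `build_prompt` and the sample context and question are hypothetical and only illustrate the string layout described in this card.

```python
USER_TOKEN = "<|USER|>"
ASSISTANT_TOKEN = "<|ASSISTANT|>"


def build_prompt(system_prompt: str, context: str, text: str) -> str:
    """Assemble the input string in the layout described in this card."""
    return f"{system_prompt}\n{context}: {USER_TOKEN}{text}{ASSISTANT_TOKEN}"


# Hypothetical values, used only to show the resulting layout.
prompt = build_prompt(
    system_prompt="# StableLM Tuned (Alpha version)\n",
    context="Earlier conversation summary",
    text="Who are you?",
)
```

The prompt ends with the assistant token, so the model's continuation is the response.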

Performance

Model Size: 5 GB
Inference Speed: N/A
Accuracy: N/A
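
The card does not break the size figure down. As a rough guide to how group-wise 4-bit storage scales, the sketch below estimates bits per quantized weight. It assumes one 16-bit scale and one 4-bit zero-point per group of 128 weights; those two widths are assumptions about the storage layout, and the estimate ignores layers kept in higher precision (such as embeddings), so it does not reproduce the figure above.

```python
def bits_per_weight(bits: int = 4, group_size: int = 128,
                    scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Average storage per weight, including per-group overhead."""
    return bits + (scale_bits + zero_bits) / group_size


def weight_gigabytes(num_weights: int, **kwargs) -> float:
    """Approximate size in GB (10^9 bytes) of the quantized weights alone."""
    return num_weights * bits_per_weight(**kwargs) / 8 / 1e9


per_weight = bits_per_weight()                    # 4 + 20/128 bits
per_billion = weight_gigabytes(1_000_000_000)     # GB per billion quantized weights
```

With these assumptions, the per-group overhead adds about 0.16 bits on top of the 4 bits per weight.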

Limitations and Considerations

- As a language model, the stablelm-tuned-alpha-7b-4bit-128g model depends on the quality and relevance of its training data. It may generate responses that are contextually appropriate but not factually accurate or suitable for every scenario.
- Quantization trades precision for smaller model size and better memory efficiency. The quality of generated responses may be slightly lower than that of the original model.
- The model may not have been trained on domain-specific data and may not perform optimally on specialized tasks.

Acknowledgments

The base stablelm-tuned-alpha-7b model was developed by StabilityAI. This quantized version leverages the GPTNeoX architecture and the AutoGPTQ framework, and builds upon the research and contributions of the open-source community in language modeling and conversational AI.

License

The stablelm-tuned-alpha-7b-4bit-128g model is released under the license terms specified by StabilityAI.

Quantized by Lazar Dilov (GitHub). Framework used: AutoGPTQ (GitHub).


Safetensors

Model size: 2B params
Tensor types: F32, I32, F16, BOOL
