text-generation-webui: AttributeError: 'Offload_LlamaModel' object has no attribute 'preload', when trying to generate text
Traceback (most recent call last):
  File "D:\oobabooga\text-generation-webui\modules\callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\oobabooga\text-generation-webui\modules\text_generation.py", line 290, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "D:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 135, in forward
    if idx <= (self.preload - 1):
  File "C:\Users\FuckMicrosoftPC\.conda\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Offload_LlamaModel' object has no attribute 'preload'
Note: I am trying to load the GPTQ safetensors model.
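For readers unfamiliar with this kind of failure, here is a rough sketch of the mechanism the traceback points at: forward() reads self.preload, the attribute was never set on the instance, and the module's __getattr__ fallback raises. The classes below are simplified stand-ins written for illustration, not the real torch or GPTQ-for-LLaMa code, and load_model with its pre_layer argument is a hypothetical helper. The idea that a mismatched loader never sets the attribute is an assumption.

```python
class Module:
    """Tiny stand-in for torch.nn.Module attribute lookup (not the real class)."""

    def __getattr__(self, name):
        # Python only calls __getattr__ after normal attribute lookup has failed.
        raise AttributeError(
            "'{}' object has no attribute '{}'".format(type(self).__name__, name)
        )


class Offload_LlamaModel(Module):
    """Hypothetical simplification: forward() expects the loader to have set `preload`."""

    def __init__(self, num_layers):
        self.num_layers = num_layers

    def forward(self):
        on_gpu = []
        for idx in range(self.num_layers):
            # Same comparison as the line shown in the traceback.
            if idx <= (self.preload - 1):
                on_gpu.append(idx)
        return on_gpu


def load_model(num_layers, pre_layer=None):
    """Hypothetical loader: only sets `preload` when it is given a value."""
    model = Offload_LlamaModel(num_layers)
    if pre_layer is not None:
        model.preload = pre_layer
    return model
```

Calling load_model(4).forward() raises the same AttributeError message as above, while load_model(4, pre_layer=2).forward() runs normally.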
Try the oobabooga fork of GPTQ-for-LLaMa. Do not use the new one.
That doesn't seem to help, unless I'm using the wrong version. I'm using https://github.com/oobabooga/GPTQ-for-LLaMa/
That is definitely the right one. I only have this set up on Linux, though.
It worked for me after I downloaded the config.json from the other folder.
What folder? Where? How?
https://huggingface.co/reeducator/vicuna-13b-free/tree/main/hf-output
That folder also has the FP16 weights, so you can convert them to whatever you want, for example act-order with no group size.
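If you are not sure whether your local model folder has the file, a quick check like the one below can tell you. This is only a sketch: missing_model_files is a made-up helper name, and the list of required files is an assumption that contains just the config.json mentioned above.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("config.json",)):
    """Return the names from `required` that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]
```

Point it at the folder that holds the safetensors file. An empty list means nothing from the required list is missing.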
The issue for me was that I was using an outdated GPTQ-for-LLaMa repo. I checked the readme, and it says to delete that folder before updating.
For anyone who needs instructions:
Windows: Delete the GPTQ-for-LLaMa folder (located in /text-generation-webui/repositories/), then run update_windows.bat if you used the Windows version.
Linux: Delete the same folder as on Windows, then replace it with the newest version from https://github.com/oobabooga/GPTQ-for-LLaMa.git
Clone the repo into the repositories folder with:
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
Then cd into the newly cloned directory and run:
python -m pip install -r requirements.txt
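The Linux steps above can also be scripted. The following is a minimal sketch, not an official updater: update_commands and update_gptq are hypothetical names, the webui path is whatever you pass in, and sys.executable is assumed to be the Python of the environment that runs text-generation-webui. It defaults to a dry run that only prints what it would do.

```python
import shutil
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/oobabooga/GPTQ-for-LLaMa.git"


def update_commands(webui_root):
    """Return the folder to delete and the (argv, cwd) pairs to run afterwards."""
    repos = Path(webui_root) / "repositories"
    target = repos / "GPTQ-for-LLaMa"
    commands = [
        # Clone the cuda branch into the repositories folder.
        (["git", "clone", REPO_URL, "-b", "cuda"], repos),
        # Install the requirements from inside the fresh clone.
        ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], target),
    ]
    return target, commands


def update_gptq(webui_root, dry_run=True):
    """Delete the old GPTQ-for-LLaMa folder, re-clone it, and install requirements."""
    target, commands = update_commands(webui_root)
    if dry_run:
        print("would delete", target)
    elif target.exists():
        shutil.rmtree(target)
    for argv, cwd in commands:
        if dry_run:
            print("would run", " ".join(argv), "in", cwd)
        else:
            subprocess.run(argv, cwd=cwd, check=True)
    return target
```

Run update_gptq("path/to/text-generation-webui") first to see the plan, then pass dry_run=False once the paths look right.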