# Usage

Instructions for using neph1/Qwen2.5-Coder-7B-Instruct-Unity with libraries, inference servers, and local apps.

## Transformers

How to use neph1/Qwen2.5-Coder-7B-Instruct-Unity with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="neph1/Qwen2.5-Coder-7B-Instruct-Unity")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("neph1/Qwen2.5-Coder-7B-Instruct-Unity")
model = AutoModelForCausalLM.from_pretrained("neph1/Qwen2.5-Coder-7B-Instruct-Unity")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
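Since this is a 7B model, you will usually want it on GPU in half precision. A minimal sketch; the `torch_dtype` and `device_map` values here are illustrative choices, not part of the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("neph1/Qwen2.5-Coder-7B-Instruct-Unity")
model = AutoModelForCausalLM.from_pretrained(
    "neph1/Qwen2.5-Coder-7B-Instruct-Unity",
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support; use float16 otherwise
    device_map="auto",           # lets accelerate place the weights automatically
)
```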
## vLLM

How to use neph1/Qwen2.5-Coder-7B-Instruct-Unity with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "neph1/Qwen2.5-Coder-7B-Instruct-Unity"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "neph1/Qwen2.5-Coder-7B-Instruct-Unity",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
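Because the server exposes an OpenAI-compatible API, you can also call it from Python. A minimal sketch using the official `openai` client; the `api_key` value is a placeholder, since vLLM does not check it by default:

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neph1/Qwen2.5-Coder-7B-Instruct-Unity",
    messages=[{"role": "user", "content": "How do I move a Rigidbody in Unity?"}],
)
print(response.choices[0].message.content)
```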
## SGLang

How to use neph1/Qwen2.5-Coder-7B-Instruct-Unity with SGLang:
Install from pip and serve the model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "neph1/Qwen2.5-Coder-7B-Instruct-Unity" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "neph1/Qwen2.5-Coder-7B-Instruct-Unity",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "neph1/Qwen2.5-Coder-7B-Instruct-Unity" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "neph1/Qwen2.5-Coder-7B-Instruct-Unity",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
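The SGLang server is OpenAI-compatible as well, so the Python client sketch shown in the vLLM section works here too; just point `base_url` at `http://localhost:30000/v1`.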
## Unsloth Studio

How to use neph1/Qwen2.5-Coder-7B-Instruct-Unity with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
# Install Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for neph1/Qwen2.5-Coder-7B-Instruct-Unity to start chatting.
```
Install Unsloth Studio (Windows)
```powershell
# Install Unsloth Studio:
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for neph1/Qwen2.5-Coder-7B-Instruct-Unity to start chatting.
```
Using Hugging Face Spaces for Unsloth Studio
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for neph1/Qwen2.5-Coder-7B-Instruct-Unity to start chatting.
Load the model with FastModel
```sh
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="neph1/Qwen2.5-Coder-7B-Instruct-Unity",
    max_seq_length=2048,
)
```
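Once loaded, the model and tokenizer behave like regular Transformers objects, so a quick generation check can look like the following sketch (continuing from the block above; the prompt is illustrative):

```python
# Illustrative quick check: format a chat prompt and generate a short reply.
messages = [{"role": "user", "content": "Write a Unity C# script that rotates a cube."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```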
## Docker Model Runner

How to use neph1/Qwen2.5-Coder-7B-Instruct-Unity with Docker Model Runner:
```sh
docker model run hf.co/neph1/Qwen2.5-Coder-7B-Instruct-Unity
```
```yaml
base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
datasets:
  - Hypersniper/unity_api_2022_3
  - ibranze/codellama_unity3d_v2
  - neph1/Unity_Code_QnA
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
  - sft
```
# Description

Qwen2.5-Coder-7B-Instruct trained on a merged dataset of Unity3D Q&A from these three datasets:

- [ibranze/codellama_unity3d_v2](https://huggingface.co/datasets/ibranze/codellama_unity3d_v2) (full)
- [Hypersniper/unity_api_2022_3](https://huggingface.co/datasets/Hypersniper/unity_api_2022_3) (10%)
- [neph1/Unity_Code_QnA](https://huggingface.co/datasets/neph1/Unity_Code_QnA) (full)
Preview 2: 26,210 rows, of which ca. 1,000 are from my own multi-response dataset.

Preview 1: 15,062 rows in total, with a 10% validation split.
Trained with the native chat template (minus tool usage; see this issue: https://github.com/unslothai/unsloth/issues/1053). From some superficial testing, it also seems to respond well to the Mistral template.
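For reference, Qwen2.5's native template is ChatML-style. A sketch of how to inspect the formatted prompt; the expected output shape in the comments is illustrative, and the exact system message may differ:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("neph1/Qwen2.5-Coder-7B-Instruct-Unity")
messages = [{"role": "user", "content": "How do I rotate a GameObject in Unity?"}]
# Render the prompt as a string instead of token IDs:
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# Expected shape of the output (ChatML-style):
# <|im_start|>system
# ...<|im_end|>
# <|im_start|>user
# How do I rotate a GameObject in Unity?<|im_end|>
# <|im_start|>assistant
```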
Consider this a preview while I develop a dataset of my own.

If you have any feedback, please share. I've only done some basic testing so far, and I'm especially interested to hear if you're using it with Tabby or a similar coding tool.
# Uploaded model

- **Developed by:** neph1
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
# Training details

About 1.5 epochs. It's probably overfitting a bit, and I should introduce some general coding questions into my validation set to ensure it doesn't lose too much general performance.

LoRA rank: 128

LoRA alpha: 256
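The card does not include the adapter setup code. A sketch of how these values are typically passed in Unsloth; the `target_modules` list is my assumption (the usual attention and MLP projections for Qwen-style models), not taken from the card:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,           # LoRA rank from the card
    lora_alpha=256,  # LoRA alpha from the card
    # Assumed target modules, not stated in the card:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```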
```python
import torch
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir = "outputs",  # not in the original card; added so this runs standalone
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 64,
    # max_steps = 10,
    num_train_epochs = 3,
    warmup_steps = 5,
    learning_rate = 1e-4,
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    logging_steps = 10,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 3407,
    per_device_eval_batch_size = 2,
    eval_strategy = "steps",
    eval_accumulation_steps = 64,
    eval_steps = 10,
    eval_delay = 0,
    save_strategy = "steps",
    save_steps = 25,
    report_to = "none",
)
```
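With `per_device_train_batch_size=2` and `gradient_accumulation_steps=64`, the effective batch size works out to 2 × 64 = 128 sequences per optimizer step (per GPU).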
| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 20   | 2.043000      | 1.197104        |
| 40   | 1.087300      | 0.933553        |
| 60   | 0.942200      | 0.890801        |
| 80   | 0.865600      | 0.866198        |
| 100  | 0.851400      | 0.849733        |
| 120  | 0.812900      | 0.837039        |
| 140  | 0.812400      | 0.827064        |
| 160  | 0.817300      | 0.818410        |
| 180  | 0.802600      | 0.810163        |
| 200  | 0.788600      | 0.803399        |