Instructions to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Tesslate/UIGEN-T2-7B-Q8_0-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

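# Load model directly (a GGUF repo needs the gguf_file argument)
from transformers import AutoModelForCausalLM, AutoTokenizer
gguf_file = "uigen-t2-7b-3600-q8_0.gguf"
tokenizer = AutoTokenizer.from_pretrained("Tesslate/UIGEN-T2-7B-Q8_0-GGUF", gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained("Tesslate/UIGEN-T2-7B-Q8_0-GGUF", gguf_file=gguf_file, dtype="auto")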

llama-cpp-python

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Tesslate/UIGEN-T2-7B-Q8_0-GGUF",
	filename="uigen-t2-7b-3600-q8_0.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Use Docker

docker model run hf.co/Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

vLLM

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Tesslate/UIGEN-T2-7B-Q8_0-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tesslate/UIGEN-T2-7B-Q8_0-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

SGLang

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Tesslate/UIGEN-T2-7B-Q8_0-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tesslate/UIGEN-T2-7B-Q8_0-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Tesslate/UIGEN-T2-7B-Q8_0-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tesslate/UIGEN-T2-7B-Q8_0-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Ollama:
```
ollama run hf.co/Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0
```

Unsloth Studio

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Tesslate/UIGEN-T2-7B-Q8_0-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Tesslate/UIGEN-T2-7B-Q8_0-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Tesslate/UIGEN-T2-7B-Q8_0-GGUF to start chatting

Pi

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Docker Model Runner:
```
docker model run hf.co/Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0
```

Lemonade

How to use Tesslate/UIGEN-T2-7B-Q8_0-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0

Run and chat with the model

lemonade run user.UIGEN-T2-7B-Q8_0-GGUF-Q8_0

List all available models

lemonade list

Model Card for UIGEN-T2-7B-GGUF

Our Training Article

Testing GitHub for Artifacts

Model Overview

We're excited to introduce UIGEN-T2, the next evolution in our UI generation model series. Fine-tuned from the Qwen2.5-Coder-7B-Instruct base model using PEFT/LoRA, UIGEN-T2 is designed specifically to generate HTML and Tailwind CSS code for web interfaces. Two things set UIGEN-T2 apart: it was trained on a 50,000-sample dataset (up from 400), and it has a UI-based reasoning capability, so the code it generates is informed by explicit design principles.

Model Highlights

High-Quality UI Code Generation: Produces functional and semantic HTML combined with utility-first Tailwind CSS.
Massive Training Dataset: Trained on 50,000 diverse UI examples, enabling broader component understanding and stylistic range.
Innovative UI-Based Reasoning: Incorporates detailed reasoning traces generated by a specialized "teacher" model, ensuring outputs consider usability, layout, and aesthetics. (See example reasoning in description below)
PEFT/LoRA Trained (Rank 128): Efficiently fine-tuned for UI generation. We've published LoRA checkpoints at each training step for transparency and community use!
Improved Chat Interaction: Streamlined prompt flow – no more need for the awkward double think prompt! Interaction feels more natural.

Example Reasoning (Internal Guide for Generation)

Here's a glimpse into the kind of reasoning that guides UIGEN-T2 internally, generated by our specialized teacher model:

<|begin_of_thought|>
When approaching the challenge of crafting an elegant stopwatch UI, my first instinct is to dissect what truly makes such an interface delightful yet functional—hence, I consider both aesthetic appeal and usability grounded in established heuristics like Nielsen’s “aesthetic and minimalist design” alongside Gestalt principles... placing the large digital clock prominently aligns with Fitts’ Law... The glassmorphism effect here enhances visual separation... typography choices—the use of a monospace font family ("Fira Code" via Google Fonts) supports readability... iconography paired with labels inside buttons provides dual coding... Tailwind CSS v4 enables utility-driven consistency... critical reflection concerns responsiveness: flexbox layouts combined with relative sizing guarantee graceful adaptation...
<|end_of_thought|>
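
If the model emits its reasoning between the same `<|begin_of_thought|>` and `<|end_of_thought|>` markers shown above, downstream tooling will usually want to separate the trace from the generated markup. The following is a minimal sketch of one way to do that. `split_reasoning` is a hypothetical helper name, and the sketch assumes the markers survive decoding as literal text, which may depend on how the tokenizer treats them.

```python
import re

# Hypothetical helper: assumes the markers appear literally in the decoded text.
THOUGHT_PATTERN = re.compile(
    r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>",
    re.DOTALL,
)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, remainder) for one decoded model response."""
    match = THOUGHT_PATTERN.search(text)
    if match is None:
        # No reasoning block found: treat the whole response as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    remainder = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, remainder


# Made-up sample response, used only to illustrate the split.
sample = (
    "<|begin_of_thought|>\nKeep the card minimal.\n<|end_of_thought|>\n"
    '<div class="p-4 rounded shadow">Card</div>'
)
reasoning, html = split_reasoning(sample)
print(reasoning)
print(html)
```

Returning an empty reasoning string when no markers are found keeps the helper safe to call on responses that contain only code.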

Example Outputs

Use Cases

Recommended Uses

Rapid UI Prototyping: Quickly generate HTML/Tailwind code snippets from descriptions or wireframes.
Component Generation: Create standard and custom UI components (buttons, cards, forms, layouts).
Frontend Development Assistance: Accelerate development by generating baseline component structures.
Design-to-Code Exploration: Bridge the gap between design concepts and initial code implementation.

Limitations

Current Framework Focus: Primarily generates HTML and Tailwind CSS. (Bootstrap support is planned!).
Complex JavaScript Logic: Focuses on structure and styling; dynamic behavior and complex state management typically require manual implementation.
Highly Specific Design Systems: May need further fine-tuning for strict adherence to unique, complex corporate design systems.

How to Use

You must use this system prompt:

You are Tesslate, a helpful assistant specialized in UI generation.

These are the recommended sampling parameters: temperature 0.7, top_p 0.9.
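
When the model is served behind an OpenAI-compatible endpoint, the system prompt and sampling parameters above can be bundled into a single chat request. The sketch below only builds the request body. `build_chat_request` is a hypothetical helper, and the model identifier passed to it is an assumption that depends on how your server names the model.

```python
import json

SYSTEM_PROMPT = "You are Tesslate, a helpful assistant specialized in UI generation."


def build_chat_request(user_prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat completion payload with the card's settings."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,  # recommended temperature
        "top_p": 0.9,        # recommended top_p
    }


payload = build_chat_request(
    "Create a pricing card with Tailwind CSS.",
    model="Tesslate/UIGEN-T2-7B-Q8_0-GGUF:Q8_0",  # assumption: server-side model id
)
print(json.dumps(payload, indent=2))
```

The resulting JSON can be sent as the body of a POST to the server's `/v1/chat/completions` route.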

Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Make sure you have PEFT installed: pip install peft
from peft import PeftModel

# Use your specific model name/path once uploaded
model_name_or_path = "tesslate/UIGEN-T2" # Placeholder - replace with actual HF repo name
base_model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16, # or float16 if bf16 not supported
    device_map="auto"
)

# Load the PEFT model (LoRA weights)
model = PeftModel.from_pretrained(base_model, model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_name) # Use base tokenizer

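# Note the simplified prompt structure (no double 'think'); the required system prompt comes first
prompt = """<|im_start|>system
You are Tesslate, a helpful assistant specialized in UI generation.<|im_end|>
<|im_start|>user
Create a simple card component using Tailwind CSS with an image, title, and description.<|im_end|>
<|im_start|>assistant
""" # Model will generate reasoning and code following this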

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Recommended sampling parameters: temperature 0.7, top_p 0.9 (adjust as needed)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
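
The prompt in the example above is written by hand in the ChatML layout that Qwen2.5 models use. A small helper can assemble the same layout and prepend the system prompt this card requires. This is a sketch, not part of the released code, and `build_chatml_prompt` is a hypothetical name.

```python
SYSTEM_PROMPT = "You are Tesslate, a helpful assistant specialized in UI generation."


def build_chatml_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a ChatML prompt that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    "Create a simple card component using Tailwind CSS with an image, title, and description."
)
print(prompt)
```

An alternative is `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which reads the template from the tokenizer, so the special tokens do not need to be typed by hand.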

Performance and Evaluation

Strengths:
- Generates semantically correct and well-structured HTML/Tailwind CSS.
- Leverages a large dataset (50k samples) for improved robustness and diversity.
- Incorporates design reasoning for more thoughtful UI outputs.
- Improved usability via streamlined chat template.
- Openly published LoRA checkpoints for community use.
Weaknesses:
- Currently limited to HTML/Tailwind CSS (Bootstrap planned).
- Complex JavaScript interactivity requires manual implementation.
- Reinforcement Learning refinement (for stricter adherence to principles/rewards) is a future step.

Technical Specifications

Architecture: Transformer-based LLM adapted with PEFT/LoRA
Base Model: Qwen/Qwen2.5-Coder-7B-Instruct
Adapter Rank (LoRA): 128
Training Data Size: 50,000 samples
Precision: Trained using bf16/fp16. Base model requires appropriate precision handling.
Hardware Requirements: Recommend GPU with >= 16GB VRAM for efficient inference (depends on quantization/precision).
Software Dependencies:
- Hugging Face Transformers (transformers)
- PyTorch (torch)
- Parameter-Efficient Fine-Tuning (peft)
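
To see how the VRAM recommendation depends on precision, a rough estimate of weight memory helps. The sketch below assumes a nominal 7 billion parameters (the exact count differs) and counts weights only, so the KV cache, activations, and runtime overhead come on top. The Q8_0 figure uses the GGUF block layout of 32 int8 weights plus one fp16 scale.

```python
# Back-of-the-envelope weight memory; ignores KV cache, activations, and overhead.
NOMINAL_PARAMS = 7e9  # assumption: nominal "7B", not the exact parameter count

BITS_PER_WEIGHT = {
    "bf16/fp16": 16.0,
    # Q8_0 stores blocks of 32 int8 weights plus one fp16 scale: 34 bytes per 32 weights.
    "Q8_0": 34 * 8 / 32,
}


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB."""
    return params * bits_per_weight / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB at 16-bit precision and roughly 7 GiB at Q8_0, which is consistent with recommending 16 GB or more for unquantized inference.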

Citation

If you use UIGEN-T2 or the LoRA checkpoints in your work, please cite us:

% Adjust the year if needed; the URL below is a placeholder.
@misc{tesslate_UIGEN-T2,
  title={UIGEN-T2: Scaling UI Generation with Reasoning on Qwen2.5-Coder-7B},
  author={tesslate},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/tesslate/UIGEN-T2}
}