
Libraries

How to use Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto")
model = AutoModelForMultimodalLM.from_pretrained("Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
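
The curl calls above pass the image as a public URL. For GUI automation the input is usually a local screenshot, so the sketch below (standard library only) embeds a PNG as a base64 data URL. It assumes the server accepts data URLs in `image_url`, as OpenAI-compatible endpoints generally do, that it was able to load this checkpoint, and that it listens on the default `http://localhost:8000` (use port 30000 for the SGLang commands). The helper names `build_chat_payload` and `post_chat` are hypothetical.

```python
import base64
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, png_bytes: bytes, max_tokens: int = 128) -> dict:
    """OpenAI-style chat payload with the screenshot inlined as a base64 data URL."""
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /v1/chat/completions and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (requires a running server):
# with open("screenshot.png", "rb") as f:
#     payload = build_chat_payload("Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto", "Describe the UI elements.", f.read())
# print(post_chat(payload)["choices"][0]["message"]["content"])
```

Inlining the image avoids hosting screenshots anywhere, at the cost of a request body about a third larger than the raw PNG.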

Use Docker

docker model run hf.co/Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto

SGLang

How to use Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'


mPLUG UI-S1-7B - Hybrid W4 Quantized (Quanto)

Model Description

This is a hybrid quantized version of mPLUG/UI-S1-7B optimized for efficient GUI automation on consumer hardware.

Quantization Strategy

Method: Quanto INT4 hybrid (weight-only) quantization
Text Layers: 196 linear layers of the language model quantized to INT4 (75% less weight storage than 16-bit, before scale overhead)
Vision Tower: 162 linear layers kept in BF16 (not quantized)
Size: 4.6 GB (68.7% smaller than the 14.5 GB original)
VRAM: ~4.5-5.5 GB (fits on 16 GB GPUs with a 16k-token context)
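
The 75% figure for the text layers is the raw ratio of 4-bit to 16-bit storage. Group-wise INT4 schemes also store a scale and a shift (zero-point) per group of weights, which adds a little overhead. The sketch below shows the arithmetic under a hypothetical group size of 128 with one BF16 scale and one BF16 shift per group; Quanto's actual layout may differ. The whole-model saving (68.7%) is lower still because the vision tower and `lm_head` stay in BF16.

```python
def effective_bits_per_weight(weight_bits: int, group_size: int,
                              scale_bits: int = 16, shift_bits: int = 16) -> float:
    """Average storage per weight when each group stores one scale and one shift."""
    return weight_bits + (scale_bits + shift_bits) / group_size


def storage_reduction(orig_bits: float, new_bits: float) -> float:
    """Fraction of storage saved relative to the original precision."""
    return 1.0 - new_bits / orig_bits


print(f"{storage_reduction(16, 4):.1%}")     # 75.0% (raw 16-bit -> 4-bit ratio)
bits = effective_bits_per_weight(4, group_size=128)  # assumed group size
print(bits)                                   # 4.25
print(f"{storage_reduction(16, bits):.1%}")   # 73.4%
```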

Key Features

✅ No Vision Quantization - The vision tower is kept entirely in BF16, so image encoding is not affected by quantization
✅ Large Memory Savings - 68.7% smaller on disk
✅ Consumer Hardware Ready - Runs on GPUs with 16 GB of VRAM
✅ 16k Context Support - A 16k-token context fits on a 16 GB GPU with memory to spare
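
The 16k-context claim can be sanity-checked with the usual key/value cache size formula: 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per element. The sketch assumes the Qwen2.5-VL-7B language-model configuration (28 layers, 4 key/value heads, head dimension 128), since the card loads the model with `Qwen2_5_VLForConditionalGeneration`, and a BF16 cache; check the model's `config.json` before relying on the numbers. It ignores activations and vision-encoder memory.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch_size: int = 1) -> int:
    """Bytes for the key/value cache: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


# Assumed Qwen2.5-VL-7B text config; BF16 cache (2 bytes per element)
cache = kv_cache_bytes(num_layers=28, num_kv_heads=4, head_dim=128, seq_len=16_384)
print(cache)          # 939524096
print(cache / 2**30)  # 0.875 (GiB)
```

Under these assumptions a full 16k-token cache adds under 1 GiB on top of the ~4.6 GB of weights, consistent with the headroom claimed above.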

Performance

Metric	Original	Quantized
Model Size	14.5 GB	4.6 GB
VRAM Usage	~14-15 GB	~4.5-5.5 GB
Vision Tower Precision	BF16	BF16 (unquantized)
Text Layer Precision	BF16	INT4

Usage

Loading the Model

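import torch
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
# Quanto is now distributed as optimum-quanto (pip install optimum-quanto);
# the standalone `quanto` package is deprecated
from optimum.quanto import freeze, qint4, quantize, safe_load

repo_id = "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto"

# Load the base architecture in BF16
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Collect the Linear layers to keep in BF16 (vision tower, embeddings, lm_head)
vision_keywords = ['visual', 'vision', 'image', 'patch', 'merger', 'projector', 'embed_tokens', 'lm_head']
exclude_modules = [
    name
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
    and any(k in name.lower() for k in vision_keywords)
]

# Re-create the Quanto INT4 layers BEFORE loading, so the quantized tensors
# in the checkpoint have matching modules to load into
quantize(model, weights=qint4, exclude=exclude_modules)

# Load the quantized weights (quanto_model.safetensors is expected in the repo root)
state_dict = safe_load(hf_hub_download(repo_id, "quanto_model.safetensors"))
result = model.load_state_dict(state_dict, strict=False)
# strict=False silently skips mismatched keys, so report them
print(f"missing keys: {len(result.missing_keys)}, unexpected keys: {len(result.unexpected_keys)}")

freeze(model)
model.eval()

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)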

Inference with Images

from PIL import Image
from qwen_vl_utils import process_vision_info

# Load image
image = Image.open("screenshot.png")

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe the UI elements."}
        ]
    }
]

# Process
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate (generate() already runs without gradient tracking)
generated_ids = model.generate(**inputs, max_new_tokens=128)

# Drop the prompt tokens and decode only the newly generated text
response = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
)[0]
print(response)
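
Screenshots can take a large share of the 16k context. Qwen2.5-VL-style processors resize each image to multiples of 28 pixels (14-pixel patches merged 2×2) and emit roughly one visual token per 28×28 region. The sketch below mirrors the `smart_resize` rounding from `qwen_vl_utils` as we understand it, with its default pixel limits; treat it as an estimate, since defaults may differ between versions. The `visual_tokens` helper is hypothetical. Qwen2.5-VL processors accept `min_pixels`/`max_pixels` in `AutoProcessor.from_pretrained` to cap this budget.

```python
import math

PATCH_FACTOR = 28  # 14-pixel patches merged 2x2 (assumed Qwen2.5-VL defaults)


def smart_resize(height: int, width: int, factor: int = PATCH_FACTOR,
                 min_pixels: int = 4 * 28 * 28, max_pixels: int = 16384 * 28 * 28):
    """Round (height, width) to multiples of `factor`, rescaling to stay within the pixel limits."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt(height * width / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def visual_tokens(height: int, width: int, factor: int = PATCH_FACTOR, **limits) -> int:
    """Estimated visual tokens: one per factor x factor region of the resized image."""
    h, w = smart_resize(height, width, factor=factor, **limits)
    return (h // factor) * (w // factor)


print(smart_resize(1080, 1920))   # (1092, 1932)
print(visual_tokens(1080, 1920))  # 2691
print(visual_tokens(1080, 1920, max_pixels=1280 * 28 * 28))  # 1222
```

Under these assumptions, lowering `max_pixels` roughly halves the token cost of a 1080p screenshot, which leaves more of the context for multi-step GUI histories.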

Hardware Requirements

Minimum

GPU: 16 GB VRAM or more (e.g. RTX A4000, RTX 4060 Ti 16 GB, L4)
RAM: 16GB+
Storage: 5GB

Technical Details

Layer Distribution

Total Linear Layers: 358
Quantized (Text): 196 layers → INT4
Preserved (Vision): 162 layers → BF16
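
The quantized/preserved split comes from a name-based rule. A minimal sketch of that rule, reusing the keyword list from the loading code above, applied to a hypothetical toy module whose names only imitate the Qwen2.5-VL layout (`visual.*` for the vision tower, `model.*` for the language model). On the real model you would pass the loaded model instead; note that `lm_head` also matches the keyword list, so it is preserved as well.

```python
import torch.nn as nn

# Keyword list copied from the loading code in this card
VISION_KEYWORDS = ("visual", "vision", "image", "patch", "merger",
                   "projector", "embed_tokens", "lm_head")


def split_linear_layers(model: nn.Module):
    """Return (to_quantize, to_preserve): names of nn.Linear modules split by keyword."""
    to_quantize, to_preserve = [], []
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if any(k in name.lower() for k in VISION_KEYWORDS):
            to_preserve.append(name)
        else:
            to_quantize.append(name)
    return to_quantize, to_preserve


# Hypothetical toy model whose module names only imitate the Qwen2.5-VL layout
toy = nn.Module()
toy.visual = nn.ModuleDict({"qkv": nn.Linear(8, 24), "proj": nn.Linear(8, 8)})
toy.model = nn.ModuleDict({"q_proj": nn.Linear(8, 8), "down_proj": nn.Linear(16, 8)})
toy.lm_head = nn.Linear(8, 32)

to_quantize, to_preserve = split_linear_layers(toy)
print(to_quantize)  # ['model.q_proj', 'model.down_proj']
print(to_preserve)  # ['visual.qkv', 'visual.proj', 'lm_head']
```

Matching on substrings is simple but brittle: a text-layer name that happened to contain one of the keywords would silently stay in BF16, so it is worth printing both lists once on the real model.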

Quantization Process

Load the model in BF16
Identify vision-critical layers by module name
Apply Quanto INT4 weight quantization to the text layers only
Keep the vision tower in BF16 (unquantized)
Save the weights in safetensors format
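
The INT4 step above can be illustrated in plain PyTorch. This is a sketch of the general technique (group-wise asymmetric min/max quantization), not Quanto's actual kernels: each group of 128 input weights gets its own scale and minimum, and every weight becomes a 4-bit code in 0..15. The group size of 128 and the function names are assumptions for illustration; real implementations also pack two 4-bit codes per byte, whereas this sketch stores one code per `uint8`.

```python
import torch


def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128):
    """Asymmetric min/max 4-bit quantization with one scale and minimum per group."""
    rows, cols = w.shape
    if cols % group_size != 0:
        raise ValueError("in_features must be a multiple of group_size")
    groups = w.float().reshape(rows, cols // group_size, group_size)
    w_min = groups.amin(dim=-1, keepdim=True)
    w_max = groups.amax(dim=-1, keepdim=True)
    # 4 bits give 16 levels (0..15), i.e. 15 steps between min and max
    scale = (w_max - w_min).clamp(min=1e-8) / 15
    codes = torch.round((groups - w_min) / scale).clamp(0, 15).to(torch.uint8)
    return codes, scale, w_min


def dequantize_int4_groupwise(codes, scale, w_min, shape):
    """Map 4-bit codes back to floats: w ~= code * scale + min."""
    return (codes.float() * scale + w_min).reshape(shape)


torch.manual_seed(0)
w = torch.randn(64, 256).to(torch.bfloat16)  # stand-in for one text-layer weight
codes, scale, w_min = quantize_int4_groupwise(w)
w_hat = dequantize_int4_groupwise(codes, scale, w_min, w.shape)
print(codes.shape, scale.shape)  # torch.Size([64, 2, 128]) torch.Size([64, 2, 1])
print((w.float() - w_hat).abs().max().item())  # worst-case reconstruction error
```

Because rounding moves each weight by at most half a step, the reconstruction error in every group is bounded by `scale / 2`; smaller groups tighten that bound at the cost of more scale/minimum metadata.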

Limitations

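Requires the Quanto library (now distributed as optimum-quanto) for loading
vLLM deployment is suggested for best performance, but confirm that your vLLM version can load Quanto-quantized weights before relying on it
Only the text layers are quantized; the vision layers must stay unquantized to preserve quality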

Citation

Original Model:

@article{lu2025ui,
  title={UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning},
  author={Lu, Zhengxi and others},
  journal={arXiv preprint arXiv:2509.11543},
  year={2025}
}

License

Apache 2.0 (same as base model)

Acknowledgements

Base Model: mPLUG/UI-S1-7B
Quantization: Quanto by Hugging Face
Strategy: Custom hybrid quantization for VLM quality preservation

Paper: UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning (arXiv:2509.11543, published Sep 15, 2025)
