mdabis
/

qwen35-gui-grounding_v2

Image-Text-to-Text

Model card Files Files and versions

qwen35-gui-grounding_v2 / README.md

mdabis's picture

Upload README.md with huggingface_hub

af7cd69 verified 3 months ago

|

history blame contribute delete

1.78 kB

	---
	license: apache-2.0
	base_model: Qwen/Qwen3.5-4B
	tags:
	- gui-grounding
	- lora
	- qwen3.5
	- screenspot
	datasets:
	- showlab/ShowUI-desktop
	- zonghanHZH/UGround-V1-8k
	- zonghanHZH/AMEX-8k
	pipeline_tag: image-text-to-text
	---

	# Qwen3.5-4B GUI Grounding — v2 (SFT LoRA)

	LoRA adapter for Qwen3.5-4B fine-tuned on GUI grounding: given a screenshot and a natural language instruction, predict the (x, y) click coordinate of the target UI element.

	## Results — ScreenSpot-V2

	\| Split \| Correct \| Total \| Accuracy \|
	\|-------\|---------\|-------\|----------\|
	\| Desktop \| 320 \| 334 \| 95.8% \|
	\| Mobile \| 474 \| 501 \| 94.6% \|
	\| Web \| 394 \| 437 \| 90.2% \|
	\| Overall \| 1188 \| 1272 \| 93.4% \|

	## Training Data

	~23.5K samples from 3 GUI grounding datasets covering desktop, web, and mobile platforms.

	## Output Format

	```
	<\|box_start\|>(x,y)<\|box_end\|>
	```

	Coordinates are in [0, 1000] normalized space. To convert to pixel coordinates:
	```python
	pixel_x = x / 1000 * image_width
	pixel_y = y / 1000 * image_height
	```

	## Usage

	Requires `transformers>=5.2.0` and `peft`.

	```python
	from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
	from peft import PeftModel
	import torch

	base = Qwen3_5ForConditionalGeneration.from_pretrained("Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16)
	model = PeftModel.from_pretrained(base, "dabism23/qwen35-gui-grounding_v2")
	processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B")
	```

	## Version History

	\| Version \| ScreenSpot-V2 \|
	\|---------\|---------------\|
	\| [v1](https://huggingface.co/dabism23/qwen35-gui-grounding) \| 92.5% \|
	\| v2 \| 93.4% \|

	## Access

	Model weights are gated. Request access to download. Training configuration details are included with the model files.