- Screenshot to HTML
  ⚡ 921 • Convert screenshot to HTML code and preview
- HuggingFaceM4/VLM_WebSight_finetuned
  Text Generation • 8B • Updated • 107 • 192
- LoRA the Explorer SDXL
  🔎 1.18k • Explore fun LoRAs and generate with SDXL
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
  Paper • 2507.22827 • Published • 101
Collections
Collections including paper arxiv:2507.22827
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
  Paper • 2311.12631 • Published • 14
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 61
- VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
  Paper • 2504.01956 • Published • 41
- UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
  Paper • 2506.23219 • Published • 7