---
language:
- en
license: mpl-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- lightning
- hermes-3
- utility
- on-device
- text-generation
- finetune
datasets:
- NousResearch/Hermes-3-Dataset
pipeline_tag: text-generation
inference: true
model_creator: TitleOS
---

# ⚡ Lightning-1.7B

<div align="center">
<img src="https://img.shields.io/badge/Model-Lightning--1.7B-blue?style=for-the-badge&logo=huggingface" alt="Model Name">
<img src="https://img.shields.io/badge/Base-Qwen3--1.7B-orange?style=for-the-badge" alt="Base Model">
<img src="https://img.shields.io/badge/License-MPL_2.0-brightgreen?style=for-the-badge" alt="License">
</div>

<br>

**Lightning-1.7B** is a high-efficiency utility model designed for edge computing and low-latency workflows. Fine-tuned from the **Qwen3-1.7B** base model on the **NousResearch Hermes-3 dataset**, Lightning aims to bridge analytic logic and creative inference.

While it improves on its base model in logic, Q&A, and coding, its main strengths are **enhanced creativity** and **utility functions**. It is intended as a "sidecar" model: small enough to run on-device with minimal memory impact, yet capable enough to handle complex metadata generation tasks.

## 🚀 Key Features

* **Ultra-Lightweight:** At 1.7B parameters, it runs efficiently on consumer hardware, laptops, and even mobile devices with minimal VRAM usage.
* **Hermes-Powered Creativity:** Leveraging the Hermes-3 dataset, Lightning moves beyond robotic responses, offering nuanced understanding for tasks that require a "human touch," such as summarizing tone or generating creative search queries.
* **Utility Specialist:** Specifically optimized for background tasks like tagging, title generation, and creating search queries from conversation context.
* **Low Latency:** Designed for speed, making it suitable for real-time applications where response time is critical.

## 🎯 Use Cases

Lightning-1.7B is best used not as a general chatbot but as a specialized **Analytic & Utility Engine**:

1. **Conversation Auto-Titling:** Summarizing long conversations into short, relevant titles.
2. **Search Query Generation:** Converting user intent or conversation history into optimized search engine queries.
3. **Onboard Tagging:** Analyzing text streams to apply metadata tags (e.g., sentiment, topic, urgency) locally, without API calls.
4. **JSON Formatting:** Extracting structured data from unstructured text more reliably than typical small models.

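
As a rough illustration of the tagging and JSON-formatting use cases, the sketch below builds a tagging instruction and pulls the first JSON object out of a model reply. The function names (`build_tagging_prompt`, `extract_json`), the tag schema, and the sample reply are all hypothetical; the model call itself is left out, so this only shows the plumbing around it.

```python
import json
from typing import Optional

# Hypothetical tag schema for illustration; adjust to your application.
TAG_FIELDS = ("sentiment", "topic", "urgency")


def build_tagging_prompt(text: str) -> str:
    """Return an instruction asking for a JSON object with the tag fields."""
    fields = ", ".join(f'"{name}"' for name in TAG_FIELDS)
    return (
        "Tag the following text. Reply with a single JSON object "
        f"containing the keys {fields} and nothing else.\n\n"
        f"Text: {text}"
    )


def extract_json(reply: str) -> Optional[dict]:
    """Parse the first valid JSON object found in a reply, or return None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    return None


# Stand-in for a model reply; small models sometimes wrap JSON in extra text.
sample_reply = (
    'Here are the tags: '
    '{"sentiment": "negative", "topic": "billing", "urgency": "high"}'
)
tags = extract_json(sample_reply)
print(tags)
```

Scanning for the first parseable object, rather than calling `json.loads` on the whole reply, tolerates replies that add a short preamble around the JSON.
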

## 💻 Quickstart

You can run Lightning-1.7B using the `transformers` library. Loading with `device_map="auto"` also requires the `accelerate` package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TitleOS/Lightning-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example: generating a search query from a vague user description.
# The prompt is written out by hand in ChatML format.
prompt = """<|im_start|>system
You are a utility AI. Generate a specific Google search query based on the user's confused thought.<|im_end|>
<|im_start|>user
I remember there was this movie about a guy who lives in a computer but doesn't know it, and takes a red pill?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.3,
    do_sample=True,
)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Example output (varies with sampling): "movie guy lives in computer takes red pill matrix plot"
```
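
The Quickstart prompt is written by hand in ChatML. A small helper can assemble the same layout from a list of messages. This is a minimal sketch, and `build_chatml_prompt` is a hypothetical name rather than part of `transformers`. In practice, `tokenizer.apply_chat_template` is the usual way to do this, assuming the model's tokenizer ships a chat template.

```python
def build_chatml_prompt(messages: list) -> str:
    """Join role/content messages into a ChatML prompt ending with an open assistant turn."""
    parts = []
    for message in messages:
        role = message["role"]
        content = message["content"]
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a utility AI. Generate a short title for the conversation."},
    {"role": "user", "content": "How do I rotate my API keys without downtime?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be passed to the tokenizer exactly as `prompt` is in the Quickstart.
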

Merged FP16 and quantized versions:

* FP16: https://huggingface.co/TitleOS/Lightning-1.7B
* Q4_K_M: https://huggingface.co/TitleOS/Lightning-1.7B-Q4_K_M-GGUF
* Q8_0: https://huggingface.co/TitleOS/Lightning-1.7B-Q8_0-GGUF

## 📊 Performance & Benchmarks

Lightning-1.7B trades some of the broad world knowledge found in larger models for stronger instruction following and creative interpretation.

* **Logic & Coding:** Slight improvement over base Qwen3-1.7B.
* **Creativity & Nuance:** Significant improvement due to Hermes-3 fine-tuning.
* **Memory Footprint:** ~3.5 GB VRAM in FP16; <2 GB with 4-bit or 8-bit quantization.
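
The FP16 figure follows from simple arithmetic: 1.7 billion parameters at 2 bytes each is about 3.4 GB of weights. The sketch below computes weights-only estimates under that assumption. It ignores activations, the KV cache, and runtime overhead, and real quantized formats store some extra data (such as scales) beyond their nominal bits per weight, so treat the numbers as rough lower bounds rather than measurements.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.7e9  # nominal parameter count implied by the model name

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
# FP16: 3.40 GB
# 8-bit: 1.70 GB
# 4-bit: 0.85 GB
```
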
| |
## 🔧 Training Details

* **Base Model:** Qwen3-1.7B
* **Dataset:** NousResearch/Hermes-3-Dataset
* **Fine-tuning Approach:** LoRA (rank r = 16, alpha = 32), aimed at preserving the base model's speed while adding the "Hermes" personality and instruction-following capabilities.
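
For readers unfamiliar with LoRA: in the standard formulation, each adapted weight matrix of shape d_out × d_in gains two small matrices with r · (d_in + d_out) trainable parameters in total, and the low-rank update is scaled by alpha / r. The sketch below works this out for r = 16 and alpha = 32. The 2048 × 2048 layer shape is a placeholder chosen for illustration, not a value taken from this model card, and which layers were adapted is not stated here.

```python
LORA_R = 16      # rank, from the training details above
LORA_ALPHA = 32  # alpha, from the training details above


def lora_scaling(alpha: int, r: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_out x d_in matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Placeholder layer shape (an assumption for illustration only).
D_IN = D_OUT = 2048

added = lora_trainable_params(D_IN, D_OUT, LORA_R)
full = D_IN * D_OUT

print(lora_scaling(LORA_ALPHA, LORA_R))  # 2.0
print(added)                             # 65536
print(f"{added / full:.4%}")             # 1.5625%
```

Under these assumptions the adapter trains well under 2% of the parameters of each adapted matrix, which is why LoRA fine-tuning is cheap compared with full fine-tuning.
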
| |
## ⚠️ Limitations

* **Limited World Knowledge:** As a small model, Lightning does not possess vast encyclopedic knowledge. It is best used for processing the text given to it in the context window rather than retrieving facts.
* **Complex Reasoning:** While logic is improved, multi-step mathematical reasoning and complex coding challenges should be offloaded to larger models (7B+).
| |
## 📜 License

This model is released under the Mozilla Public License 2.0 (MPL-2.0).

Created by TitleOS.