---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qwen
- grpo
- book-rarity
- reasoning
- unsloth
- trl
datasets: []
language:
- en
pipeline_tag: text-generation
---

# Qwen 1.5B Book Rarity Detector (GRPO Fine-tuned)

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) using **GRPO (Group Relative Policy Optimization)** for improved reasoning about book rarity and marketplace value.

## 🎯 Training Objective

The model was trained to:

- Detect and correct **foreign-language bias** in rarity assessments
- Provide **structured reasoning** about book value (holdings, tier, language, age)
- Make **nuanced classifications** (HIGH_INTEREST, PROMISING, LOW_INTEREST, ELIMINATE); a small output-parsing example appears below, after the Limitations section
- Explain decisions with step-by-step analysis

## 📊 Training Results

- **Training method:** GRPO with Unsloth
- **Base model:** Qwen 2.5 1.5B Instruct
- **Training data:** 1,602 book-classification examples with corrected reasoning
- **Reward improvement:** +70% (1.86 → 3.17)
- **Key achievement:** learned to identify foreign-language bias in rare-book detection

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ambrosfitz/qwen-1.5b-book-rarity-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = """Analyze this book for rarity and marketplace value. Provide step-by-step reasoning.

Title: First Edition Book Title
Author: Author Name
Year: 1990
Holdings: 5 libraries
Tier: 2
Thesis: 0
Gov Doc: 0

Think through: holdings, language, document type, age, and rarity tier."""

# Qwen 2.5 is an instruct model, so wrap the prompt in its chat template.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📈 Training Metrics

| Metric | Start | Final | Change |
|--------|-------|-------|--------|
| Reward | 1.86 | 3.17 | +70% |
| Reward std | 0.99 | 0.53 | −46% (more stable) |
| KL divergence | 0.001 | 0.013 | Controlled |

## 🎓 Model Capabilities

### Structured Reasoning

The model provides analysis across five dimensions:

1. **Holdings analysis** - library availability assessment
2. **Language detection** - identifies foreign-language bias
3. **Document type** - recognizes theses, government documents, etc.
4. **Age factor** - historical context and value
5. **Rarity tier** - interprets scarcity indicators

### Key Improvements Over the Base Model

- ✅ **Foreign-language detection:** correctly identifies non-English titles and adjusts the rarity assessment
- ✅ **Nuanced classifications:** avoids automatically labeling zero-holding foreign-language books HIGH_INTEREST
- ✅ **Explainable AI:** provides a reasoning chain for every decision
- ✅ **Consistent output:** lower variance in reward scores (std 0.99 → 0.53)

## 🔧 Training Configuration

- **LoRA r:** 16
- **LoRA alpha:** 16
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning rate:** 5e-6
- **Batch size:** 4 (effective 16 with gradient accumulation)
- **GRPO beta:** 0.1
- **Training steps:** 360
- **Quantization:** 4-bit with Unsloth optimizations

A hedged sketch of this setup appears below, after the Limitations section.

## 📚 Use Cases

- Book marketplace valuation
- Library collection assessment
- Rare-book identification
- Automated book triage for resellers
- Distinguishing common from rare editions

## ⚠️ Limitations

- Trained primarily on English-language library data
- Works best for books with WorldCat holdings data
- May need adjustment for specialized collections (art books, music scores, etc.)
- 256-token generation limit during training
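
## 🧩 Parsing the Output (Example)

The four classification labels from the Training Objective section can be pulled out of generated text with a small helper. This is an illustrative sketch, not part of the released code, and it assumes the label appears verbatim somewhere in the model's output; the function name is hypothetical.

```python
import re

# The four labels the model was trained to emit (see Training Objective).
LABELS = ("HIGH_INTEREST", "PROMISING", "LOW_INTEREST", "ELIMINATE")

def extract_classification(generated_text: str) -> str | None:
    """Return the first classification label found in the model's output,
    or None if no recognizable label is present."""
    match = re.search("|".join(LABELS), generated_text)
    return match.group(0) if match else None

# Example:
# extract_classification("Holdings are moderate... Classification: PROMISING")
# -> "PROMISING"
```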
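
## 🧪 Training Setup Sketch

For readers who want to approximate the configuration listed above, here is a minimal sketch using the standard Unsloth and TRL GRPO APIs. The reward function and dataset are placeholders (the 1,602-example dataset and the exact reward are not published), and `max_seq_length` and `num_generations` are assumptions; the remaining hyperparameters mirror the Training Configuration section.

```python
from datasets import Dataset
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

# 4-bit base model via Unsloth (max_seq_length is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# LoRA adapters with the r / alpha / target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder for the 1,602-example dataset (not published);
# GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict(
    {"prompt": ["Analyze this book for rarity and marketplace value..."]}
)

def rarity_reward(completions, **kwargs):
    """Hypothetical reward: the actual function that scored classification
    accuracy and reasoning quality is not published."""
    return [0.0 for _ in completions]

config = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    beta=0.1,                        # KL penalty coefficient
    max_steps=360,
    max_completion_length=256,       # matches the training generation limit
    num_generations=4,               # assumption; not stated in the card
)

trainer = GRPOTrainer(
    model=model,
    reward_funcs=[rarity_reward],
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```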

## 📄 License

Apache 2.0 (inherited from the Qwen 2.5 base model).

## 🙏 Acknowledgments

- Built with [Unsloth](https://github.com/unslothai/unsloth) for optimized training
- Uses [TRL](https://github.com/huggingface/trl) for the GRPO implementation
- Based on [Qwen 2.5](https://huggingface.co/Qwen) by Alibaba Cloud

## 📧 Contact

For questions or issues, please open an issue on the model repository.