---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qwen
- grpo
- book-rarity
- reasoning
- unsloth
- trl
datasets: []
language:
- en
pipeline_tag: text-generation
---

# Qwen 1.5B Book Rarity Detector (GRPO Fine-tuned)

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) using **GRPO (Group Relative Policy Optimization)** for improved reasoning about book rarity and marketplace value.

## 🎯 Training Objective

The model was trained to:

- Detect and correct **foreign-language bias** in rarity assessments
- Provide **structured reasoning** about book value (holdings, tier, language, age)
- Make **nuanced classifications** (HIGH_INTEREST, PROMISING, LOW_INTEREST, ELIMINATE); a small output-parsing example appears below, after the Limitations section
- Explain decisions with step-by-step analysis

## 📊 Training Results

- **Training method:** GRPO with Unsloth
- **Base model:** Qwen 2.5 1.5B Instruct
- **Training data:** 1,602 book-classification examples with corrected reasoning
- **Reward improvement:** +70% (1.86 → 3.17)
- **Key achievement:** learned to identify foreign-language bias in rare-book detection

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ambrosfitz/qwen-1.5b-book-rarity-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = """Analyze this book for rarity and marketplace value. Provide step-by-step reasoning.

Title: First Edition Book Title
Author: Author Name
Year: 1990
Holdings: 5 libraries
Tier: 2
Thesis: 0
Gov Doc: 0

Think through: holdings, language, document type, age, and rarity tier."""

# Qwen 2.5 is an instruct model, so wrap the prompt in its chat template.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📈 Training Metrics

| Metric | Start | Final | Change |
|--------|-------|-------|--------|
| Reward | 1.86 | 3.17 | +70% |
| Reward std | 0.99 | 0.53 | −46% (more stable) |
| KL divergence | 0.001 | 0.013 | Controlled |

## 🎓 Model Capabilities

### Structured Reasoning

The model provides analysis across five dimensions:

1. **Holdings analysis** - library availability assessment
2. **Language detection** - identifies foreign-language bias
3. **Document type** - recognizes theses, government documents, etc.
4. **Age factor** - historical context and value
5. **Rarity tier** - interprets scarcity indicators

### Key Improvements Over the Base Model

- ✅ **Foreign-language detection:** correctly identifies non-English titles and adjusts the rarity assessment
- ✅ **Nuanced classifications:** avoids automatically labeling zero-holding foreign-language books HIGH_INTEREST
- ✅ **Explainable AI:** provides a reasoning chain for every decision
- ✅ **Consistent output:** lower variance in reward scores (std 0.99 → 0.53)

## 🔧 Training Configuration

- **LoRA r:** 16
- **LoRA alpha:** 16
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning rate:** 5e-6
- **Batch size:** 4 (effective 16 with gradient accumulation)
- **GRPO beta:** 0.1
- **Training steps:** 360
- **Quantization:** 4-bit with Unsloth optimizations

A hedged sketch of this setup appears below, after the Limitations section.

## 📚 Use Cases

- Book marketplace valuation
- Library collection assessment
- Rare-book identification
- Automated book triage for resellers
- Distinguishing common from rare editions

## ⚠️ Limitations

- Trained primarily on English-language library data
- Works best for books with WorldCat holdings data
- May need adjustment for specialized collections (art books, music scores, etc.)
- 256-token generation limit during training
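
## 🧩 Parsing the Output (Example)

The four classification labels from the Training Objective section can be pulled out of generated text with a small helper. This is an illustrative sketch, not part of the released code, and it assumes the label appears verbatim somewhere in the model's output; the function name is hypothetical.

```python
import re

# The four labels the model was trained to emit (see Training Objective).
LABELS = ("HIGH_INTEREST", "PROMISING", "LOW_INTEREST", "ELIMINATE")

def extract_classification(generated_text: str) -> str | None:
    """Return the first classification label found in the model's output,
    or None if no recognizable label is present."""
    match = re.search("|".join(LABELS), generated_text)
    return match.group(0) if match else None

# Example:
# extract_classification("Holdings are moderate... Classification: PROMISING")
# -> "PROMISING"
```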
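
## 🧪 Training Setup Sketch

For readers who want to approximate the configuration listed above, here is a minimal sketch using the standard Unsloth and TRL GRPO APIs. The reward function and dataset are placeholders (the 1,602-example dataset and the exact reward are not published), and `max_seq_length` and `num_generations` are assumptions; the remaining hyperparameters mirror the Training Configuration section.

```python
from datasets import Dataset
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

# 4-bit base model via Unsloth (max_seq_length is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# LoRA adapters with the r / alpha / target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder for the 1,602-example dataset (not published);
# GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict(
    {"prompt": ["Analyze this book for rarity and marketplace value..."]}
)

def rarity_reward(completions, **kwargs):
    """Hypothetical reward: the actual function that scored classification
    accuracy and reasoning quality is not published."""
    return [0.0 for _ in completions]

config = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    beta=0.1,                        # KL penalty coefficient
    max_steps=360,
    max_completion_length=256,       # matches the training generation limit
    num_generations=4,               # assumption; not stated in the card
)

trainer = GRPOTrainer(
    model=model,
    reward_funcs=[rarity_reward],
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```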

## 📄 License

Apache 2.0 (inherited from the Qwen 2.5 base model).

## 🙏 Acknowledgments

- Built with [Unsloth](https://github.com/unslothai/unsloth) for optimized training
- Uses [TRL](https://github.com/huggingface/trl) for the GRPO implementation
- Based on [Qwen 2.5](https://huggingface.co/Qwen) by Alibaba Cloud

## 📧 Contact

For questions or issues, please open an issue on the model repository.