# Llama-2 7B Sentiment-FineTuned
A fine-tuned Llama 2 7B model for multiclass sentiment analysis (positive, neutral, negative) of news headlines.
## Model Description
This model is a fine-tuned version of Meta's Llama-2-7B-hf using Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters. The model has been specifically trained to classify sentiment in news headlines as positive, neutral, or negative. It uses 4-bit quantization for efficient inference and training.
- Developed by: Harsh Shinde
- Model type: Causal Language Model (Fine-tuned for Sentiment Analysis)
- Language(s): English
- License: Llama 2 Community License
- Finetuned from model: meta-llama/Llama-2-7b-hf
## Use
This model is designed for sentiment analysis of news headlines and similar short-form text. It can classify text into three categories:
- Positive: Optimistic, favorable sentiment
- Neutral: Objective, factual sentiment
- Negative: Pessimistic, unfavorable sentiment
Ideal use cases include:
- News sentiment monitoring
- Social media sentiment analysis
- Market sentiment analysis from headlines
- Content categorization systems
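Because this is a causal language model rather than a classification head, sentiment must be elicited via a prompt and parsed from the generated continuation. The card does not document the exact template used during fine-tuning, so the helpers below are a hypothetical sketch (the template wording, the `build_prompt`/`parse_sentiment` names, and the neutral fallback are all assumptions):

```python
VALID_LABELS = {"positive", "neutral", "negative"}

def build_prompt(headline: str) -> str:
    """Wrap a headline in a hypothetical instruction template.

    The actual template used for fine-tuning is not specified in this card;
    this is only an illustration of the general pattern.
    """
    return (
        "Classify the sentiment of the following news headline as "
        "positive, neutral, or negative.\n"
        f"Headline: {headline}\n"
        "Sentiment:"
    )

def parse_sentiment(generation: str) -> str:
    """Return the first recognized label in the model's continuation.

    Falls back to "neutral" if no label word is found, since free-form
    generation can occasionally wander off the three expected labels.
    """
    for token in generation.lower().split():
        if token.strip(".,!?:;") in VALID_LABELS:
            return token.strip(".,!?:;")
    return "neutral"
```

In practice the prompt would be fed to `model.generate(...)` with a small `max_new_tokens` budget, and the decoded continuation passed through `parse_sentiment`.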
## Training Hyperparameters

**LoRA Configuration:**
- LoRA rank (r): 64
- LoRA alpha: 16
- LoRA dropout: 0.1
- Target modules: All linear layers (via PEFT auto-detection)
- Bias: none
- Task type: CAUSAL_LM
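The LoRA settings above map directly onto a PEFT `LoraConfig`. A sketch of that configuration follows; note that the card says target modules were auto-detected by PEFT, whereas this sketch lists Llama-2's linear-layer names explicitly as an assumption:

```python
from peft import LoraConfig

# Sketch of the LoRA setup described above. The explicit target_modules
# list is an assumption standing in for PEFT's auto-detection of all
# linear layers in Llama-2.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```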
**Training Arguments:**
- Number of epochs: 3
- Per-device train batch size: 1
- Gradient accumulation steps: 8
- Effective batch size: 8
- Optimizer: paged_adamw_32bit
- Learning rate: 2e-4
- Weight decay: 0.001
- Learning rate scheduler: cosine
- Warmup ratio: 0.03
- Max gradient norm: 0.3
- Training precision: bf16 (bfloat16)
- Evaluation strategy: epoch
- Logging steps: 25
- Group by length: True
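The trainer settings above correspond one-to-one with Hugging Face `TrainingArguments` fields. A sketch, assuming the standard argument names (the `output_dir` value is a placeholder):

```python
from transformers import TrainingArguments

# Sketch of the training configuration listed above. Argument names follow
# the Hugging Face TrainingArguments API; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./results",              # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,       # effective batch size: 1 x 8 = 8
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=0.3,
    bf16=True,
    evaluation_strategy="epoch",
    logging_steps=25,
    group_by_length=True,
)
```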
**Quantization:**
- 4-bit quantization using BitsAndBytes
- Quantization type: nf4 (NormalFloat4)
- Compute dtype: float16
- Double quantization: False
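These quantization settings translate to a `BitsAndBytesConfig` passed at model load time. A minimal sketch, assuming the standard transformers/bitsandbytes integration:

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of the 4-bit quantization settings above. Note the card reports
# bf16 training precision but a float16 compute dtype for the quantized
# base weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
```

The config would then be supplied via `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)`.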
## Results
The fine-tuned model achieves the following performance on the test set (900 samples):
**Overall Performance:**
- Accuracy: 67.89%
- F1-Score (macro): 67.62%
- Precision (weighted): 67.55%
- Recall (weighted): 67.89%
**Per-Class Performance:**
| Sentiment | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Negative | 0.70 | 0.78 | 0.74 | 300 |
| Neutral | 0.57 | 0.52 | 0.54 | 300 |
| Positive | 0.75 | 0.74 | 0.75 | 300 |
**Key Observations:**
- Strongest performance on positive sentiment (F1: 0.75) and negative sentiment (F1: 0.74)
- Neutral sentiment is more challenging (F1: 0.54), which is common in sentiment analysis tasks
- Precision and recall are closely matched within each class, so errors are not driven by systematic over- or under-prediction of any single label

Detailed predictions are available in `test_predictions.csv`.
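The macro F1 reported above is just the unweighted mean of the three per-class F1 scores. As a self-contained sketch (shown with toy labels rather than the actual test set), the per-class and macro metrics can be computed with plain Python:

```python
def per_class_f1(y_true, y_pred, labels):
    """Per-label precision/recall/F1 plus macro F1 (unweighted mean)."""
    report = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro F1: average the per-class F1s before adding the summary key.
    report["macro_f1"] = sum(v["f1"] for v in report.values()) / len(labels)
    return report
```

Applying the same function to the label and prediction columns of `test_predictions.csv` should reproduce the per-class table above.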
## Summary

The model learns to classify news-headline sentiment with moderate overall accuracy (67.89%) on a balanced test set. The LoRA fine-tuning approach enables efficient adaptation of Llama 2 7B to this task: only the low-rank adapter weights are trained on top of the 4-bit quantized base model, keeping compute and memory requirements modest while preserving model quality.
## Compute Infrastructure

### Hardware
- GPU: NVIDIA Tesla P100 or T4 (Kaggle environment)
- Memory: 16GB GPU RAM
- Quantization: 4-bit (NF4) to fit in memory
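The need for 4-bit weights on a 16 GB card follows from simple arithmetic; a back-of-the-envelope sketch (rough figures that ignore activations, the LoRA adapters, optimizer state, and quantization block overhead):

```python
# Rough weight-memory estimate for Llama 2 7B at different precisions.
PARAMS = 7e9  # 7 billion parameters

def weight_memory_gb(bits_per_param: float) -> float:
    """Gigabytes needed just to hold the model weights."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(16)  # ~14 GB: nearly fills a 16 GB GPU alone
nf4_gb = weight_memory_gb(4)    # ~3.5 GB: leaves headroom for training
```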
### Software
- Framework: PyTorch
- Libraries:
  - `transformers` - Hugging Face Transformers
  - `peft` - Parameter-Efficient Fine-Tuning
  - `trl` - Transformer Reinforcement Learning (SFTTrainer)
  - `bitsandbytes` - 4-bit quantization
  - `datasets` - Dataset loading
  - `wandb` - Experiment tracking
- Python Version: 3.10+
- CUDA: any CUDA version supported by the installed PyTorch build
## Model Tree

- Base model: meta-llama/Llama-2-7b-hf
- Dataset used to train: multiclass-sentiment-analysis-dataset

## Evaluation Results

Self-reported on the multiclass-sentiment-analysis-dataset test set:

- Accuracy: 0.679
- F1 Score (macro): 0.676
- Precision (weighted): 0.675
- Recall (weighted): 0.679