---
license: apache-2.0
tags:
- blip
- image-captioning
- vision-language
- transformers
- fine-tuned
- pytorch
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
library_name: transformers
pipeline_tag: image-to-text
---

# BLIP model fine-tuned on histopathology images

This model is a fine-tuned version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base) on a histopathology image dataset, reaching an average training loss of 0.0098.

## Model description

The model was fine-tuned on the [histopathology-image-caption-dataset](https://www.kaggle.com/datasets/sushilyadav1998/histopathology-image-caption-dataset) for automatic captioning of histopathology images.

## Training procedure

The model was trained for 10 epochs with a batch size of 4 and a learning rate of 5e-5. Images were processed with the BLIP processor, and gradient accumulation over 2 steps was used (for an effective batch size of 8).

## Usage for further fine-tuning

The last checkpoint is included in this repository under the `last_checkpoint` directory. You can load this checkpoint to continue fine-tuning on another dataset.
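As a rough sketch of how continued fine-tuning might look (the dataloader is a placeholder, not something shipped with this repository; the hyperparameters mirror the original training run described above):

```python
# Sketch: continue fine-tuning from the bundled checkpoint.
# NOTE: the DataLoader is a placeholder -- supply your own, yielding dicts
# with "pixel_values" and "input_ids" built with the BLIP processor.
import torch
from transformers import BlipForConditionalGeneration

CHECKPOINT = "ragunath-ravi/blip-histopathology-finetuned"  # or "./last_checkpoint"
LEARNING_RATE = 5e-5   # matches the original training run
BATCH_SIZE = 4         # matches the original training run
GRAD_ACCUM_STEPS = 2   # matches the original training run
EFFECTIVE_BATCH_SIZE = BATCH_SIZE * GRAD_ACCUM_STEPS  # 8

def continue_finetuning(dataloader, epochs=10):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = BlipForConditionalGeneration.from_pretrained(CHECKPOINT).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    model.train()
    for _ in range(epochs):
        for step, batch in enumerate(dataloader):
            input_ids = batch["input_ids"].to(device)
            outputs = model(
                pixel_values=batch["pixel_values"].to(device),
                input_ids=input_ids,
                labels=input_ids,
            )
            # Scale the loss so gradients average over the accumulated steps
            (outputs.loss / GRAD_ACCUM_STEPS).backward()
            if (step + 1) % GRAD_ACCUM_STEPS == 0:
                optimizer.step()
                optimizer.zero_grad()
    return model

# continue_finetuning(your_dataloader)  # call with your own DataLoader
```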
## Training details

- Dataset: Histopathology Image Caption Dataset (Kaggle)
- Base model: Salesforce/blip-image-captioning-base
- Training epochs: 10
- Batch size: 4
- Learning rate: 5e-5
- Gradient accumulation steps: 2
- Device: CUDA (if available)

## Usage for inference

```python
from transformers import AutoProcessor, BlipForConditionalGeneration
from PIL import Image

# Load model and processor
model = BlipForConditionalGeneration.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")
processor = AutoProcessor.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")

# Load image
image = Image.open("path_to_histopathology_image.jpg").convert("RGB")

# Process image
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate caption
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
```
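The same inference steps can also be wrapped with the Transformers `pipeline` API. The helper below is a sketch for captioning a folder of images; the directory path and `.jpg` glob are assumptions, adjust them to your data:

```python
# Sketch: caption every .jpg in a folder with the image-to-text pipeline.
# `image_dir` is a placeholder -- point it at your own histopathology images.
from pathlib import Path
from transformers import pipeline

def caption_directory(image_dir, model_id="ragunath-ravi/blip-histopathology-finetuned"):
    captioner = pipeline("image-to-text", model=model_id)
    results = {}
    for path in sorted(Path(image_dir).glob("*.jpg")):
        # The pipeline returns a list of dicts like [{"generated_text": "..."}]
        results[path.name] = captioner(str(path))[0]["generated_text"]
    return results

# captions = caption_directory("slides/")  # e.g. {"slide_01.jpg": "..."}
```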