---
license: apache-2.0
tags:
- blip
- image-captioning
- vision-language
- transformers
- fine-tuned
- pytorch
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
library_name: transformers
pipeline_tag: image-to-text
---

# BLIP model fine-tuned on histopathology images

This model is a fine-tuned version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base) on a histopathology image dataset, reaching an average training loss of 0.0098.

## Model description

The model was fine-tuned on the [histopathology-image-caption-dataset](https://www.kaggle.com/datasets/sushilyadav1998/histopathology-image-caption-dataset) for automatic captioning of histopathology images.

## Training procedure

The model was trained for 10 epochs with a batch size of 4 and a learning rate of 5e-5. Images were processed with the BLIP processor, and gradient accumulation over 2 steps was used (for an effective batch size of 8).

## Usage for further fine-tuning

The last checkpoint is included in this repository under the `last_checkpoint` directory. You can load this checkpoint to continue fine-tuning on another dataset.
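As a rough sketch of how continued fine-tuning might look (the dataloader is a placeholder, not something shipped with this repository; the hyperparameters mirror the original training run described above):

```python
# Sketch: continue fine-tuning from the bundled checkpoint.
# NOTE: the DataLoader is a placeholder -- supply your own, yielding dicts
# with "pixel_values" and "input_ids" built with the BLIP processor.
import torch
from transformers import BlipForConditionalGeneration

CHECKPOINT = "ragunath-ravi/blip-histopathology-finetuned"  # or "./last_checkpoint"
LEARNING_RATE = 5e-5   # matches the original training run
BATCH_SIZE = 4         # matches the original training run
GRAD_ACCUM_STEPS = 2   # matches the original training run
EFFECTIVE_BATCH_SIZE = BATCH_SIZE * GRAD_ACCUM_STEPS  # 8

def continue_finetuning(dataloader, epochs=10):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = BlipForConditionalGeneration.from_pretrained(CHECKPOINT).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    model.train()
    for _ in range(epochs):
        for step, batch in enumerate(dataloader):
            input_ids = batch["input_ids"].to(device)
            outputs = model(
                pixel_values=batch["pixel_values"].to(device),
                input_ids=input_ids,
                labels=input_ids,
            )
            # Scale the loss so gradients average over the accumulated steps
            (outputs.loss / GRAD_ACCUM_STEPS).backward()
            if (step + 1) % GRAD_ACCUM_STEPS == 0:
                optimizer.step()
                optimizer.zero_grad()
    return model

# continue_finetuning(your_dataloader)  # call with your own DataLoader
```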
## Training details

- Dataset: Histopathology Image Caption Dataset (Kaggle)
- Base model: Salesforce/blip-image-captioning-base
- Training epochs: 10
- Batch size: 4
- Learning rate: 5e-5
- Gradient accumulation steps: 2
- Device: CUDA (if available)

## Usage for inference

```python
from transformers import AutoProcessor, BlipForConditionalGeneration
from PIL import Image

# Load model and processor
model = BlipForConditionalGeneration.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")
processor = AutoProcessor.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")

# Load image
image = Image.open("path_to_histopathology_image.jpg").convert("RGB")

# Process image
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate caption
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
```
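The same inference steps can also be wrapped with the Transformers `pipeline` API. The helper below is a sketch for captioning a folder of images; the directory path and `.jpg` glob are assumptions, adjust them to your data:

```python
# Sketch: caption every .jpg in a folder with the image-to-text pipeline.
# `image_dir` is a placeholder -- point it at your own histopathology images.
from pathlib import Path
from transformers import pipeline

def caption_directory(image_dir, model_id="ragunath-ravi/blip-histopathology-finetuned"):
    captioner = pipeline("image-to-text", model=model_id)
    results = {}
    for path in sorted(Path(image_dir).glob("*.jpg")):
        # The pipeline returns a list of dicts like [{"generated_text": "..."}]
        results[path.name] = captioner(str(path))[0]["generated_text"]
    return results

# captions = caption_directory("slides/")  # e.g. {"slide_01.jpg": "..."}
```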