Model Card for mistral-7B-ClinicalSum-MTS_8bit_Adapter

This model is a fine-tuned version of mistralai/Mistral-7B-v0.3 for generating clinical notes from doctor-patient conversations. It was trained using TRL.

Quick start

from transformers import pipeline

# Doctor-patient dialogue to summarize. Triple quotes are required for a multi-line string.
dialogue = """Dr: Good morning! What brings you in to see me today?
Patient: Good morning, Doctor. I've been having a really sore throat and a bit of a cough for the past few days.
Dr: I'm sorry to hear you're not feeling well. When exactly did these symptoms start?
Patient: The sore throat started on Tuesday evening, and the cough kicked in around Wednesday morning.
Dr: I see. Have you been experiencing any fever, body aches, or chills?
Patient: No chills or major body aches, but I did feel a bit warm last night. I took my temperature at home, and it was 100.2°F.
Dr: Okay, a very mild fever. Are you having any trouble breathing or chest pain when you cough?
Patient: No, nothing like that. It's mostly just a dry, scratchy cough that makes my throat hurt more."""

# Build a text-generation pipeline on a CUDA GPU.
generator = pipeline("text-generation", model="Saib/mistral-7B-ClinicalSum-MTS_8bit_Adapter", device="cuda")
# Send the dialogue as a single user message and return only the newly generated text.
output = generator([{"role": "user", "content": dialogue}], max_new_tokens=256, return_full_text=False)[0]
print(output["generated_text"])
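
The repository name suggests that the weights are a PEFT adapter trained on an 8-bit base model. If that holds, the adapter can also be loaded explicitly, which makes the quantization setting visible. The following is a minimal sketch under that assumption. It needs the `peft` and `bitsandbytes` packages, it assumes the repository also ships tokenizer files, and the helper names `build_messages`, `load_adapter_model`, and `summarize` are illustrative rather than part of any library.

```python
MODEL_ID = "Saib/mistral-7B-ClinicalSum-MTS_8bit_Adapter"


def build_messages(dialogue: str) -> list[dict]:
    """Wrap a dialogue transcript as a single user turn, as in the quick start."""
    return [{"role": "user", "content": dialogue.strip()}]


def load_adapter_model(model_id: str = MODEL_ID):
    """Load the base model plus adapter in 8-bit.

    Assumption: the repository holds a PEFT adapter and tokenizer files.
    """
    # Imported lazily so build_messages works without the GPU libraries installed.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer, BitsAndBytesConfig

    model = AutoPeftModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def summarize(dialogue: str, max_new_tokens: int = 256) -> str:
    """Generate a clinical note for one dialogue (downloads the model on first use)."""
    from transformers import pipeline

    model, tokenizer = load_adapter_model()
    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
    result = generator(
        build_messages(dialogue),
        max_new_tokens=max_new_tokens,
        return_full_text=False,
    )
    return result[0]["generated_text"]
```

Nothing is downloaded until `summarize` or `load_adapter_model` is called. Loading in 8-bit requires a CUDA GPU supported by `bitsandbytes`.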

Training procedure

This model was trained with supervised fine-tuning (SFT).
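
TRL's `SFTTrainer` accepts, among other layouts, records with a `prompt` field and a `completion` field. This card does not state how the dialogues and notes were formatted for training, so the sketch below shows only one plausible layout. The instruction text, the function name `to_prompt_completion`, and the example pair are all hypothetical.

```python
# Hypothetical instruction; the prompt actually used for training is not documented here.
INSTRUCTION = "Summarize the following doctor-patient conversation as a clinical note."


def to_prompt_completion(dialogue: str, note: str) -> dict:
    """Turn one (dialogue, note) pair into a prompt/completion record for SFT."""
    prompt = f"{INSTRUCTION}\n\n{dialogue.strip()}\n\nClinical note:\n"
    # The completion is the target text the model learns to produce.
    return {"prompt": prompt, "completion": note.strip()}


# Made-up pair, for illustration only.
example = to_prompt_completion(
    "Dr: What brings you in today?\nPatient: A sore throat since Tuesday.",
    "Patient reports a sore throat that began on Tuesday.",
)
print(example["prompt"] + example["completion"])
```

Keeping the prompt and the target in separate fields lets the training library distinguish the input text from the text the model is expected to generate.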

Framework versions

  • TRL: 0.25.0
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
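
To compare a local environment against the versions listed above, a small check with the standard library may help. This is a sketch: the PyPI distribution names (for example `torch` for PyTorch) and the helper names `installed_version` and `compare_versions` are assumptions of this example, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "trl": "0.25.0",
    "transformers": "4.57.1",
    "torch": "2.8.0+cu126",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected: dict, lookup=installed_version) -> dict:
    """Map each package to (expected, found) wherever the two differ."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in compare_versions(EXPECTED).items():
    print(f"{package}: card lists {wanted}, found {found or 'not installed'}")
```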

Citations

Cite the accompanying paper as:

@Article{info:doi/10.2196/82545,
author="Ahmed, Saib
and Yousuf Sadeque, Farig",
title="Clinical Note Generation From Doctor-Patient Conversations Using Parameter-Efficient Fine-Tuning Large Language Models: Comparative Study",
journal="JMIR Med Inform",
year="2026",
month="Jun",
day="3",
volume="14",
pages="e82545",
keywords="natural language processing; clinical natural language processing; clinical NLP; Dialogue2Note; transformer; decoder-only; Mistral; Llama; Meditron; summarization; Recall-Oriented Understudy for Gisting Evaluation; Recall-Oriented Understudy for Gisting Evaluation score; ROUGE score; bidirectional encoder representations from transformers; bidirectional encoder representations from transformers score; BERTScore",
abstract="Background: Clinical note documentation is a vital yet time-intensive task in health care. While advancements in natural language processing have transformed many domains, generating accurate summaries of doctor-patient conversations remains underexplored due to the limited availability of open-source datasets. Large language models (LLMs), with their training on vast datasets, present a promising solution to this challenge. Objective: Precision in clinical summarization is crucial, as it directly impacts patient care and safety. This study aimed to evaluate the effectiveness of parameter-efficient, fine-tuned, decoder-only LLMs for clinical note generation from doctor-patient conversations. We focus on assessing medical accuracy, robustness, and the feasibility of parameter-efficient fine-tuning (PEFT) approaches under practical resource constraints. Methods: We used the Medical Training Summarization Dialog dataset containing 1700 doctor-patient conversations paired with clinical notes. Several decoder-only LLMs, including Mistral, Meditron, and Llama, were fine-tuned using PEFT techniques to reduce computational and memory overhead. Evaluation was performed using standard automatic metrics, including the Recall-Oriented Understudy for Gisting Evaluation score and bidirectional encoder representations from transformers score, to assess content overlap and semantic similarity between generated and reference clinical notes. In addition, an expert physician assessed the LLM-generated notes for medical accuracy, completeness, concision, relevance, and clinical coherence and readability. Results: Model performance was evaluated using the Recall-Oriented Understudy for Gisting Evaluation score and bidirectional encoder representations from transformers scores, demonstrating that Meditron-7B and Llama3-8B achieved state-of-the-art results among open-source, parameter-efficient, fine-tuned models, with Mistral-7B also performing competitively. 
The findings indicate that decoder-only LLMs, particularly Llama variants, outperform traditional models. Moreover, fine-tuning with higher quantization has the potential to further enhance performance. Human expert evaluation further indicated that Llama3-8B and Mistral-7B produced clinically coherent and accurate summaries, with Meditron-7B and Llama3-3B also performing reliably across evaluation criteria. The findings suggest that higher quantization during fine-tuning may improve efficiency without substantially compromising performance. Conclusions: This study underscores the potential of the PEFT of decoder-only LLMs to transform clinical workflows by streamlining medical documentation, thereby enabling health care professionals to dedicate more time to patient care. These models offer a scalable and resource-efficient alternative to traditional architectures and have the potential to streamline clinical documentation workflows. ",
issn="2291-9694",
doi="10.2196/82545",
url="https://medinform.jmir.org/2026/1/e82545",
url="https://doi.org/10.2196/82545"
}