English-Scanned-Document-OCR-LFM2.5-VL-450M-Experimental
This repository contains a merged checkpoint fine-tuned from LiquidAI/LFM2.5-VL-450M for English scanned and printed document OCR experiments.
The model was trained as a LoRA adaptation and then merged into the base weights before upload so the repository can be used as a single full checkpoint.
The run was trained primarily for full-page transcription on synthetic and synthetic-style document data, with emphasis on scanned pages, dense text, reading order, and degraded print-like inputs.
This model is being published as an experiment record, not as a production OCR release.
What It Was Trained For
- English scanned and printed document OCR
- Full-page transcription
- Reading-order preservation
- Dense text and long-page OCR
- Some degraded and noisy document cases
What It Is Not
- Not a general-purpose OCR model
- Not validated as a robust out-of-domain document parser
- Not reliable for broad real-world generalization
- Not intended for handwriting, multilingual OCR, or production deployment without further validation
Training Summary
This merged checkpoint comes from a LoRA fine-tune of LiquidAI/LFM2.5-VL-450M.
Internal held-out evaluation showed improvements on some in-domain slices, especially long and noisy scanned-page transcription, but the gains did not scale consistently across broader distributions and did not demonstrate strong generalization.
Evaluation Notes
Results reported for this experiment should be interpreted as in-domain and internal evaluation only.
Key limitations:
- performance is sensitive to dataset distribution
- generalization outside the training-style data is limited
- behavior can be unstable across layout types
- benchmark transfer does not support claiming broad OCR capability
Intended Use
This repository is intended for:
- experiment tracking
- reproducibility
- merged checkpoint reuse for further OCR research on top of LFM2.5-VL-450M
It is not intended as a general end-user OCR checkpoint.
License
The upstream base model is released under the LFM Open License v1.0. Use of this merged checkpoint must comply with the base model license and its terms.
- Downloads last month
- 20
Model tree for loay/English-Scanned-Document-OCR-LFM2.5-VL-450M-Experimental
Base model
LiquidAI/LFM2.5-350M-Base