VCTI-RoBERTa-Fiber

Model Summary

This model is a domain-adapted RoBERTa-base model fine-tuned using Masked Language Modeling (MLM) on optical communication and photonics data. It is optimized for generating domain-specific embeddings that capture the nuances and technical jargon of the optical domain.

⚠️ Note: This is the basic version of our ongoing development.
A significantly improved version trained on much larger and more diverse optical corpora will be released soon!


Training Data

The model was trained on:

  • 1000+ Optical Wikipedia Articles
  • 120+ Optical Communication & Photonics Textbooks
  • 500+ ITU-T and IEEE Papers
  • 1000+ Web Articles

The training corpus includes content related to:

  • Optical fibers
  • Photonic devices
  • Multiplexing (WDM, TDM, OTN)
  • Optical amplifiers
  • Modulation techniques
  • Communication networks
  • Laser systems, etc.

⚙️ Training Details

| Parameter     | Value | Description                            |
|---------------|-------|----------------------------------------|
| batch_size    | 64    | Number of samples per training batch   |
| epochs        | 15    | Number of training epochs              |
| patience      | 6     | Early stopping patience                |
| learning_rate | 5e-5  | Learning rate for the AdamW optimizer  |
| weight_decay  | 0.01  | Weight decay for regularization        |
| objective     | MLM   | Masked Language Modeling               |

The training was performed using the transformers library by Hugging Face.
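The MLM objective corrupts a fraction of input tokens and trains the model to recover the originals. As a minimal sketch of that corruption step (mirroring the standard 15% / 80-10-10 scheme that `transformers`' `DataCollatorForLanguageModeling` applies; the function name and token ids below are illustrative, not taken from the training code):

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """Standard MLM corruption: select ~15% of tokens; of those,
    ~80% become the mask token, ~10% a random token, ~10% stay unchanged.
    Labels are -100 (ignored by the loss) everywhere except selected positions."""
    labels = input_ids.clone()
    prob_matrix = torch.full(labels.shape, mlm_prob)
    masked = torch.bernoulli(prob_matrix).bool()
    labels[~masked] = -100  # loss is computed only on masked positions

    corrupted = input_ids.clone()
    # 80% of selected positions -> mask token
    replace = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked
    corrupted[replace] = mask_token_id
    # 10% -> a random vocabulary token (half of the remaining 20%)
    random_tok = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked & ~replace
    corrupted[random_tok] = torch.randint(vocab_size, labels.shape)[random_tok]
    # remaining 10% keep the original token
    return corrupted, labels
```

In practice `DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)` handles this inside the `Trainer` loop; the sketch only shows what that collator produces.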


Core Use Case: Domain-Specific Embeddings

The fine-tuned model is particularly effective at generating context-aware embeddings for the optical domain. This makes it highly suitable for tasks such as:

  • Semantic Search across technical documents
  • Retrieval-Augmented Generation (RAG) for Q&A systems
  • Topic Modeling and document clustering
  • Similarity Matching between questions, answers, or papers

How to Use

Load the model

```python
import torch
from transformers import RobertaTokenizerFast, RobertaModel

tokenizer = RobertaTokenizerFast.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")
model = RobertaModel.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")
model.eval()

text = "Wavelength-division multiplexing increases the capacity of optical fibers."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# Mean-pool the token embeddings into a single sentence embedding
embedding = outputs.last_hidden_state.mean(dim=1)
```
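Note that a plain `.mean(dim=1)` also averages over padding tokens when you embed a batch of different-length texts. A small sketch of mask-aware mean pooling plus cosine similarity, useful for the semantic-search and similarity-matching use cases above (the helper names are illustrative):

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Mean-pool token embeddings while ignoring padding positions.
    last_hidden_state: (batch, seq, hidden); attention_mask: (batch, seq)."""
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)        # (batch, 1), avoid div-by-zero
    return summed / counts

def cosine_sim(a, b):
    """Cosine similarity between embeddings, computed on L2-normalized vectors."""
    a = torch.nn.functional.normalize(a, dim=-1)
    b = torch.nn.functional.normalize(b, dim=-1)
    return (a * b).sum(dim=-1)
```

With the model loaded as above, pass `outputs.last_hidden_state` and `inputs["attention_mask"]` to `mean_pool`, then rank candidate documents by `cosine_sim` against a query embedding.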

MIT License

Copyright (c) 2025 Velankani Communications Technologies Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files, to deal in the Software without restriction.
