SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
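The three modules above can be illustrated with a small NumPy sketch. Random arrays stand in for the final hidden states that BertModel would produce, with shape [batch, 256, 768]. The function name `cls_pool_and_normalize` is hypothetical and is not part of the Sentence Transformers API. With `pooling_mode_cls_token` set, the sentence embedding is the vector of the first ([CLS]) token. `Normalize()` then scales it to unit L2 norm, so a plain dot product equals cosine similarity.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(cls_token) followed by Normalize() on [batch, seq_len, dim] hidden states."""
    # pooling_mode_cls_token=True: keep only the first ([CLS]) token of each sequence
    cls = token_embeddings[:, 0, :]
    # Normalize(): rescale each embedding to unit L2 norm (guard against division by zero)
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.maximum(norms, 1e-12)

# Stand-in for BertModel output: 3 sequences x 256 tokens x 768 dims (random, illustration only)
rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 256, 768))

emb = cls_pool_and_normalize(hidden)
# For unit-length vectors, the dot product equals cosine similarity
cos = emb @ emb.T
print(emb.shape)  # (3, 768)
```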

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Ambika14/bge_grievance_classifier-code-B2")
# Run inference
sentences = [
    'on the udyam registration form my daughter s pan number and name were incorrectly updated instead of my pan number and name. the original pan number to be updated is <pan_no> and the name is durai singh. issue update of contact details in udyam registration certificate context the user is requesting an update of the mobile number and email id in the udyam registration certificate for efes process equipment pvt ltd. details - udyam registration no udyam-ap- <NUM> - <NUM> current mobile no <NUM> current email id pramod4holy@gmail.com aadhaar no <NUM> old mobile no <NUM> old email id irfansirfan60@gmail.com',
    'UAM/Udyam Registration/Certificate related issues. After Cancellation, Unable to Register with PAN Details (Technical). this category refers to grievances where an entrepreneur is unable to create a new udyam registration using their pan after an earlier registration has already been cancelled. in such situations the system may continue to recognize the pan as already associated with an existing registration preventing the user from completing a new registration. grievances under this category generally occur when an enterprise previously cancelled its registration due to closure incorrect details or duplication and later attempts to register again using the same pan. users may report that the system still displays a message indicating that a registration already exists for that pan even though the earlier registration was cancelled. some entrepreneurs also encounter errors where the portal does not allow them to proceed with registration because the pan remains linked to the previous record. these grievances are commonly raised by business owners proprietors partners or company directors attempting to register their enterprise again after cancelling an earlier registration. the issue may also be reported by authorized representatives compliance managers or consultants responsible for completing the msme registration process on behalf of the enterprise. such grievances typically arise when the system does not update the cancellation status correctly or when residual records associated with the pan prevent the new registration from being completed.',
    'Technology, Quality and Institutions. Manufactruing (Chemical/Food/Electrical & Electronics). manufacturing in the chemical food electrical and electrical electronics sectors under msme refers to sector-focused support provided by the ministry of msme through a combination of specialized infrastructure technology upgradation and competitiveness schemes. this includes dedicated technology centres for activities such as fragrance and flavour development in the chemical sector tooling and process development for electrical measuring instruments and electronics and esdm-focused prototyping and testing facilities under programmes like the technology centre systems programme and clcss. food processing msmes are supported through cluster-based common facility centres offering shared infrastructure for testing r d packaging cold chains and effluent treatment under the mse cluster development framework. these sectoral interventions are complemented by horizontal schemes such as lean manufacturing zed certification and digital msme which help units improve quality sustainability productivity and market readiness. together these measures aim to enable value-added manufacturing reduce individual investment burdens promote compliance with quality and environmental standards and enhance domestic as well as export competitiveness across these msme-intensive sectors. examples of grievances include technology centre access denial an electronics msme seeking advanced esdm testing is denied access at a specialized technology centre because available slots are prioritized for chemical or fragrance units delaying product validation. clcss machinery rejection a food processing unit s modern packaging or processing machine is not included in the approved sub-sector or machinery list resulting in rejection of the <NUM> capital subsidy claim. common facility centre shortfall a chemical manufacturing cluster s approved cfc does not include the promised effluent treatment facility forcing individual msmes to incur high compliance and disposal costs. zed certification scoring dispute a food msme implementing lean practices and waste reduction measures receives lower-than-expected scores during audit missing bronze certification despite documented improvements. lean cluster exclusion a small electrical and electronics group with fewer than the required number of units is excluded from lean manufacturing cluster support even though the cluster has clear process improvement potential.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7657, 0.4355],
#         [0.7657, 1.0000, 0.5440],
#         [0.4355, 0.5440, 1.0000]])
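The model was trained on (grievance, category description) pairs. One plausible way to use it as a classifier is to embed every category description once and assign each grievance to the most similar description. This card does not document that procedure, so treat it as an assumption. The sketch uses toy 2-D unit vectors in place of real embeddings, and `classify_by_nearest_category` is a hypothetical helper. In practice, both inputs would come from `model.encode(...)`, applied once to the grievance texts and once to the category descriptions.

```python
import numpy as np

def classify_by_nearest_category(query_embs, category_embs, labels, top_k=1):
    """Rank categories for each query by cosine similarity (hypothetical helper).

    Assumes both inputs are already L2-normalized, as this model's Normalize()
    layer guarantees, so a dot product equals cosine similarity.
    """
    scores = query_embs @ category_embs.T            # [n_queries, n_categories]
    ranked = np.argsort(-scores, axis=1)[:, :top_k]  # highest similarity first
    return [
        [(labels[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(ranked)
    ]

# Toy unit vectors standing in for model.encode(...) outputs
labels = ["Migration from UAM to UDYAM", "Related to Tool Rooms"]
category_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
query_embs = np.array([[0.8, 0.6], [0.6, 0.8]])

for preds in classify_by_nearest_category(query_embs, category_embs, labels, top_k=2):
    print(preds)
```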

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine nan
spearman_cosine nan
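The card does not explain why both correlations are `nan`. One common cause, offered here only as an assumption, is that every evaluation pair carries the same gold similarity score. For example, every pair might be labelled 1.0 because the data contains only positive pairs. Constant gold scores have zero variance, which makes Pearson's correlation 0/0. Spearman's correlation is Pearson's correlation computed on ranks, so constant gold scores leave it undefined as well. Below is a minimal NumPy sketch. `pearson` is a hypothetical helper, not the evaluator's actual code, and the scores are made up.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation; returns nan when either input has zero variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # 0/0 yields nan; silence NumPy's RuntimeWarning for that case
    with np.errstate(invalid="ignore", divide="ignore"):
        return float((xc * yc).sum() / denom)

predicted = [0.91, 0.77, 0.85]   # hypothetical cosine scores from the model
gold_constant = [1.0, 1.0, 1.0]  # every pair labelled as a positive
print(pearson(predicted, gold_constant))
print(pearson(predicted, [0.9, 0.2, 0.6]))
```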

Training Details

Training Dataset

Unnamed Dataset

  • Size: 124 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 124 samples:
    Statistic  sentence_0     sentence_1
    type       string         string
    min        51 tokens      181 tokens
    mean       143.99 tokens  252.81 tokens
    max        256 tokens     256 tokens
  • Samples:
    Sample 1
      sentence_0: not having register mobile no and email id need to update the mobile number and email id issue invalid incomplete grievance context the grievance text does not contain sufficient or meaningful information to identify an issue related to the msme scheme. details -
      sentence_1: Technology, Quality and Institutions. Related to Tool Rooms. this category encompasses grievances related to the operational and technical services provided by government-supported msme tool rooms. the scope includes issues with access to machinery prototyping facilities manufacturing support and skill-development or training programs. key areas of concern include unavailability of machine time despite confirmed bookings equipment under maintenance or frequent breakdowns high-demand machines consistently overbooked infrastructure promised for msme production support not accessible when required delays cancellations or poor execution of technical training programs non-availability of trainers or technical experts mismatch between published and actual service fees lack of transparency during machine usage or training delivery these grievances directly impact production timelines project execution and workforce upskilling. they arise from service delivery and operational failures rather t...
    Sample 2
      sentence_0: i never applied for udyam registration before but it is showing that it has already been done through my pan. kindly look into this. issue retrieval of udyam registration number and contact details context the user is requesting the udyam registration number and contact details associated with the existing udyam registration in order to obtain the udyam certificate or update the details. details - pan no agtpj3178r aadhar no
      sentence_1: UAM/Udyam Registration/Certificate related issues. Migration from UAM to UDYAM. this category refers to grievances related to the migration of enterprises registered under the earlier udyog aadhaar memorandum uam system to the current udyam registration system. the uam registration system was used earlier for msme registration but enterprises registered under that system were required to migrate their registration details to the newer udyam portal to maintain updated records. during this migration process some enterprises encounter difficulties in transferring or verifying their existing registration details. grievances under this category typically include issues where business owners are unable to complete the migration process from uam to udyam due to errors or system restrictions. entrepreneurs may report that their uam number is not being recognized by the portal or that the migration process stops due to validation errors related to aadhaar pan or enterprise details. some users a...
    Sample 3
      sentence_0: my team needs incubator support for mentoring workspace and early funding to grow our innovative product but this delay is forcing us to shut down. please check my application and release the support fast to save my business.got no response or funding approval past months issue delayed incubator support under nmcp scheme context the user is reporting that the application for incubator support under the nmcp scheme has not been processed or approved within the expected timeframe and is requesting urgent assistance to prevent business shutdown. details - incubator support required mentoring workspace early funding application status no response or funding approval past months
      sentence_1: Technology, Quality and Institutions. Support for entrepreneurial and managerial development of SMEs through incubators- an NMCP Scheme. the support for entrepreneurial and managerial development of smes through incubators scheme under the national manufacturing competitiveness programme nmcp is an initiative of the ministry of msme designed to nurture innovative technology-driven and knowledge-based ideas by providing structured incubation support through approved business incubators hosted in technical academic or research institutions. under the scheme financial assistance of up to lakh is provided per idea or incubated unit for product development testing validation and commercialisation with an overall ceiling of . lakh per incubator to support up to ventures. in addition host institutions may receive up to . lakh for minor infrastructure and facility upgrades to strengthen incubation capabilities. the scheme follows a tripartite arrangement amo...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 32,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
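The core of MultipleNegativesRankingLoss with `scale=20.0`, `cos_sim` and the `query_to_doc` direction can be sketched in NumPy. For a batch of B (sentence_0, sentence_1) pairs, each sentence_0 is scored against every sentence_1 in the batch. Cross-entropy then pushes its own pair to the top, while the other B − 1 documents act as in-batch negatives. The cached variant (Gao et al., 2021, cited below) is designed to produce the same loss value. It saves memory by embedding the batch in chunks of `mini_batch_size=32` and caching gradients. That caching is not shown here, and neither are the `partition_mode` or `hardness_*` options. `mnrl_loss` is a hypothetical name.

```python
import numpy as np

def mnrl_loss(queries, docs, scale=20.0):
    """In-batch-negatives ranking loss, query_to_doc direction only (sketch)."""
    # cos_sim: L2-normalize, then take dot products
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = scale * (q @ d.T)  # [B, B]; row i scores query i against every doc in the batch
    # Row-wise log-softmax (shifted by the row max for numerical stability)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct doc for query i is doc i, so the targets lie on the diagonal
    return float(-np.mean(np.diag(log_probs)))

eye = np.eye(3)
print(mnrl_loss(eye, eye))                      # matched pairs: loss near 0
print(mnrl_loss(eye, np.roll(eye, 1, axis=0)))  # every pair mismatched: loss near 20
```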
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 6
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
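The sketch below works through the step arithmetic for this run. It uses 124 training pairs, a batch size of 64 and `dataloader_drop_last: False` (listed below). The result should agree with the 12 steps recorded in the training logs. It also shows how each batch splits into the loss's 32-example mini-batches. The helper names are hypothetical.

```python
import math

def training_steps(num_samples, batch_size, epochs, drop_last=False):
    """Return (optimizer steps per epoch, total steps), assuming gradient_accumulation_steps=1."""
    per_epoch = num_samples // batch_size if drop_last else math.ceil(num_samples / batch_size)
    return per_epoch, per_epoch * epochs

def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches in one epoch; a short final batch is kept unless drop_last."""
    sizes = [batch_size] * (num_samples // batch_size)
    remainder = num_samples % batch_size
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes

def mini_batches(batch_size, mini_batch_size):
    """Chunk sizes used when a cached loss embeds one batch mini_batch_size examples at a time."""
    return [min(mini_batch_size, batch_size - s) for s in range(0, batch_size, mini_batch_size)]

print(training_steps(124, 64, 6))                          # (2, 12)
print(batch_sizes(124, 64))                                # [64, 60]
print([mini_batches(b, 32) for b in batch_sizes(124, 64)]) # [[32, 32], [32, 28]]
```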

All Hyperparameters

  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
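The run uses `learning_rate: 5e-05`, `lr_scheduler_type: linear` and `warmup_steps: 0`. Under these settings, the learning rate should decay linearly from 5e-05 towards zero over the 12 optimizer steps. The function below writes out by hand the linear-with-warmup formula used by Hugging Face Transformers' `get_linear_schedule_with_warmup`. It assumes that formula applies unchanged to this run. `linear_lr` is a hypothetical name.

```python
def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate used at optimizer step `step` (0-based) under linear warmup + linear decay."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr (unused here because warmup_steps is 0)
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup down to 0 at total_steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total_steps = 12  # 2 steps per epoch x 6 epochs, as in the training logs
for step in range(total_steps):
    print(step, f"{linear_lr(step, total_steps, 5e-05):.3e}")
```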

Training Logs

Epoch Step spearman_cosine
1.0 2 nan
2.0 4 nan
3.0 6 nan
4.0 8 nan
5.0 10 nan
6.0 12 nan

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.3.0
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model size: 0.1B params (Safetensors, tensor type F32)

Model tree for Ambika14/bge_grievance_classifier-code-B2
