How to use fjavigv24/snoweu with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("fjavigv24/snoweu")

sentences = [
    "What are the conditions that must be met for the appointment of a depositary established in a third country for non-EU AIFs?",
    "(a)\n\nfor EU AIFs, in the home Member State of the AIF;\n\n(b)\n\nfor non-EU AIFs, in the third country where the AIF is established or in the home Member State of the AIFM managing the AIF or in the Member State of reference of the AIFM managing the AIF.\n\n6.\n\nWithout prejudice to the requirements set out in paragraph 3, the appointment of a depositary established in a third country shall, at all times, be subject to the following conditions:\n\n(a)",
    "(c)\n\nthe financial soundness of the proposed acquirer, in particular in relation to the type of business pursued and envisaged in the investment firm in which the acquisition is proposed;\n\n(d)\n\nwhether the investment firm will be able to comply and continue to comply with the prudential requirements based on this Directive and, where applicable, other Directives, in particular Directives 2002/87/EC and 2013/36/EU, in particular, whether the group of which it will become a part has a structure that makes it possible to exercise effective supervision, effectively exchange information among the competent authorities and determine the allocation of responsibilities among the competent authorities;\n\n(e)",
    "(f)\n\nthe undertaking shall describe the expected decarbonisation levers and their overall quantitative contributions to achieve the GHG emission reduction targets (e.g., energy or material efficiency and consumption reduction, fuel switching, use of renewable energy , phase out or substitution of product and process).\n\nDisclosure Requirement E1-5 – Energy consumption and mix\n\nThe undertaking shall provide information on its energy consumption and mix.\n\nThe objective of this Disclosure Requirement is to provide an understanding of the undertaking’s total energy consumption in absolute value, improvement in energy efficiency, exposure to coal, oil and gas-related activities, and the share of renewable energy in its overall energy mix."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5

This is a sentence-transformers model fine-tuned from Snowflake/snowflake-arctic-embed-m-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Snowflake/snowflake-arctic-embed-m-v1.5
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
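
The last two modules can be illustrated without loading the model. The following is a minimal sketch, assuming toy 4-dimensional token vectors in place of the 768-dimensional BertModel output; the helper names `cls_pool`, `l2_normalize`, and `dot` are hypothetical and not part of the library.

```python
import math

def cls_pool(token_vectors):
    """Module (1): keep only the first ([CLS]) token vector of a sequence."""
    return token_vectors[0]

def l2_normalize(vector):
    """Module (2): scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token vectors (assumption: real ones come from the transformer, 768-dim).
sequence_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
sequence_b = [[0.0, 4.0, 3.0, 0.0], [1.0, 2.0, 3.0, 4.0]]

emb_a = l2_normalize(cls_pool(sequence_a))
emb_b = l2_normalize(cls_pool(sequence_b))

# After normalization the dot product equals the cosine similarity.
print(round(dot(emb_a, emb_a), 6))
print(round(dot(emb_a, emb_b), 6))
```

Because the output is unit length, cosine similarity and dot product give the same ranking, which is why either can be used at search time.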

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("fjavigv24/snoweu")
# Run inference
sentences = [
    'What types of substances or mixtures should be listed in relation to their potential to react and create hazardous situations, and what additional information is required to manage the associated risks?',
    'Families of substances or mixtures or specific substances, such as water, air, acids, bases, oxidising agents, with which the substance or mixture could react to produce a hazardous situation (like an explosion, a release of toxic or flammable materials, or a liberation of excessive heat), shall be listed and if appropriate a brief description of measures to be taken to manage risks associated with such hazards shall be given.\n\n10.6. Hazardous decomposition products\n\nKnown and reasonably anticipated hazardous decomposition products produced as a result of use, storage, spill and heating shall be listed. Hazardous combustion products shall be included in section 5 of the safety data sheet.\n\n11. SECTION 11: Toxicological information',
    'The undertaking shall specify as part of the contextual information, whether the targets that it has set and presented are mandatory (required by legislation) or voluntary.\n\nDisclosure Requirement E2-4 – Pollution of air, water and soil\n\nThe undertaking shall disclose the pollutants that it emits through its own operations, as well as the microplastics it generates or uses.\n\nThe objective of this Disclosure Requirement is to provide an understanding of the emissions that the undertaking generates to air, water and soil in its own operations, and of its generation and use of microplastics.\n\nThe undertaking shall disclose the amounts of:\n\n(a)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
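
For retrieval, the similarity scores are typically used to rank passages against a query. The following is a minimal sketch of that ranking step, assuming the vectors are already unit length (as the Normalize module produces); `rank_passages` is a hypothetical helper, and the toy 2-dimensional vectors stand in for `model.encode(...)` output.

```python
def rank_passages(query_embedding, passage_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k passages by dot product.

    Assumes every vector is L2-normalized, so the dot product is the
    cosine similarity.
    """
    scores = [
        sum(q * p for q, p in zip(query_embedding, passage))
        for passage in passage_embeddings
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Toy unit vectors (assumption: real ones are 768-dimensional model outputs).
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]

for index, score in rank_passages(query, passages, top_k=2):
    print(index, round(score, 2))
```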

Evaluation

Metrics

Information Retrieval

Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.7468
cosine_accuracy@3	0.8989
cosine_accuracy@5	0.9306
cosine_accuracy@10	0.9605
cosine_precision@1	0.7468
cosine_precision@3	0.2996
cosine_precision@5	0.1861
cosine_precision@10	0.096
cosine_recall@1	0.7468
cosine_recall@3	0.8989
cosine_recall@5	0.9306
cosine_recall@10	0.9605
cosine_ndcg@10	0.8608
cosine_mrr@10	0.8281
cosine_map@100	0.8298

Information Retrieval

Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.7531
cosine_accuracy@3	0.9079
cosine_accuracy@5	0.9402
cosine_accuracy@10	0.968
cosine_precision@1	0.7531
cosine_precision@3	0.3026
cosine_precision@5	0.188
cosine_precision@10	0.0968
cosine_recall@1	0.7531
cosine_recall@3	0.9079
cosine_recall@5	0.9402
cosine_recall@10	0.968
cosine_ndcg@10	0.8682
cosine_mrr@10	0.8354
cosine_map@100	0.8368

Information Retrieval

Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.8397
cosine_accuracy@3	0.9556
cosine_accuracy@5	0.9746
cosine_accuracy@10	0.9897
cosine_precision@1	0.8397
cosine_precision@3	0.3185
cosine_precision@5	0.1949
cosine_precision@10	0.099
cosine_recall@1	0.8397
cosine_recall@3	0.9556
cosine_recall@5	0.9746
cosine_recall@10	0.9897
cosine_ndcg@10	0.9226
cosine_mrr@10	0.9002
cosine_map@100	0.9008
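
In each table above, accuracy@k equals recall@k and precision@k is close to recall@k divided by k (for example 0.8989 / 3 ≈ 0.2996). That pattern is what one would expect when every query has exactly one relevant passage; the card does not state this, so treat it as an assumption. Under that assumption, the metrics can be sketched as follows; `ir_metrics` is a hypothetical helper, not the InformationRetrievalEvaluator implementation.

```python
def ir_metrics(first_relevant_ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    first_relevant_ranks holds, per query, the 1-based rank of its single
    relevant passage, or None if it was not retrieved at all.
    Assumes exactly one relevant passage per query.
    """
    n = len(first_relevant_ranks)
    hits = [r is not None and r <= k for r in first_relevant_ranks]
    accuracy = sum(hits) / n
    reciprocal = sum(1.0 / r for r, hit in zip(first_relevant_ranks, hits) if hit)
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most 1 of the k results is relevant
        "recall": accuracy,         # the single relevant passage is found or not
        "mrr": reciprocal / n,
    }

# Toy ranks for five queries (illustrative data, not from the evaluation set).
ranks = [1, 2, 1, None, 4]
metrics = ir_metrics(ranks, k=3)
print(metrics)
```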

Training Details

Training Dataset

Unnamed Dataset

Size: 26,299 training samples
Columns: sentence_0 and sentence_1
Approximate statistics based on the first 1000 samples:
	sentence_0	sentence_1
type	string	string
details	min: 16 tokens, mean: 38.67 tokens, max: 215 tokens	min: 5 tokens, mean: 251.42 tokens, max: 512 tokens

Samples:

sentence_0	sentence_1
`What are the key considerations the Commission must evaluate when assessing the feasibility of including municipal waste incineration installations in the EU ETS by 31 July 2026?`	By 31 July 2026, the Commission shall present a report to the European Parliament and to the Council in which it shall assess the feasibility of including municipal waste incineration installations in the EU ETS, including with a view to their inclusion from 2028 and with an assessment of the potential need for an option for a Member State to opt out until 31 December 2030. In that regard, the Commission shall take into account the importance of all sectors contributing to emission reductions and potential diversion of waste towards disposal by landfilling in the Union and waste exports to third countries. The Commission shall in addition take into account relevant criteria such as the effects on the internal market, potential distortions
`What are the conditions under which a registrant can withhold certain information from disclosure, and what steps must they take to justify this decision?`	NOTES Note 1: If it is not technically possible, or if it does not appear scientifically necessary to give information, the reasons shall be clearly stated, in accordance with the relevant provisions. Note 2: The registrant may wish to declare that certain information submitted in the registration dossier is commercially sensitive and its disclosure might harm him commercially. If this is the case, he shall list the items and provide a justification. ▼C1 INFORMATION REFERRED TO IN ARTICLE 10(a) (i) TO (v) 1. GENERAL REGISTRANT INFORMATION 1.1. Registrant ▼M70 1.1.1. Name, address, telephone number and email address ▼C1 1.1.2. Contact person 1.1.3. Location of the registrant's production and own use site(s), as appropriate ▼M70
`What are the specific color indices and chemical identifiers for Pigment Red 112 and Pigment Yellow 14, and what is their respective concentration percentage?`	17 (PR17)/CI 12390 229-681-4 6655-84-1 0,1 % Pigment Red 112 (PR112)/CI 12370 229-440-3 6535-46-2 0,1 % Pigment Yellow 14 (PY14)/CI 21095 226-789-3 5468-75-7 0,1 % Pigment Yellow 55 (PY55)/CI 21096 226-789-3 6358-37-8 0,1 % Pigment Red 2 (PR2)/CI 12310 227-930-1 6041-94-7 0,1 % Pigment Red 22 (PR22)/CI 12315 229-245-3 6448-95-9 0,1 % Pigment Red 146 (PR146)/CI 12485 226-103-2 5280-68-2 0,1 % Pigment Red 269 (PR269)/CI 12466 268-028-8 67990-05-0 0,1 % Pigment Orange16 (PO16)/CI 21160 229-388-1 6505-28-8 0,1 % Pigment Yellow 1 (PY1)/CI 11680 219-730-8 2512-29-0 0,1 % Pigment Yellow 12 (PY12)/CI 21090 228-787-8 6358-85-6 0,1 % Pigment Yellow 87 (PY87)/CI 21107:1 239-160-3 15110-84-6, 14110-84-6 0,1 % Pigment Yellow 97 (PY97)/CI 11767

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
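
Training with these matryoshka_dims is intended to keep the leading components of an embedding useful on their own, so a vector can be shortened to trade accuracy for storage and speed. The following is a minimal sketch of that truncation step, assuming a plain Python list as the embedding; `truncate_and_renormalize` is a hypothetical helper.

```python
import math

MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]  # from the loss configuration above

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

# Stand-in for a 768-dimensional model output (illustrative values only).
full = [math.sin(i + 1) for i in range(768)]
small = truncate_and_renormalize(full, 64)
print(len(small))
```

Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer(...)`; whether the truncated output is re-normalized may depend on the installed version, so it is worth checking the norms before relying on dot-product scores.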

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 6
per_device_eval_batch_size: 6
num_train_epochs: 4
multi_dataset_batch_sampler: round_robin
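
With per_device_train_batch_size: 6, MultipleNegativesRankingLoss treats the other five passages in each batch as negatives for a given query. The following is a rough sketch of how that objective combines with the matryoshka dimensions, not the library's implementation. It assumes cosine similarity and a scale of 20.0, which the card does not report; all function names are hypothetical.

```python
import math

def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def mnrl_loss(queries, passages, scale=20.0):
    """In-batch-negatives cross-entropy: passage i is the positive for
    query i, every other passage in the batch is a negative.
    Assumes unit-length vectors; scale=20.0 is an assumption."""
    losses = []
    for i, q in enumerate(queries):
        logits = [scale * sum(a * b for a, b in zip(q, p)) for p in passages]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

def matryoshka_mnrl_loss(queries, passages, dims, weights):
    """Weighted sum of the base loss over truncated, re-normalized embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        q = [normalize(v[:dim]) for v in queries]
        p = [normalize(v[:dim]) for v in passages]
        total += weight * mnrl_loss(q, p)
    return total

# A batch of 6 identical vectors: the model cannot tell passages apart,
# so each truncation contributes log(6).
batch = [[1.0, 1.0, 1.0, 1.0] for _ in range(6)]
uninformative = matryoshka_mnrl_loss(batch, batch, dims=[4, 2], weights=[1, 1])

# A batch of 6 mutually orthogonal pairs: the loss is close to zero.
eye = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
separated = mnrl_loss(eye, eye)
print(round(uninformative, 4), separated < 1e-6)
```

A small batch gives each query few negatives, which is one reason this loss is often run with larger batches when memory allows.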

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 6
per_device_eval_batch_size: 6
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
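
The combination lr_scheduler_type: linear, learning_rate: 5e-05 and warmup_steps: 0 describes a learning rate that decays linearly from its peak to zero over training. The following is a minimal sketch of that schedule. The total of 17,536 steps is an assumption (4 epochs at the 4,384 steps per epoch shown in the training log, on a single device); `linear_lr` is a hypothetical helper.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 17_536  # assumption: 4 epochs x 4,384 steps per epoch

for step in (0, 4_384, 8_768, 17_536):
    print(step, linear_lr(step, TOTAL_STEPS))
```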

Training Logs

Epoch	Step	Training Loss	cosine_ndcg@10
0.0228	100	-	0.6723
0.0456	200	-	0.7870
0.0684	300	-	0.8397
0.0912	400	-	0.8608
0.1141	500	0.4135	-
0.0228	100	-	0.8669
0.0456	200	-	0.8682
0.0228	100	-	0.8699
0.0456	200	-	0.8733
0.0684	300	-	0.8759
0.0912	400	-	0.8802
0.1141	500	0.1122	0.8823
0.1369	600	-	0.8847
0.1597	700	-	0.8835
0.1825	800	-	0.8862
0.2053	900	-	0.8864
0.2281	1000	0.1299	0.8860
0.2509	1100	-	0.8837
0.2737	1200	-	0.8861
0.2965	1300	-	0.8882
0.3193	1400	-	0.8850
0.3422	1500	0.123	0.8916
0.3650	1600	-	0.8866
0.3878	1700	-	0.8917
0.4106	1800	-	0.8918
0.4334	1900	-	0.8904
0.4562	2000	0.0769	0.8896
0.4790	2100	-	0.8876
0.5018	2200	-	0.8956
0.5246	2300	-	0.8964
0.5474	2400	-	0.8901
0.5703	2500	0.0697	0.8888
0.5931	2600	-	0.8872
0.6159	2700	-	0.8839
0.6387	2800	-	0.8891
0.6615	2900	-	0.8890
0.6843	3000	0.0537	0.8867
0.7071	3100	-	0.8907
0.7299	3200	-	0.8916
0.7527	3300	-	0.8933
0.7755	3400	-	0.8933
0.7984	3500	0.0772	0.8924
0.8212	3600	-	0.8946
0.8440	3700	-	0.8953
0.8668	3800	-	0.8941
0.8896	3900	-	0.8939
0.9124	4000	0.065	0.8953
0.9352	4100	-	0.8969
0.9580	4200	-	0.8993
0.9808	4300	-	0.9020
1.0	4384	-	0.9040
1.0036	4400	-	0.9044
1.0265	4500	0.0329	0.9015
1.0493	4600	-	0.8999
1.0721	4700	-	0.9005
1.0949	4800	-	0.8976
1.1177	4900	-	0.9001
1.1405	5000	0.024	0.9014
1.1633	5100	-	0.8995
1.1861	5200	-	0.9022
1.2089	5300	-	0.9030
1.2318	5400	-	0.9027
1.2546	5500	0.016	0.9024
1.2774	5600	-	0.9012
1.3002	5700	-	0.9011
1.3230	5800	-	0.9049
1.3458	5900	-	0.9094
1.3686	6000	0.0553	0.9094
1.3914	6100	-	0.9028
1.4142	6200	-	0.9113
1.4370	6300	-	0.9118
1.4599	6400	-	0.9139
1.4827	6500	0.0416	0.9112
1.5055	6600	-	0.9102
1.5283	6700	-	0.9092
1.5511	6800	-	0.9098
1.5739	6900	-	0.9101
1.5967	7000	0.0283	0.9107
1.6195	7100	-	0.9114
1.6423	7200	-	0.9131
1.6651	7300	-	0.9130
1.6880	7400	-	0.9144
1.7108	7500	0.0268	0.9126
1.7336	7600	-	0.9119
1.7564	7700	-	0.9125
1.7792	7800	-	0.9111
1.8020	7900	-	0.9100
1.8248	8000	0.0252	0.9110
1.8476	8100	-	0.9151
1.8704	8200	-	0.9123
1.8932	8300	-	0.9118
1.9161	8400	-	0.9103
1.9389	8500	0.0288	0.9110
1.9617	8600	-	0.9106
1.9845	8700	-	0.9109
2.0	8768	-	0.9126
2.0073	8800	-	0.9117
2.0301	8900	-	0.9114
2.0529	9000	0.0232	0.9123
2.0757	9100	-	0.9113
2.0985	9200	-	0.9095
2.1214	9300	-	0.9086
2.1442	9400	-	0.9109
2.1670	9500	0.0188	0.9124
2.1898	9600	-	0.9125
2.2126	9700	-	0.9121
2.2354	9800	-	0.9122
2.2582	9900	-	0.9132
2.2810	10000	0.0182	0.9125
2.3038	10100	-	0.9142
2.3266	10200	-	0.9135
2.3495	10300	-	0.9084
2.3723	10400	-	0.9147
2.3951	10500	0.0111	0.9170
2.4179	10600	-	0.9142
2.4407	10700	-	0.9158
2.4635	10800	-	0.9174
2.4863	10900	-	0.9176
2.5091	11000	0.0153	0.9166
2.5319	11100	-	0.9172
2.5547	11200	-	0.9171
2.5776	11300	-	0.9168
2.6004	11400	-	0.9176
2.6232	11500	0.0241	0.9170
2.6460	11600	-	0.9177
2.6688	11700	-	0.9184
2.6916	11800	-	0.9196
2.7144	11900	-	0.9211
2.7372	12000	0.0172	0.9209
2.7600	12100	-	0.9212
2.7828	12200	-	0.9201
2.8057	12300	-	0.9194
2.8285	12400	-	0.9205
2.8513	12500	0.013	0.9202
2.8741	12600	-	0.9213
2.8969	12700	-	0.9210
2.9197	12800	-	0.9203
2.9425	12900	-	0.9200
2.9653	13000	0.03	0.9209
2.9881	13100	-	0.9212
3.0	13152	-	0.9200
3.0109	13200	-	0.9198
3.0338	13300	-	0.9192
3.0566	13400	-	0.9183
3.0794	13500	0.0133	0.9170
3.1022	13600	-	0.9181
3.125	13700	-	0.9180
3.1478	13800	-	0.9176
3.1706	13900	-	0.9168
3.1934	14000	0.0185	0.9175
3.2162	14100	-	0.9188
3.2391	14200	-	0.9182
3.2619	14300	-	0.9192
3.2847	14400	-	0.9199
3.3075	14500	0.0135	0.9195
3.3303	14600	-	0.9190
3.3531	14700	-	0.9187
3.3759	14800	-	0.9196
3.3987	14900	-	0.9202
3.4215	15000	0.0157	0.9214
3.4443	15100	-	0.9211
3.4672	15200	-	0.9211
3.4900	15300	-	0.9208
3.5128	15400	-	0.9195
3.5356	15500	0.015	0.9207
3.5584	15600	-	0.9210
3.5812	15700	-	0.9226
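
The Epoch and Step columns are consistent with the reported dataset size and batch size. The following short check reproduces that arithmetic, assuming a single device, no gradient accumulation, and that the last partial batch is kept (dataloader_drop_last: False).

```python
import math

train_samples = 26_299   # from "Training Dataset"
batch_size = 6           # per_device_train_batch_size
epochs = 4               # num_train_epochs

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

def epoch_at(step):
    """Fractional epoch for a given optimizer step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)   # 4384, the step logged at epoch 1.0
print(total_steps)       # 17536
print(epoch_at(100))     # 0.0228, the first logged row
print(epoch_at(15700))   # 3.5812, the last logged row
```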

Framework Versions

Python: 3.10.11
Sentence Transformers: 3.4.1
Transformers: 4.48.1
PyTorch: 2.4.0+cu121
Accelerate: 1.4.0
Datasets: 3.3.2
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

Model size: 0.1B params
Tensor type: F32
