How to use fjavigv/snoweu_v4 with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("fjavigv/snoweu_v4")

sentences = [
    "What are the chemical names and corresponding identifiers for octabromo derivate and 2-Methoxyethanol, including their CAS numbers and EC numbers?",
    "octabromo derivate 602-094-00-4 251-087-9 32536-52-0 2-Methoxyethanol; ethylene glycol monomethyl ether; methylglycol 603-011-00-4 203-713-7 109-86-4 2-Ethoxyethanol; ethylene glycol monoethyl ether; ethylglycol 603-012-00-X 203-804-1 110-80-5 [▼M61](./../../../legal-content/EN/AUTO/?uri=celex:32020R2096 \"32020R2096: INSERTED\") Ethylene oxide; oxirane 603-023-00-X 200-849-9 75-21-8 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29 \"32006R1907R(01): REPLACED\") 1,2-Dimethoxyethane ethylene glycol dimethyl ether EGDME 603-031-00-3 203-794-9 110-71-4 [▼M45](./../../../legal-content/EN/AUTO/?uri=celex:32017R1510 \"32017R1510: INSERTED\") Tetrahydro-2-furyl-methanol; tetrahydrofurfuryl alcohol 603-061-00-7 202-625-6 97-99-4",
    "hydrocarbons produced as the residual fraction from the distillation of heavy coker gas oil and vacuum gas oil. It predominantly consists of hydrocarbons having carbon numbers predominantly greater than C13 and boiling above approximately 230 °C.) 649-026-00-X 270-796-4 68478-17-1 Residues (petroleum), heavy coker and light vacuum; Heavy fuel oil (A complex combination of hydrocarbons produced as the residual fraction from the distillation of heavy coker gas oil and light vacuum gas oil. It consists predominantly of hydrocarbons having carbon numbers predominantly greater than C13 and boiling above approximately 230 °C.) 649-027-00-5 270-983-0 68512-61-8 Residues (petroleum), light vacuum; Heavy fuel oil (A complex residuum from the vacuum distillation of the residuum from the atmospheric distillation of crude oil. It consists of hydrocarbons having carbon numbers predominantly greater than C13 and boiling above approximately 230 °C.) 649-028-00-0 270-984-6 68512-62-9 Residues (petroleum), steam-cracked light; Heavy fuel oil (A complex residuum from the distillation of the products from a steam-cracking process. It consists predominantly of aromatic and unsaturated hydrocarbons having carbon numbers greater than C7 and boiling in the range of approximately 101 to 555 °C.) 649-029-00-6 271-013-9 68513-69-9 Fuel oil, No 6; Heavy fuel oil (A distillate oil having a minimum viscosity of 197 10-6 m2s-1 at 37,7 °C to a maximum of 197 10-5 m2s-1 at 37,7 °C.) 649-030-00-1 271-384-7 68553-00-4 Residues (petroleum), topping plant, low-sulfur; Heavy fuel oil (A low-sulfur complex combination of hydrocarbons produced as the residual fraction from the topping plant distillation of crude oil. It is the residuum after the straight-run gasoline cut, kerosene cut and gas oil cut have been removed.) 649-031-00-7 271-763-7 68607-30-7 Gas oils (petroleum), heavy atmospheric; Heavy fuel oil (A complex combination of hydrocarbons obtained by the distillation of crude oil. It consists of hydrocarbons having carbon numbers predominantly in the range of C7 through C35 and boiling in the range of approximately 121 to 510 °C.) 649-032-00-2 272-184-2 68783-08-4 Residues (petroleum), coker scrubber, Condensed-ring-arom.-contg.; Heavy fuel",
    "(e)\n\nwhere applicable, how the undertaking assesses the effectiveness of its engagement with its own workforce, including, where relevant, any agreements or outcomes that result.\n\nWhere applicable, the undertaking shall disclose the steps it takes to gain insight into the perspectives of people in its own workforce who may be particularly vulnerable to impacts and/or marginalised (for example, women, migrants, people with disabilities).\n\nIf the undertaking cannot disclose the above required information because it has not adopted a general process to engage with its own workforce , it shall disclose this to be the case. It may disclose a timeframe in which it aims to have such a process in place.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46338
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m-v1.5
widget:
- source_sentence: >-
What are the chemical names and corresponding identifiers for octabromo
derivate and 2-Methoxyethanol, including their CAS numbers and EC numbers?
sentences:
- >-
octabromo derivate 602-094-00-4 251-087-9 32536-52-0 2-Methoxyethanol;
ethylene glycol monomethyl ether; methylglycol 603-011-00-4 203-713-7
109-86-4 2-Ethoxyethanol; ethylene glycol monoethyl ether; ethylglycol
603-012-00-X 203-804-1 110-80-5
[▼M61](./../../../legal-content/EN/AUTO/?uri=celex:32020R2096
"32020R2096: INSERTED") Ethylene oxide; oxirane 603-023-00-X 200-849-9
75-21-8
[▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
"32006R1907R(01): REPLACED") 1,2-Dimethoxyethane ethylene glycol
dimethyl ether EGDME 603-031-00-3 203-794-9 110-71-4
[▼M45](./../../../legal-content/EN/AUTO/?uri=celex:32017R1510
"32017R1510: INSERTED") Tetrahydro-2-furyl-methanol; tetrahydrofurfuryl
alcohol 603-061-00-7 202-625-6 97-99-4
- >-
hydrocarbons produced as the residual fraction from the distillation of
heavy coker gas oil and vacuum gas oil. It predominantly consists of
hydrocarbons having carbon numbers predominantly greater than C13 and
boiling above approximately 230 °C.) 649-026-00-X 270-796-4 68478-17-1
Residues (petroleum), heavy coker and light vacuum; Heavy fuel oil (A
complex combination of hydrocarbons produced as the residual fraction
from the distillation of heavy coker gas oil and light vacuum gas oil.
It consists predominantly of hydrocarbons having carbon numbers
predominantly greater than C13 and boiling above approximately 230 °C.)
649-027-00-5 270-983-0 68512-61-8 Residues (petroleum), light vacuum;
Heavy fuel oil (A complex residuum from the vacuum distillation of the
residuum from the atmospheric distillation of crude oil. It consists of
hydrocarbons having carbon numbers predominantly greater than C13 and
boiling above approximately 230 °C.) 649-028-00-0 270-984-6 68512-62-9
Residues (petroleum), steam-cracked light; Heavy fuel oil (A complex
residuum from the distillation of the products from a steam-cracking
process. It consists predominantly of aromatic and unsaturated
hydrocarbons having carbon numbers greater than C7 and boiling in the
range of approximately 101 to 555 °C.) 649-029-00-6 271-013-9 68513-69-9
Fuel oil, No 6; Heavy fuel oil (A distillate oil having a minimum
viscosity of 197 10-6 m2s-1 at 37,7 °C to a maximum of 197 10-5 m2s-1 at
37,7 °C.) 649-030-00-1 271-384-7 68553-00-4 Residues (petroleum),
topping plant, low-sulfur; Heavy fuel oil (A low-sulfur complex
combination of hydrocarbons produced as the residual fraction from the
topping plant distillation of crude oil. It is the residuum after the
straight-run gasoline cut, kerosene cut and gas oil cut have been
removed.) 649-031-00-7 271-763-7 68607-30-7 Gas oils (petroleum), heavy
atmospheric; Heavy fuel oil (A complex combination of hydrocarbons
obtained by the distillation of crude oil. It consists of hydrocarbons
having carbon numbers predominantly in the range of C7 through C35 and
boiling in the range of approximately 121 to 510 °C.) 649-032-00-2
272-184-2 68783-08-4 Residues (petroleum), coker scrubber,
Condensed-ring-arom.-contg.; Heavy fuel
- >-
(e)
where applicable, how the undertaking assesses the effectiveness of its
engagement with its own workforce, including, where relevant, any
agreements or outcomes that result.
Where applicable, the undertaking shall disclose the steps it takes to
gain insight into the perspectives of people in its own workforce who
may be particularly vulnerable to impacts and/or marginalised (for
example, women, migrants, people with disabilities).
If the undertaking cannot disclose the above required information
because it has not adopted a general process to engage with its own
workforce , it shall disclose this to be the case. It may disclose a
timeframe in which it aims to have such a process in place.
- source_sentence: >-
Under what circumstances can the suspension or removal of a financial
instrument or derivative from trading be exempted, despite infringing
Articles 7 and 17 of Regulation (EU) No 596/2014?
sentences:
- >-
(15) Directive 2010/75/EU of the European Parliament and of the Council
of 24 November 2010 on industrial emissions (integrated pollution
prevention and control) (recast) (OJ L 334, 17.12.2010, p. 17).
(16) Directive 2011/92/EU of the European Parliament and of the Council
of 13 December 2011 on the assessment of the effects of certain public
and private projects on the environment (OJ L 26, 28.1.2012, p. 1).
(17) Directive 2012/18/EU of the European Parliament and of the Council
of 4 July 2012 on the control of major-accident hazards involving
dangerous substances, amending and subsequently repealing Council
Directive 96/82/EC (OJ L 197, 24.7.2012, p. 1).
- >-
3.
Where the competent authority of the host Member State of a regulated
market, an MTF or OTF has clear and demonstrable grounds for believing
that such regulated market, MTF or OTF infringes the obligations arising
from the provisions adopted pursuant to this Directive, it shall refer
those findings to the competent authority of the home Member State of
the regulated market or the MTF or OTF.
- >-
The notified competent authorities of the other Member States shall
require that regulated markets, other MTFs, other OTFs and systematic
internalisers, which fall under their jurisdiction and trade the same
financial instrument or derivatives referred to in points (4) to (10) of
Section C of Annex I that relate or are referenced to that financial
instrument, also suspend or remove that financial instrument or
derivatives from trading, where the suspension or removal is due to
suspected market abuse, a take-over bid or the non- disclosure of inside
information about the issuer or financial instrument infringing Articles
7 and 17 of Regulation (EU) No 596/2014 except where such suspension or
removal could cause significant damage to the
- source_sentence: >-
How can the limitation period for the Commission's powers be interrupted
according to Article 38?
sentences:
- >-
2.
That third-country dialogue shall not prevent the Commission from taking
action under this Regulation. Individual measures adopted pursuant to
this Regulation shall not be addressed within that dialogue.
Article 38
Limitation periods
1.
The powers of the Commission under Articles 10 and 11 shall be subject
to a limitation period of 10 years, starting on the day on which a
foreign subsidy is granted to an undertaking. Any action taken by the
Commission under Article 10, 13, 14 or 15 with respect to a foreign
subsidy shall interrupt the limitation period. After each interruption,
the limitation period of 10 years shall start to run afresh.
2.
- >-
(36) Member States should promote energy efficient means of mobility,
including in their public procurement practices, such as rail, cycling,
walking or shared mobility, by renewing and decarbonising fleets,
encouraging a modal shift and including those modes in urban mobility
planning.
- >-
air oxidation of petrolatum.) 649-255-00-5 265-206-7 64743-01-7 N
Petrolatum (petroleum), alumina-treated; Petrolatum (A complex
combination of hydrocarbons obtained when petrolatum is treated with
Al2O3 to remove polar components and impurities. It consists
predominantly of saturated, crystalline, and liquid hydrocarbons having
carbon numbers predominantly greater than C25.) 649-256-00-0 285-098-5
85029-74-9 N Petrolatum (petroleum), hydrotreated; Petrolatum (A complex
combination of hydrocarbons obtained as a semi-solid from dewaxed
paraffinic residual oil treated with hydrogen in the presence of a
catalyst. It consists predominantly of saturated, microcrystalline, and
liquid hydrocarbons having carbon numbers predominantly greater than
- source_sentence: >-
What specific sections and points of Annex VIII are included in the
registration for high-risk AI systems in the areas of law enforcement,
migration, asylum, and border control management?
sentences:
- >-
▼M15
Article 18b
Assistance from the Commission, EMSA and other relevant organisations
1.
For the purposes of carrying out its obligations under Article 3c(4) and
Articles 3g, 3gd, 3ge, 3gf, 3gg and 18a, the Commission, the
administering Member State and administering authorities in respect of a
shipping company may request the assistance of EMSA or another relevant
organisation and may conclude to that effect any appropriate agreements
with those organisations.
2.
The Commission, assisted by EMSA, shall endeavour to develop appropriate
tools and guidance to facilitate and coordinate verification and
enforcement activities related to the application of this Directive to
maritime transport. As far as practicable, such guidance and tools shall
be made available to the Member States and the verifiers for
information-sharing purposes and in order to better ensure robust
enforcement of the national measures transposing this Directive.
▼B
Article 19
Registries
▼M4
1.
Allowances issued from 1 January 2012 onwards shall be held in the ►M9
Union ◄ registry for the execution of processes pertaining to the
maintenance of the holding accounts opened in the Member State and the
allocation, surrender and cancellation of allowances under the
Commission ►M9 Acts ◄ referred to in paragraph 3.
Each Member State shall be able to fulfil the execution of authorised
operations under the UNFCCC or the Kyoto Protocol.
▼B
2.
Any person may hold allowances. The registry shall be accessible to the
public and shall contain separate accounts to record the allowances held
by each person to whom and from whom allowances are issued or
transferred.
▼M9
3.
- >-
(35)
‘recycled carbon fuels’ means liquid and gaseous fuels that are produced
from liquid or solid waste streams of non-renewable origin which are not
suitable for material recovery in accordance with Article 4 of Directive
2008/98/EC, or from waste processing gas and exhaust gas of
non-renewable origin which are produced as an unavoidable and
unintentional consequence of the production process in industrial
installations;
▼M2
(36)
‘renewable fuels of non-biological origin’ means liquid and gaseous
fuels the energy content of which is derived from renewable sources
other than biomass;
▼B
(37)
- >-
4. For high-risk AI systems referred to in points 1, 6 and 7 of Annex
III, in the areas of law enforcement, migration, asylum and border
control management, the registration referred to in paragraphs 1, 2 and
3 of this Article shall be in a secure non-public section of the EU
database referred to in Article 71 and shall include only the following
information, as applicable, referred to in:
(a) Section A, points 1 to 10, of Annex VIII, with the exception of
points 6, 8 and 9; (b) Section B, points 1 to 5, and points 8 and 9 of
Annex VIII; --- --- (c) Section C, points 1 to 3, of Annex VIII; --- ---
(d) points 1, 2, 3 and 5, of Annex IX. --- ---
- source_sentence: >-
The document outlines various chemical substances classified as
carcinogenic or toxic for reproduction, detailing their respective
categories and regulatory dates. Specific compounds such as diarsenic
trioxide, lead chromate, and chromium trioxide are highlighted, indicating
their potential health risks and the timeline for their regulation.
sentences:
- >-
57(f) – human health) (a) 21 August 2013 (*) (b) By way of derogation
from point (a): 14 June 2023 for uses in mixtures containing DIBP at or
above 0,1 % and below 0,3 % weight by weight. (a) 21 February 2015 (**)
(b) By way of derogation from point (a): 14 December 2024 for uses in
mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by
weight. - [▼M15](./../../../legal-content/EN/AUTO/?uri=celex:32012R0125
"32012R0125: INSERTED") 8. Diarsenic trioxide EC No: 215-481-4 CAS No:
1327-53-3 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 9.
Diarsenic pentaoxide EC No: 215-116-9 CAS No: 1303-28-2 Carcinogenic
(category 1A) 21 November 2013 21 May 2015 — 10. Lead chromate EC No:
231-846-0 CAS No: 7758-97-6 Carcinogenic (category 1B) Toxic for
reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43
(*2) ◄ — 11. Lead sulfochromate yellow (C.I. Pigment Yellow 34) EC No:
215-693-7 CAS No: 1344-37-2 Carcinogenic (category 1B) Toxic for
reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43
(*2) ◄ — 12. Lead chromate molybdate sulphate red (C.I. Pigment Red 104)
EC No: 235-759-9 CAS No: 12656-85-8 Carcinogenic (category 1B) Toxic for
reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43
(*2) ◄ 13. Tris (2-chloroethyl) phosphate (TCEP) EC No: 204-118-5 CAS
No: 115-96-8 Toxic for reproduction (category 1B) 21 February 2014 21
August 2015 14. 2,4-Dinitrotoluene (2,4-DNT) EC No: 204-450-0 CAS No:
121-14-2 Carcinogenic (category 1B) 21 February 2014 ►M43 (*1) ◄ 21
August 2015 ►M43 (*2) ◄
[▼M22](./../../../legal-content/EN/AUTO/?uri=celex:32013R0348
"32013R0348: INSERTED") 15. Trichloroethylene EC No: 201-167-4 CAS No:
79-01-6 Carcinogenic (category 1B) 21 October 2014 ►M43 (*1) ◄ 21 April
2016 ►M43 (*2) ◄ — 16. Chromium trioxide EC No: 215-607-8 CAS No:
1333-82-0 Carcinogenic (category 1A) Mutagenic (category 1B) 21 March
2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 17. Acids generated
from chromium trioxide and their oligomers Group containing: Chromic
acid EC No: 231-801-5 CAS No: 7738-94-5 Dichromic acid EC No: 236-881-5
CAS No: 13530-68-2 Oligomers of chromic acid and dichromic acid EC No:
not yet assigned CAS No: not yet assigned Carcinogenic (category 1B) 21
March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 18. Sodium
dichromate EC No: 234-190-3 CAS No: 7789-12-0 10588-01-9 Carcinogenic
(category 1B) Mutagenic (category 1B) Toxic for reproduction (category
1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 19.
Potassium dichromate EC No: 231-906-6 CAS No: 7778-50-9 Carcinogenic
(category 1B) Mutagenic (category 1B) Toxic for reproduction (category
1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 20.
Ammonium dichromate EC No: 232-143-1 CAS No: 7789-09-5 Carcinogenic
(category 1B) Mutagenic (category 1B) Toxic for reproduction (category
1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 21.
Potassium chromate EC No: 232-140-5 CAS No: 7789-00-6 Carcinogenic
(category 1B) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21
September 2017 ►M43 (*2) ◄ 22. Sodium chromate EC No: 231-889-5 CAS No:
7775-11-3 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for
reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017
►M43 (*2) ◄
[▼M28](./../../../legal-content/EN/AUTO/?uri=celex:32014R0895
"32014R0895: INSERTED") 23. Formaldehyde, oligomeric reaction products
with aniline (technical MDA) EC No: 500-036-1 CAS No: 25214-70-4
Carcinogenic (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017
►M43 (*2) ◄ — 24. Arsenic acid EC No: 231-901-9 CAS No: 7778-39-4
Carcinogenic (category 1A) 22 February 2016 22 August 2017 — 25.
Bis(2-methoxyethyl) ether (diglyme) EC No: 203-924-4 CAS No: 111-96-6
Toxic for reproduction (category 1B) 22 February 2016 ►M43 (*1) ◄ 22
August 2017 ►M43 (*2) ◄ — 26. 1,2-dichloroethane (EDC) EC No: 203-458-1
CAS No: 107-06-2 Carcinogenic (category 1B) 22 May 2016 22 November 2017
— 27. 2,2′-dichloro-4,4′-methylenedianiline (MOCA) EC No: 202-918-9 CAS
No: 101-14-4 Carcinogenic (category 1B) 22 May 2016 ►M43 (*1) ◄ 22
November 2017 ►M43 (*2) ◄ — 28. Dichromium tris(chromate) EC No:
246-356-2 CAS No: 24613-89-6 Carcinogenic (category 1B) 22 July 2017
►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 29. Strontium chromate EC No:
232-142-6 CAS No: 7789-06-2 Carcinogenic (category 1B) 22 July 2017 ►M43
(*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 30. Potassium
hydroxyoctaoxodizincatedichromate EC
- >-
(c)
the financial soundness of the proposed acquirer, in particular in
relation to the type of business pursued and envisaged in the investment
firm in which the acquisition is proposed;
(d)
whether the investment firm will be able to comply and continue to
comply with the prudential requirements based on this Directive and,
where applicable, other Directives, in particular Directives 2002/87/EC
and 2013/36/EU, in particular, whether the group of which it will become
a part has a structure that makes it possible to exercise effective
supervision, effectively exchange information among the competent
authorities and determine the allocation of responsibilities among the
competent authorities;
(e)
- >-
No administrative costs or fees related to the implementation of
financing and investment operations under the EU guarantee shall be due
to the implementing partner by the Commission unless the nature of the
policy objectives targeted by the financial product to be implemented
and the affordability for the targeted final recipients or the type of
financing provided allow the implementing partner to duly justify to the
Commission the need for an exception. The coverage of such costs by the
Union budget shall be limited to the amount strictly required to
implement the relevant financing and investment operations, and shall be
provided only to the extent to which the costs are not covered by
revenues received by the implementing partners from
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.6777144829967202
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8972898325565337
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9390643880545486
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9691006387018816
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6777144829967202
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2990966108521779
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18781287761090967
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09691006387018813
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6777144829967202
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8972898325565337
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9390643880545486
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9691006387018816
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8364282304724784
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7924261355385132
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7938274567816883
name: Cosine Map@100
SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
This is a sentence-transformers model fine-tuned from Snowflake/snowflake-arctic-embed-m-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-m-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
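The architecture above takes the [CLS] token vector as the sentence embedding (`pooling_mode_cls_token: True`) and then L2-normalizes it. A minimal sketch of those two steps on toy data follows; the helper names and the 4-dimensional vectors are illustrative assumptions, not the library's internals.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings (3 tokens x 4 dims); the real model uses 768 dims.
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]]
tokens_b = [[4.0, 3.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# For unit-length vectors the dot product equals the cosine similarity.
cosine = dot(emb_a, emb_b)
print(round(cosine, 4))
```

Because the output is already unit-length, downstream code can rank passages with a plain dot product and obtain the same ordering as cosine similarity.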
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("fjavigv/snoweu_v4")
# Run inference
sentences = [
'The document outlines various chemical substances classified as carcinogenic or toxic for reproduction, detailing their respective categories and regulatory dates. Specific compounds such as diarsenic trioxide, lead chromate, and chromium trioxide are highlighted, indicating their potential health risks and the timeline for their regulation.',
'57(f) – human health) (a) 21 August 2013 (*) (b) By way of derogation from point (a): 14 June 2023 for uses in mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by weight. (a) 21 February 2015 (**) (b) By way of derogation from point (a): 14 December 2024 for uses in mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by weight. - [▼M15](./../../../legal-content/EN/AUTO/?uri=celex:32012R0125 "32012R0125: INSERTED") 8. Diarsenic trioxide EC No: 215-481-4 CAS No: 1327-53-3 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 9. Diarsenic pentaoxide EC No: 215-116-9 CAS No: 1303-28-2 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 10. Lead chromate EC No: 231-846-0 CAS No: 7758-97-6 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 11. Lead sulfochromate yellow (C.I. Pigment Yellow 34) EC No: 215-693-7 CAS No: 1344-37-2 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 12. Lead chromate molybdate sulphate red (C.I. Pigment Red 104) EC No: 235-759-9 CAS No: 12656-85-8 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ 13. Tris (2-chloroethyl) phosphate (TCEP) EC No: 204-118-5 CAS No: 115-96-8 Toxic for reproduction (category 1B) 21 February 2014 21 August 2015 14. 2,4-Dinitrotoluene (2,4-DNT) EC No: 204-450-0 CAS No: 121-14-2 Carcinogenic (category 1B) 21 February 2014 ►M43 (*1) ◄ 21 August 2015 ►M43 (*2) ◄ [▼M22](./../../../legal-content/EN/AUTO/?uri=celex:32013R0348 "32013R0348: INSERTED") 15. Trichloroethylene EC No: 201-167-4 CAS No: 79-01-6 Carcinogenic (category 1B) 21 October 2014 ►M43 (*1) ◄ 21 April 2016 ►M43 (*2) ◄ — 16. Chromium trioxide EC No: 215-607-8 CAS No: 1333-82-0 Carcinogenic (category 1A) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 17. 
Acids generated from chromium trioxide and their oligomers Group containing: Chromic acid EC No: 231-801-5 CAS No: 7738-94-5 Dichromic acid EC No: 236-881-5 CAS No: 13530-68-2 Oligomers of chromic acid and dichromic acid EC No: not yet assigned CAS No: not yet assigned Carcinogenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 18. Sodium dichromate EC No: 234-190-3 CAS No: 7789-12-0 10588-01-9 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 19. Potassium dichromate EC No: 231-906-6 CAS No: 7778-50-9 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 20. Ammonium dichromate EC No: 232-143-1 CAS No: 7789-09-5 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 21. Potassium chromate EC No: 232-140-5 CAS No: 7789-00-6 Carcinogenic (category 1B) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 22. Sodium chromate EC No: 231-889-5 CAS No: 7775-11-3 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ [▼M28](./../../../legal-content/EN/AUTO/?uri=celex:32014R0895 "32014R0895: INSERTED") 23. Formaldehyde, oligomeric reaction products with aniline (technical MDA) EC No: 500-036-1 CAS No: 25214-70-4 Carcinogenic (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 24. Arsenic acid EC No: 231-901-9 CAS No: 7778-39-4 Carcinogenic (category 1A) 22 February 2016 22 August 2017 — 25. Bis(2-methoxyethyl) ether (diglyme) EC No: 203-924-4 CAS No: 111-96-6 Toxic for reproduction (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 26. 
1,2-dichloroethane (EDC) EC No: 203-458-1 CAS No: 107-06-2 Carcinogenic (category 1B) 22 May 2016 22 November 2017 — 27. 2,2′-dichloro-4,4′-methylenedianiline (MOCA) EC No: 202-918-9 CAS No: 101-14-4 Carcinogenic (category 1B) 22 May 2016 ►M43 (*1) ◄ 22 November 2017 ►M43 (*2) ◄ — 28. Dichromium tris(chromate) EC No: 246-356-2 CAS No: 24613-89-6 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 29. Strontium chromate EC No: 232-142-6 CAS No: 7789-06-2 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 30. Potassium hydroxyoctaoxodizincatedichromate EC',
'(c)\n\nthe financial soundness of the proposed acquirer, in particular in relation to the type of business pursued and envisaged in the investment firm in which the acquisition is proposed;\n\n(d)\n\nwhether the investment firm will be able to comply and continue to comply with the prudential requirements based on this Directive and, where applicable, other Directives, in particular Directives 2002/87/EC and 2013/36/EU, in particular, whether the group of which it will become a part has a structure that makes it possible to exercise effective supervision, effectively exchange information among the competent authorities and determine the allocation of responsibilities among the competent authorities;\n\n(e)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
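The model tags list MatryoshkaLoss, a loss that is generally used so that a leading prefix of each embedding remains useful on its own. The trained prefix sizes are not stated in this section, so the dimension below is purely illustrative. A minimal sketch of truncating an embedding and renormalizing it, on a toy vector standing in for a 768-dimensional one:

```python
import math

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else prefix

# Toy unit-length vector; a real embedding from this model has 768 values.
full = [0.5, 0.5, 0.5, 0.5]

# `2` is a hypothetical target size chosen only for this toy example.
small = truncate_and_renormalize(full, 2)
print(len(small), round(sum(x * x for x in small), 6))
```

Recent sentence-transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`; whether a given size works well for this model would need to be checked against its training configuration.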
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6777 |
| cosine_accuracy@3 | 0.8973 |
| cosine_accuracy@5 | 0.9391 |
| cosine_accuracy@10 | 0.9691 |
| cosine_precision@1 | 0.6777 |
| cosine_precision@3 | 0.2991 |
| cosine_precision@5 | 0.1878 |
| cosine_precision@10 | 0.0969 |
| cosine_recall@1 | 0.6777 |
| cosine_recall@3 | 0.8973 |
| cosine_recall@5 | 0.9391 |
| cosine_recall@10 | 0.9691 |
| cosine_ndcg@10 | 0.8364 |
| cosine_mrr@10 | 0.7924 |
| cosine_map@100 | 0.7938 |
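In the table, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k, which is consistent with each query having exactly one relevant passage. That is an inference from the numbers, not something stated here. Under that assumption, the metrics can be sketched from the 1-based rank of the relevant passage per query; the function names and toy ranks below are hypothetical and do not reproduce the evaluator's own code.

```python
import math

def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def precision_at_k(ranks, k):
    """With one relevant passage per query, at most 1 of k results is a hit."""
    return accuracy_at_k(ranks, k) / k

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """With one relevant passage the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r is not None and r <= k) / len(ranks)

# Toy data: rank of the relevant passage for five queries (None = not retrieved).
ranks = [1, 1, 2, 3, None]
print(accuracy_at_k(ranks, 1), accuracy_at_k(ranks, 3))
print(round(mrr_at_k(ranks, 10), 4), round(ndcg_at_k(ranks, 10), 4))

# Consistency check against the reported table: precision@k ~= accuracy@k / k.
reported_accuracy = {3: 0.8973, 5: 0.9391, 10: 0.9691}
reported_precision = {3: 0.2991, 5: 0.1878, 10: 0.0969}
table_is_consistent = all(
    abs(reported_accuracy[k] / k - reported_precision[k]) < 5e-5
    for k in reported_accuracy
)
print(table_is_consistent)
```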
Training Details
Training Dataset
Unnamed Dataset
- Size: 46,338 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 11 tokens, mean: 35.09 tokens, max: 214 tokens | min: 4 tokens, mean: 202.2 tokens, max: 512 tokens |
- Samples:
| sentence_0 | sentence_1 |
|---|---|
| How do the Academies support education and training providers in maintaining and ensuring the quality of the training offered? | to in Chapter IV of this Regulation; (b) promoting the voluntary use of the learning programmes, content and materials by education and training providers in the Member States; --- --- (c) offering support to the education and training providers that use the learning programmes, content and materials produced by the Academies to uphold the quality of the training offered and to develop mechanisms to ensure the quality of the training offered; --- --- (d) developing credentials, including, if appropriate, micro-credentials, for voluntary use by Member States and education and training providers on their territories, in order to facilitate the identification of skills and, where appropriate, the recognition of qualifications, to enhance the |
| The text provides a comprehensive list of various nickel compounds, including their chemical names and associated identifiers. It covers a range of nickel salts, oxides, and other derivatives, highlighting their diverse applications and chemical properties. The compounds mentioned include nickel arsenate, nickel oxalate, and nickel dichromate, among others, indicating their significance in industrial and chemical processes. | [5] 235-688-3 [5] 12519-85-6 [5] Dinickel hexacyanoferrate 028-037-00-8 238-946-3 14874-78-3 Trinickel bis(arsenate); Nickel (II) arsenate 028-038-00-3 236-771-7 13477-70-8 Nickel oxalate; [1] 028-039-00-9 208-933-7 [1] 547-67-1 [1] Oxalic acid, nickel salt; [2] 243-867-2 [2] 20543-06-0 [2] Nickel telluride 028-040-00-4 235-260-6 12142-88-0 Trinickel tetrasulfide 028-041-00-X — 12137-12-1 Trinickel bis(arsenite) 028-042-00-5 — 74646-29-0 Cobalt nickel gray periclase; 028-043-00-0 C.I. Pigment Black 25; C.I. 77332; [1] 269-051-6 [1] 68186-89-0 [1] Cobalt nickel dioxide; [2] 261-346-8 [2] 58591-45-0 [2] Cobalt nickel oxide; [3] - [3] 12737-30-3 [3] Nickel tin trioxide; Nickel stannate 028-044-00-6 234-824-9 12035-38-0 Nickel triuranium decaoxide 028-045-00-1 239-876-6 15780-33-3 Nickel dithiocyanate 028-046-00-7 237-205-1 13689-92-4 Nickel dichromate 028-047-00-2 239-646-5 15586-38-6 Nickel (II) selenite 028-048-00-8 233-263-7 10101-96-9 Nickel selenide 028-049-00-3 215-216-2 1314-05-2 S... |
| What is the definition of 'Union airport managing body' and how does it relate to the management of centralized infrastructures for fuel distribution systems? | (2) ‘Union airport managing body’ means, in respect of a Union airport, the ‘airport managing body’ as defined in Article 2, point (2), of Directive 2009/12/EC or, where the Member State concerned has reserved the management of the centralised infrastructures for fuel distribution systems for another body pursuant to Article 8(1) of Council Directive 96/67/EC ( 2 ), that other body; (3) ‘aircraft operator’ means a person that operated at least 500 commercial passenger air transport flights, or 52 commercial all-cargo air transport flights departing from Union airports in the previous reporting period or, where it is not possible for that person to be identified, the owner of the aircraft; (4) |
- Loss: MatryoshkaLoss with these parameters:
{
  "loss": "MultipleNegativesRankingLoss",
  "matryoshka_dims": [768, 512, 256, 128, 64],
  "matryoshka_weights": [1, 1, 1, 1, 1],
  "n_dims_per_step": -1
}
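To make the loss configuration concrete, here is a rough sketch of how a Matryoshka-style objective can wrap an in-batch-negatives ranking loss: the base loss is computed on each prefix length in `matryoshka_dims` and the results are combined with `matryoshka_weights`. This is an illustration under stated assumptions, not the Sentence Transformers implementation: the function names are hypothetical, a similarity scale of 20 is assumed, and every dimension is used at every step, which is how `n_dims_per_step: -1` is read here.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positives[i] matches anchors[i]; every other
    positives[j] in the batch acts as a negative for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64),
                    weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss over prefixes of each embedding."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Hypothetical batch of 4 (anchor, positive) pairs of width 768.
random.seed(0)
anchors = [[random.gauss(0, 1) for _ in range(768)] for _ in range(4)]
positives = [[x + 0.1 * random.gauss(0, 1) for x in a] for a in anchors]
loss_value = matryoshka_loss(anchors, positives)
print(loss_value)
```

Because each prefix contributes its own term, the optimizer is pushed to place the most useful information in the earliest dimensions.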
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- num_train_epochs: 4
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| 0.0432 | 500 | 0.5169 | 0.7365 |
| 0.0863 | 1000 | 0.1341 | 0.7914 |
| 0.1295 | 1500 | 0.0784 | 0.7992 |
| 0.1726 | 2000 | 0.0782 | 0.8058 |
| 0.2158 | 2500 | 0.0596 | 0.8012 |
| 0.2590 | 3000 | 0.057 | 0.8079 |
| 0.3021 | 3500 | 0.0785 | 0.8086 |
| 0.3453 | 4000 | 0.0423 | 0.8010 |
| 0.3884 | 4500 | 0.0586 | 0.8075 |
| 0.4316 | 5000 | 0.0508 | 0.8008 |
| 0.4748 | 5500 | 0.0764 | 0.7934 |
| 0.5179 | 6000 | 0.0583 | 0.8068 |
| 0.5611 | 6500 | 0.0663 | 0.8008 |
| 0.6042 | 7000 | 0.0344 | 0.8083 |
| 0.6474 | 7500 | 0.0506 | 0.8104 |
| 0.6905 | 8000 | 0.0478 | 0.8089 |
| 0.7337 | 8500 | 0.0509 | 0.8034 |
| 0.7769 | 9000 | 0.0426 | 0.8114 |
| 0.8200 | 9500 | 0.0603 | 0.8097 |
| 0.8632 | 10000 | 0.036 | 0.8142 |
| 0.9063 | 10500 | 0.0581 | 0.8081 |
| 0.9495 | 11000 | 0.0351 | 0.8018 |
| 0.9927 | 11500 | 0.0358 | 0.8082 |
| 1.0 | 11585 | - | 0.8076 |
| 1.0358 | 12000 | 0.0398 | 0.8093 |
| 1.0790 | 12500 | 0.0197 | 0.8023 |
| 1.1221 | 13000 | 0.0376 | 0.8137 |
| 1.1653 | 13500 | 0.0287 | 0.8136 |
| 1.2085 | 14000 | 0.0269 | 0.8146 |
| 1.2516 | 14500 | 0.0089 | 0.8161 |
| 1.2948 | 15000 | 0.0149 | 0.8126 |
| 1.3379 | 15500 | 0.0457 | 0.8138 |
| 1.3811 | 16000 | 0.0119 | 0.8171 |
| 1.4243 | 16500 | 0.0107 | 0.8105 |
| 1.4674 | 17000 | 0.015 | 0.8171 |
| 1.5106 | 17500 | 0.0208 | 0.8153 |
| 1.5537 | 18000 | 0.0168 | 0.8111 |
| 1.5969 | 18500 | 0.0114 | 0.8171 |
| 1.6401 | 19000 | 0.0188 | 0.8239 |
| 1.6832 | 19500 | 0.01 | 0.8182 |
| 1.7264 | 20000 | 0.0158 | 0.8125 |
| 1.7695 | 20500 | 0.0155 | 0.8201 |
| 1.8127 | 21000 | 0.0276 | 0.8182 |
| 1.8558 | 21500 | 0.0245 | 0.8123 |
| 1.8990 | 22000 | 0.0135 | 0.8223 |
| 1.9422 | 22500 | 0.0334 | 0.8182 |
| 1.9853 | 23000 | 0.0111 | 0.8200 |
| 2.0 | 23170 | - | 0.8221 |
| 2.0285 | 23500 | 0.0139 | 0.8225 |
| 2.0716 | 24000 | 0.0113 | 0.8237 |
| 2.1148 | 24500 | 0.0072 | 0.8223 |
| 2.1580 | 25000 | 0.0138 | 0.8218 |
| 2.2011 | 25500 | 0.0071 | 0.8200 |
| 2.2443 | 26000 | 0.0091 | 0.8240 |
| 2.2874 | 26500 | 0.013 | 0.8224 |
| 2.3306 | 27000 | 0.008 | 0.8248 |
| 2.3738 | 27500 | 0.0084 | 0.8203 |
| 2.4169 | 28000 | 0.0147 | 0.8255 |
| 2.4601 | 28500 | 0.0067 | 0.8268 |
| 2.5032 | 29000 | 0.0028 | 0.8219 |
| 2.5464 | 29500 | 0.0124 | 0.8234 |
| 2.5896 | 30000 | 0.0051 | 0.8237 |
| 2.6327 | 30500 | 0.0151 | 0.8256 |
| 2.6759 | 31000 | 0.0051 | 0.8207 |
| 2.7190 | 31500 | 0.0086 | 0.8250 |
| 2.7622 | 32000 | 0.0152 | 0.8265 |
| 2.8054 | 32500 | 0.0085 | 0.8297 |
| 2.8485 | 33000 | 0.0097 | 0.8316 |
| 2.8917 | 33500 | 0.0269 | 0.8284 |
| 2.9348 | 34000 | 0.008 | 0.8305 |
| 2.9780 | 34500 | 0.0146 | 0.8309 |
| 3.0 | 34755 | - | 0.8301 |
| 3.0211 | 35000 | 0.0218 | 0.8326 |
| 3.0643 | 35500 | 0.0152 | 0.8301 |
| 3.1075 | 36000 | 0.0072 | 0.8290 |
| 3.1506 | 36500 | 0.0077 | 0.8270 |
| 3.1938 | 37000 | 0.0155 | 0.8299 |
| 3.2369 | 37500 | 0.0069 | 0.8328 |
| 3.2801 | 38000 | 0.0103 | 0.8364 |
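The Epoch and Step columns can be cross-checked against the dataset size and batch size reported earlier. The short calculation below assumes a single training device, which the card does not state explicitly, together with the listed `gradient_accumulation_steps: 1` and `dataloader_drop_last: False`. Under those assumptions it reproduces the 11,585 steps per epoch seen in the table and places the last logged row, step 38,000, at roughly epoch 3.28 of the 4 configured.

```python
import math

train_samples = 46_338  # "Size: 46,338 training samples"
batch_size = 4          # per_device_train_batch_size
num_epochs = 4          # num_train_epochs

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)   # 11585
print(total_steps)       # 46340
print(epoch_at(500))     # 0.0432
print(epoch_at(38_000))  # 3.2801
```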
Framework Versions
- Python: 3.10.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.1
- PyTorch: 2.4.0+cu121
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}