---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46338
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m-v1.5
widget:
- source_sentence: What are the chemical names and corresponding identifiers for octabromo
derivate and 2-Methoxyethanol, including their CAS numbers and EC numbers?
sentences:
- 'octabromo derivate 602-094-00-4 251-087-9 32536-52-0 2-Methoxyethanol; ethylene
glycol monomethyl ether; methylglycol 603-011-00-4 203-713-7 109-86-4 2-Ethoxyethanol;
ethylene glycol monoethyl ether; ethylglycol 603-012-00-X 203-804-1 110-80-5 [▼M61](./../../../legal-content/EN/AUTO/?uri=celex:32020R2096
"32020R2096: INSERTED") Ethylene oxide; oxirane 603-023-00-X 200-849-9 75-21-8
[▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29 "32006R1907R(01):
REPLACED") 1,2-Dimethoxyethane ethylene glycol dimethyl ether EGDME 603-031-00-3
203-794-9 110-71-4 [▼M45](./../../../legal-content/EN/AUTO/?uri=celex:32017R1510
"32017R1510: INSERTED") Tetrahydro-2-furyl-methanol; tetrahydrofurfuryl alcohol
603-061-00-7 202-625-6 97-99-4'
- hydrocarbons produced as the residual fraction from the distillation of heavy
coker gas oil and vacuum gas oil. It predominantly consists of hydrocarbons having
carbon numbers predominantly greater than C13 and boiling above approximately
230 °C.) 649-026-00-X 270-796-4 68478-17-1 Residues (petroleum), heavy coker and
light vacuum; Heavy fuel oil (A complex combination of hydrocarbons produced as
the residual fraction from the distillation of heavy coker gas oil and light vacuum
gas oil. It consists predominantly of hydrocarbons having carbon numbers predominantly
greater than C13 and boiling above approximately 230 °C.) 649-027-00-5 270-983-0
68512-61-8 Residues (petroleum), light vacuum; Heavy fuel oil (A complex residuum
from the vacuum distillation of the residuum from the atmospheric distillation
of crude oil. It consists of hydrocarbons having carbon numbers predominantly
greater than C13 and boiling above approximately 230 °C.) 649-028-00-0 270-984-6
68512-62-9 Residues (petroleum), steam-cracked light; Heavy fuel oil (A complex
residuum from the distillation of the products from a steam-cracking process.
It consists predominantly of aromatic and unsaturated hydrocarbons having carbon
numbers greater than C7 and boiling in the range of approximately 101 to 555 °C.)
649-029-00-6 271-013-9 68513-69-9 Fuel oil, No 6; Heavy fuel oil (A distillate
oil having a minimum viscosity of 197 10-6 m2s-1 at 37,7 °C to a maximum of 197
10-5 m2s-1 at 37,7 °C.) 649-030-00-1 271-384-7 68553-00-4 Residues (petroleum),
topping plant, low-sulfur; Heavy fuel oil (A low-sulfur complex combination of
hydrocarbons produced as the residual fraction from the topping plant distillation
of crude oil. It is the residuum after the straight-run gasoline cut, kerosene
cut and gas oil cut have been removed.) 649-031-00-7 271-763-7 68607-30-7 Gas
oils (petroleum), heavy atmospheric; Heavy fuel oil (A complex combination of
hydrocarbons obtained by the distillation of crude oil. It consists of hydrocarbons
having carbon numbers predominantly in the range of C7 through C35 and boiling
in the range of approximately 121 to 510 °C.) 649-032-00-2 272-184-2 68783-08-4
Residues (petroleum), coker scrubber, Condensed-ring-arom.-contg.; Heavy fuel
- '(e)
where applicable, how the undertaking assesses the effectiveness of its engagement
with its own workforce, including, where relevant, any agreements or outcomes
that result.
Where applicable, the undertaking shall disclose the steps it takes to gain insight
into the perspectives of people in its own workforce who may be particularly vulnerable
to impacts and/or marginalised (for example, women, migrants, people with disabilities).
If the undertaking cannot disclose the above required information because it has
not adopted a general process to engage with its own workforce , it shall disclose
this to be the case. It may disclose a timeframe in which it aims to have such
a process in place.'
- source_sentence: Under what circumstances can the suspension or removal of a financial
instrument or derivative from trading be exempted, despite infringing Articles
7 and 17 of Regulation (EU) No 596/2014?
sentences:
- '(15) Directive 2010/75/EU of the European Parliament and of the Council of 24
November 2010 on industrial emissions (integrated pollution prevention and control)
(recast) (OJ L 334, 17.12.2010, p. 17).
(16) Directive 2011/92/EU of the European Parliament and of the Council of 13
December 2011 on the assessment of the effects of certain public and private projects
on the environment (OJ L 26, 28.1.2012, p. 1).
(17) Directive 2012/18/EU of the European Parliament and of the Council of 4 July
2012 on the control of major-accident hazards involving dangerous substances,
amending and subsequently repealing Council Directive 96/82/EC (OJ L 197, 24.7.2012,
p. 1).'
- '3.
Where the competent authority of the host Member State of a regulated market,
an MTF or OTF has clear and demonstrable grounds for believing that such regulated
market, MTF or OTF infringes the obligations arising from the provisions adopted
pursuant to this Directive, it shall refer those findings to the competent authority
of the home Member State of the regulated market or the MTF or OTF.'
- The notified competent authorities of the other Member States shall require that
regulated markets, other MTFs, other OTFs and systematic internalisers, which
fall under their jurisdiction and trade the same financial instrument or derivatives
referred to in points (4) to (10) of Section C of Annex I that relate or are referenced
to that financial instrument, also suspend or remove that financial instrument
or derivatives from trading, where the suspension or removal is due to suspected
market abuse, a take-over bid or the non- disclosure of inside information about
the issuer or financial instrument infringing Articles 7 and 17 of Regulation
(EU) No 596/2014 except where such suspension or removal could cause significant
damage to the
- source_sentence: How can the limitation period for the Commission's powers be interrupted
according to Article 38?
sentences:
- '2.
That third-country dialogue shall not prevent the Commission from taking action
under this Regulation. Individual measures adopted pursuant to this Regulation
shall not be addressed within that dialogue.
Article 38
Limitation periods
1.
The powers of the Commission under Articles 10 and 11 shall be subject to a limitation
period of 10 years, starting on the day on which a foreign subsidy is granted
to an undertaking. Any action taken by the Commission under Article 10, 13, 14
or 15 with respect to a foreign subsidy shall interrupt the limitation period.
After each interruption, the limitation period of 10 years shall start to run
afresh.
2.'
- (36) Member States should promote energy efficient means of mobility, including
in their public procurement practices, such as rail, cycling, walking or shared
mobility, by renewing and decarbonising fleets, encouraging a modal shift and
including those modes in urban mobility planning.
- air oxidation of petrolatum.) 649-255-00-5 265-206-7 64743-01-7 N Petrolatum (petroleum),
alumina-treated; Petrolatum (A complex combination of hydrocarbons obtained when
petrolatum is treated with Al2O3 to remove polar components and impurities. It
consists predominantly of saturated, crystalline, and liquid hydrocarbons having
carbon numbers predominantly greater than C25.) 649-256-00-0 285-098-5 85029-74-9
N Petrolatum (petroleum), hydrotreated; Petrolatum (A complex combination of hydrocarbons
obtained as a semi-solid from dewaxed paraffinic residual oil treated with hydrogen
in the presence of a catalyst. It consists predominantly of saturated, microcrystalline,
and liquid hydrocarbons having carbon numbers predominantly greater than
- source_sentence: What specific sections and points of Annex VIII are included in
the registration for high-risk AI systems in the areas of law enforcement, migration,
asylum, and border control management?
sentences:
- '▼M15
Article 18b
Assistance from the Commission, EMSA and other relevant organisations
1.
For the purposes of carrying out its obligations under Article 3c(4) and Articles
3g, 3gd, 3ge, 3gf, 3gg and 18a, the Commission, the administering Member State
and administering authorities in respect of a shipping company may request the
assistance of EMSA or another relevant organisation and may conclude to that effect
any appropriate agreements with those organisations.
2.
The Commission, assisted by EMSA, shall endeavour to develop appropriate tools
and guidance to facilitate and coordinate verification and enforcement activities
related to the application of this Directive to maritime transport. As far as
practicable, such guidance and tools shall be made available to the Member States
and the verifiers for information-sharing purposes and in order to better ensure
robust enforcement of the national measures transposing this Directive.
▼B
Article 19
Registries
▼M4
1.
Allowances issued from 1 January 2012 onwards shall be held in the ►M9 Union ◄
registry for the execution of processes pertaining to the maintenance of the holding
accounts opened in the Member State and the allocation, surrender and cancellation
of allowances under the Commission ►M9 Acts ◄ referred to in paragraph 3.
Each Member State shall be able to fulfil the execution of authorised operations
under the UNFCCC or the Kyoto Protocol.
▼B
2.
Any person may hold allowances. The registry shall be accessible to the public
and shall contain separate accounts to record the allowances held by each person
to whom and from whom allowances are issued or transferred.
▼M9
3.'
- '(35)
‘recycled carbon fuels’ means liquid and gaseous fuels that are produced from
liquid or solid waste streams of non-renewable origin which are not suitable for
material recovery in accordance with Article 4 of Directive 2008/98/EC, or from
waste processing gas and exhaust gas of non-renewable origin which are produced
as an unavoidable and unintentional consequence of the production process in industrial
installations;
▼M2
(36)
‘renewable fuels of non-biological origin’ means liquid and gaseous fuels the
energy content of which is derived from renewable sources other than biomass;
▼B
(37)'
- '4. For high-risk AI systems referred to in points 1, 6 and 7 of Annex III, in
the areas of law enforcement, migration, asylum and border control management,
the registration referred to in paragraphs 1, 2 and 3 of this Article shall be
in a secure non-public section of the EU database referred to in Article 71 and
shall include only the following information, as applicable, referred to in:
(a) Section A, points 1 to 10, of Annex VIII, with the exception of points 6,
8 and 9; (b) Section B, points 1 to 5, and points 8 and 9 of Annex VIII; --- ---
(c) Section C, points 1 to 3, of Annex VIII; --- --- (d) points 1, 2, 3 and 5,
of Annex IX. --- ---'
- source_sentence: The document outlines various chemical substances classified as
carcinogenic or toxic for reproduction, detailing their respective categories
and regulatory dates. Specific compounds such as diarsenic trioxide, lead chromate,
and chromium trioxide are highlighted, indicating their potential health risks
and the timeline for their regulation.
sentences:
- '57(f) – human health) (a) 21 August 2013 (*) (b) By way of derogation from point
(a): 14 June 2023 for uses in mixtures containing DIBP at or above 0,1 % and below
0,3 % weight by weight. (a) 21 February 2015 (**) (b) By way of derogation from
point (a): 14 December 2024 for uses in mixtures containing DIBP at or above 0,1
% and below 0,3 % weight by weight. - [▼M15](./../../../legal-content/EN/AUTO/?uri=celex:32012R0125
"32012R0125: INSERTED") 8. Diarsenic trioxide EC No: 215-481-4 CAS No: 1327-53-3
Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 9. Diarsenic pentaoxide
EC No: 215-116-9 CAS No: 1303-28-2 Carcinogenic (category 1A) 21 November 2013
21 May 2015 — 10. Lead chromate EC No: 231-846-0 CAS No: 7758-97-6 Carcinogenic
(category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1)
◄ 21 May 2015 ►M43 (*2) ◄ — 11. Lead sulfochromate yellow (C.I. Pigment Yellow
34) EC No: 215-693-7 CAS No: 1344-37-2 Carcinogenic (category 1B) Toxic for reproduction
(category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 12. Lead
chromate molybdate sulphate red (C.I. Pigment Red 104) EC No: 235-759-9 CAS No:
12656-85-8 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21
November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ 13. Tris (2-chloroethyl) phosphate
(TCEP) EC No: 204-118-5 CAS No: 115-96-8 Toxic for reproduction (category 1B)
21 February 2014 21 August 2015 14. 2,4-Dinitrotoluene (2,4-DNT) EC No: 204-450-0
CAS No: 121-14-2 Carcinogenic (category 1B) 21 February 2014 ►M43 (*1) ◄ 21 August
2015 ►M43 (*2) ◄ [▼M22](./../../../legal-content/EN/AUTO/?uri=celex:32013R0348
"32013R0348: INSERTED") 15. Trichloroethylene EC No: 201-167-4 CAS No: 79-01-6
Carcinogenic (category 1B) 21 October 2014 ►M43 (*1) ◄ 21 April 2016 ►M43 (*2)
◄ — 16. Chromium trioxide EC No: 215-607-8 CAS No: 1333-82-0 Carcinogenic (category
1A) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2)
◄ — 17. Acids generated from chromium trioxide and their oligomers Group containing:
Chromic acid EC No: 231-801-5 CAS No: 7738-94-5 Dichromic acid EC No: 236-881-5
CAS No: 13530-68-2 Oligomers of chromic acid and dichromic acid EC No: not yet
assigned CAS No: not yet assigned Carcinogenic (category 1B) 21 March 2016 ►M43
(*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 18. Sodium dichromate EC No: 234-190-3
CAS No: 7789-12-0 10588-01-9 Carcinogenic (category 1B) Mutagenic (category 1B)
Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017
►M43 (*2) ◄ — 19. Potassium dichromate EC No: 231-906-6 CAS No: 7778-50-9 Carcinogenic
(category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21
March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 20. Ammonium dichromate
EC No: 232-143-1 CAS No: 7789-09-5 Carcinogenic (category 1B) Mutagenic (category
1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September
2017 ►M43 (*2) ◄ 21. Potassium chromate EC No: 232-140-5 CAS No: 7789-00-6 Carcinogenic
(category 1B) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017
►M43 (*2) ◄ 22. Sodium chromate EC No: 231-889-5 CAS No: 7775-11-3 Carcinogenic
(category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21
March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ [▼M28](./../../../legal-content/EN/AUTO/?uri=celex:32014R0895
"32014R0895: INSERTED") 23. Formaldehyde, oligomeric reaction products with aniline
(technical MDA) EC No: 500-036-1 CAS No: 25214-70-4 Carcinogenic (category 1B)
22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 24. Arsenic acid EC
No: 231-901-9 CAS No: 7778-39-4 Carcinogenic (category 1A) 22 February 2016 22
August 2017 — 25. Bis(2-methoxyethyl) ether (diglyme) EC No: 203-924-4 CAS No:
111-96-6 Toxic for reproduction (category 1B) 22 February 2016 ►M43 (*1) ◄ 22
August 2017 ►M43 (*2) ◄ — 26. 1,2-dichloroethane (EDC) EC No: 203-458-1 CAS No:
107-06-2 Carcinogenic (category 1B) 22 May 2016 22 November 2017 — 27. 2,2′-dichloro-4,4′-methylenedianiline
(MOCA) EC No: 202-918-9 CAS No: 101-14-4 Carcinogenic (category 1B) 22 May 2016
►M43 (*1) ◄ 22 November 2017 ►M43 (*2) ◄ — 28. Dichromium tris(chromate) EC No:
246-356-2 CAS No: 24613-89-6 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1)
◄ 22 January 2019 ►M43 (*2) ◄ — 29. Strontium chromate EC No: 232-142-6 CAS No:
7789-06-2 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019
►M43 (*2) ◄ — 30. Potassium hydroxyoctaoxodizincatedichromate EC'
- '(c)
the financial soundness of the proposed acquirer, in particular in relation to
the type of business pursued and envisaged in the investment firm in which the
acquisition is proposed;
(d)
whether the investment firm will be able to comply and continue to comply with
the prudential requirements based on this Directive and, where applicable, other
Directives, in particular Directives 2002/87/EC and 2013/36/EU, in particular,
whether the group of which it will become a part has a structure that makes it
possible to exercise effective supervision, effectively exchange information among
the competent authorities and determine the allocation of responsibilities among
the competent authorities;
(e)'
- No administrative costs or fees related to the implementation of financing and
investment operations under the EU guarantee shall be due to the implementing
partner by the Commission unless the nature of the policy objectives targeted
by the financial product to be implemented and the affordability for the targeted
final recipients or the type of financing provided allow the implementing partner
to duly justify to the Commission the need for an exception. The coverage of such
costs by the Union budget shall be limited to the amount strictly required to
implement the relevant financing and investment operations, and shall be provided
only to the extent to which the costs are not covered by revenues received by
the implementing partners from
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.6777144829967202
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8972898325565337
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9390643880545486
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9691006387018816
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6777144829967202
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2990966108521779
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18781287761090967
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09691006387018813
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6777144829967202
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8972898325565337
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9390643880545486
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9691006387018816
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8364282304724784
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7924261355385132
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7938274567816883
name: Cosine Map@100
---
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) <!-- at revision 8e4eaca09c27ad3d501908636ec7c8bc3561b6de -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
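The architecture printout shows CLS-token pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. As a rough illustration of what those two steps do, here is a minimal pure-Python sketch. The token vectors are toy values with 4 dimensions instead of 768, and the helper names `cls_pool` and `l2_normalize` are hypothetical, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]), as CLS pooling does."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what Normalize() does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy Transformer output: 3 tokens x 4 dims (illustrative values only).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output is unit-length, cosine similarity between two embeddings reduces to a plain dot product.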
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub (replace the placeholder below with this model's repository ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'The document outlines various chemical substances classified as carcinogenic or toxic for reproduction, detailing their respective categories and regulatory dates. Specific compounds such as diarsenic trioxide, lead chromate, and chromium trioxide are highlighted, indicating their potential health risks and the timeline for their regulation.',
'57(f) – human health) (a) 21 August 2013 (*) (b) By way of derogation from point (a): 14 June 2023 for uses in mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by weight. (a) 21 February 2015 (**) (b) By way of derogation from point (a): 14 December 2024 for uses in mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by weight. - [▼M15](./../../../legal-content/EN/AUTO/?uri=celex:32012R0125 "32012R0125: INSERTED") 8. Diarsenic trioxide EC No: 215-481-4 CAS No: 1327-53-3 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 9. Diarsenic pentaoxide EC No: 215-116-9 CAS No: 1303-28-2 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 10. Lead chromate EC No: 231-846-0 CAS No: 7758-97-6 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 11. Lead sulfochromate yellow (C.I. Pigment Yellow 34) EC No: 215-693-7 CAS No: 1344-37-2 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 12. Lead chromate molybdate sulphate red (C.I. Pigment Red 104) EC No: 235-759-9 CAS No: 12656-85-8 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ 13. Tris (2-chloroethyl) phosphate (TCEP) EC No: 204-118-5 CAS No: 115-96-8 Toxic for reproduction (category 1B) 21 February 2014 21 August 2015 14. 2,4-Dinitrotoluene (2,4-DNT) EC No: 204-450-0 CAS No: 121-14-2 Carcinogenic (category 1B) 21 February 2014 ►M43 (*1) ◄ 21 August 2015 ►M43 (*2) ◄ [▼M22](./../../../legal-content/EN/AUTO/?uri=celex:32013R0348 "32013R0348: INSERTED") 15. Trichloroethylene EC No: 201-167-4 CAS No: 79-01-6 Carcinogenic (category 1B) 21 October 2014 ►M43 (*1) ◄ 21 April 2016 ►M43 (*2) ◄ — 16. Chromium trioxide EC No: 215-607-8 CAS No: 1333-82-0 Carcinogenic (category 1A) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 17. Acids generated from chromium trioxide and their oligomers Group containing: Chromic acid EC No: 231-801-5 CAS No: 7738-94-5 Dichromic acid EC No: 236-881-5 CAS No: 13530-68-2 Oligomers of chromic acid and dichromic acid EC No: not yet assigned CAS No: not yet assigned Carcinogenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 18. Sodium dichromate EC No: 234-190-3 CAS No: 7789-12-0 10588-01-9 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 19. Potassium dichromate EC No: 231-906-6 CAS No: 7778-50-9 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 20. Ammonium dichromate EC No: 232-143-1 CAS No: 7789-09-5 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 21. Potassium chromate EC No: 232-140-5 CAS No: 7789-00-6 Carcinogenic (category 1B) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 22. Sodium chromate EC No: 231-889-5 CAS No: 7775-11-3 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ [▼M28](./../../../legal-content/EN/AUTO/?uri=celex:32014R0895 "32014R0895: INSERTED") 23. Formaldehyde, oligomeric reaction products with aniline (technical MDA) EC No: 500-036-1 CAS No: 25214-70-4 Carcinogenic (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 24. Arsenic acid EC No: 231-901-9 CAS No: 7778-39-4 Carcinogenic (category 1A) 22 February 2016 22 August 2017 — 25. Bis(2-methoxyethyl) ether (diglyme) EC No: 203-924-4 CAS No: 111-96-6 Toxic for reproduction (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 26. 1,2-dichloroethane (EDC) EC No: 203-458-1 CAS No: 107-06-2 Carcinogenic (category 1B) 22 May 2016 22 November 2017 — 27. 2,2′-dichloro-4,4′-methylenedianiline (MOCA) EC No: 202-918-9 CAS No: 101-14-4 Carcinogenic (category 1B) 22 May 2016 ►M43 (*1) ◄ 22 November 2017 ►M43 (*2) ◄ — 28. Dichromium tris(chromate) EC No: 246-356-2 CAS No: 24613-89-6 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 29. Strontium chromate EC No: 232-142-6 CAS No: 7789-06-2 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 30. Potassium hydroxyoctaoxodizincatedichromate EC',
'(c)\n\nthe financial soundness of the proposed acquirer, in particular in relation to the type of business pursued and envisaged in the investment firm in which the acquisition is proposed;\n\n(d)\n\nwhether the investment firm will be able to comply and continue to comply with the prudential requirements based on this Directive and, where applicable, other Directives, in particular Directives 2002/87/EC and 2013/36/EU, in particular, whether the group of which it will become a part has a structure that makes it possible to exercise effective supervision, effectively exchange information among the competent authorities and determine the allocation of responsibilities among the competent authorities;\n\n(e)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
```
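For semantic search, the usual pattern is to embed one query and many passages, then rank the passages by cosine similarity. The sketch below shows only the ranking step in pure Python, using toy 2-dimensional unit vectors in place of real model output. The function name `rank_passages` is hypothetical, and the code assumes every vector is already L2-normalized, as the `Normalize()` module in the architecture suggests.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_passages(query_embedding, passage_embeddings, top_k=3):
    """Return (passage_index, score) pairs, best match first."""
    scored = [(i, dot(query_embedding, p)) for i, p in enumerate(passage_embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy unit vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_passages(query, passages, top_k=2)
print(ranking)  # [(2, 1.0), (1, 0.6)]
```

In practice the query and passage vectors would come from `model.encode(...)`, and a vector index would replace the linear scan for large corpora.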
## Evaluation
### Metrics
#### Information Retrieval
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.6777 |
| cosine_accuracy@3 | 0.8973 |
| cosine_accuracy@5 | 0.9391 |
| cosine_accuracy@10 | 0.9691 |
| cosine_precision@1 | 0.6777 |
| cosine_precision@3 | 0.2991 |
| cosine_precision@5 | 0.1878 |
| cosine_precision@10 | 0.0969 |
| cosine_recall@1 | 0.6777 |
| cosine_recall@3 | 0.8973 |
| cosine_recall@5 | 0.9391 |
| cosine_recall@10 | 0.9691 |
| **cosine_ndcg@10** | **0.8364** |
| cosine_mrr@10 | 0.7924 |
| cosine_map@100 | 0.7938 |
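In the table, accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.8973 / 3 ≈ 0.2991). That pattern is what one would expect if every query has exactly one relevant passage. This is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched as follows. The function `retrieval_metrics` and the toy ranks are hypothetical and simplified, and this is not the evaluator's actual code.

```python
import math

def retrieval_metrics(ranks, k):
    """Compute @k metrics when every query has exactly one relevant passage.

    ranks: 1-based rank of the relevant passage for each query, or None
    if the relevant passage was not retrieved at all.
    """
    n = len(ranks)
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    accuracy = len(hit_ranks) / n
    return {
        "accuracy": accuracy,        # share of queries with a hit in the top k
        "precision": accuracy / k,   # at most one of the k results is relevant
        "recall": accuracy,          # the single relevant passage is found or not
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
        "ndcg": sum(1.0 / math.log2(r + 1) for r in hit_ranks) / n,  # ideal DCG is 1
    }

# Hypothetical ranks for five queries (not taken from the evaluation above).
toy_ranks = [1, 2, 1, 4, None]
metrics_at_3 = retrieval_metrics(toy_ranks, k=3)
print(metrics_at_3)
```

With one relevant passage per query, the gap between accuracy@1 (0.6777) and accuracy@10 (0.9691) indicates how often the correct passage is retrieved but not ranked first.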
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 46,338 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 |
|:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 11 tokens</li><li>mean: 35.09 tokens</li><li>max: 214 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 202.2 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
| sentence_0 | sentence_1 |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>How do the Academies support education and training providers in maintaining and ensuring the quality of the training offered?</code> | <code>to in Chapter IV of this Regulation; (b) promoting the voluntary use of the learning programmes, content and materials by education and training providers in the Member States; --- --- (c) offering support to the education and training providers that use the learning programmes, content and materials produced by the Academies to uphold the quality of the training offered and to develop mechanisms to ensure the quality of the training offered; --- --- (d) developing credentials, including, if appropriate, micro-credentials, for voluntary use by Member States and education and training providers on their territories, in order to facilitate the identification of skills and, where appropriate, the recognition of qualifications, to enhance the</code> |
| <code>The text provides a comprehensive list of various nickel compounds, including their chemical names and associated identifiers. It covers a range of nickel salts, oxides, and other derivatives, highlighting their diverse applications and chemical properties. The compounds mentioned include nickel arsenate, nickel oxalate, and nickel dichromate, among others, indicating their significance in industrial and chemical processes.</code> | <code>[5] 235-688-3 [5] 12519-85-6 [5] Dinickel hexacyanoferrate 028-037-00-8 238-946-3 14874-78-3 Trinickel bis(arsenate); Nickel (II) arsenate 028-038-00-3 236-771-7 13477-70-8 Nickel oxalate; [1] 028-039-00-9 208-933-7 [1] 547-67-1 [1] Oxalic acid, nickel salt; [2] 243-867-2 [2] 20543-06-0 [2] Nickel telluride 028-040-00-4 235-260-6 12142-88-0 Trinickel tetrasulfide 028-041-00-X — 12137-12-1 Trinickel bis(arsenite) 028-042-00-5 — 74646-29-0 Cobalt nickel gray periclase; 028-043-00-0 C.I. Pigment Black 25; C.I. 77332; [1] 269-051-6 [1] 68186-89-0 [1] Cobalt nickel dioxide; [2] 261-346-8 [2] 58591-45-0 [2] Cobalt nickel oxide; [3] - [3] 12737-30-3 [3] Nickel tin trioxide; Nickel stannate 028-044-00-6 234-824-9 12035-38-0 Nickel triuranium decaoxide 028-045-00-1 239-876-6 15780-33-3 Nickel dithiocyanate 028-046-00-7 237-205-1 13689-92-4 Nickel dichromate 028-047-00-2 239-646-5 15586-38-6 Nickel (II) selenite 028-048-00-8 233-263-7 10101-96-9 Nickel selenide 028-049-00-3 215-216-2 1314-05-2 S...</code> |
| <code>What is the definition of 'Union airport managing body' and how does it relate to the management of centralized infrastructures for fuel distribution systems?</code> | <code>(2)<br><br>‘Union airport managing body’ means, in respect of a Union airport, the ‘airport managing body’ as defined in Article 2, point (2), of Directive 2009/12/EC or, where the Member State concerned has reserved the management of the centralised infrastructures for fuel distribution systems for another body pursuant to Article 8(1) of Council Directive 96/67/EC ( 2 ), that other body;<br><br>(3)<br><br>‘aircraft operator’ means a person that operated at least 500 commercial passenger air transport flights, or 52 commercial all-cargo air transport flights departing from Union airports in the previous reporting period or, where it is not possible for that person to be identified, the owner of the aircraft;<br><br>(4)</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
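The configuration above can be read as: truncate each embedding to every listed dimension, compute the inner ranking loss on the truncated vectors, and add the results up using the listed weights. The following is a minimal pure-Python sketch of that idea, not the library implementation. The function names are hypothetical, and the cosine similarity with a scale of 20 is an assumption based on the usual defaults of `MultipleNegativesRankingLoss`.

```python
import math

MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]   # "matryoshka_dims" from the config
MATRYOSHKA_WEIGHTS = [1, 1, 1, 1, 1]         # "matryoshka_weights" from the config
SCALE = 20.0  # assumption: default similarity scale of the inner ranking loss


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_ranking_loss(anchors, positives, scale=SCALE):
    """Cross-entropy where positives[i] is the match for anchors[i] and
    every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims=MATRYOSHKA_DIMS, weights=MATRYOSHKA_WEIGHTS):
    """Weighted sum of the ranking loss over embeddings truncated to each dimension."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        loss += weight * in_batch_ranking_loss(truncated_anchors, truncated_positives)
    return loss
```

Because every prefix of the embedding is trained to rank well on its own, embeddings from this model can in principle be shortened to 512, 256, 128, or 64 dimensions at inference time, for example through the `truncate_dim` argument of `SentenceTransformer`, trading some retrieval quality for smaller indexes.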
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 4
- `multi_dataset_batch_sampler`: round_robin
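These settings determine how optimizer steps map to epochs in the Training Logs table. The sketch below works through that arithmetic; it assumes single-device training (the device count is not stated in this card) and takes the sample count from the `dataset_size:46338` tag in the card metadata. The helper names are illustrative only.

```python
import math

TRAIN_SAMPLES = 46338       # from the `dataset_size:46338` metadata tag
PER_DEVICE_BATCH_SIZE = 4   # `per_device_train_batch_size`
NUM_EPOCHS = 4              # `num_train_epochs`
GRAD_ACCUM_STEPS = 1        # `gradient_accumulation_steps`
NUM_DEVICES = 1             # assumption: single-device training


def steps_per_epoch(samples=TRAIN_SAMPLES, batch_size=PER_DEVICE_BATCH_SIZE,
                    devices=NUM_DEVICES, grad_accum=GRAD_ACCUM_STEPS):
    """Optimizer steps per epoch; the last partial batch is kept
    because `dataloader_drop_last` is False."""
    batches = math.ceil(samples / (batch_size * devices))
    return math.ceil(batches / grad_accum)


def epoch_at_step(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch(), 4)


total_steps = steps_per_epoch() * NUM_EPOCHS

print(steps_per_epoch())  # 11585
print(total_steps)        # 46340
print(epoch_at_step(500)) # 0.0432
```

Under these assumptions one epoch is 11585 steps, which agrees with the epoch boundaries (11585, 23170, 34755) recorded in the training logs.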
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
</details>
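With `lr_scheduler_type` set to linear, `warmup_steps` at 0, and `learning_rate` at 5e-05, the learning rate decays in a straight line from its initial value to zero over the run. The following is a small sketch of that schedule, not the Transformers implementation. The total of 46340 steps is an assumption (4 epochs of 11585 steps on a single device), and `linear_lr` is a hypothetical helper.

```python
BASE_LR = 5e-05       # `learning_rate`
WARMUP_STEPS = 0      # `warmup_steps`
TOTAL_STEPS = 46340   # assumption: ceil(46338 / 4) steps per epoch * 4 epochs


def linear_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup (unused here: warmup is 0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 11585, 23170, 34755, 46340):
    print(step, linear_lr(step))
```

Under this assumption the learning rate has already fallen to half its initial value by the end of the second epoch, which is worth keeping in mind when comparing loss values across epochs.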
### Training Logs
| Epoch | Step | Training Loss | cosine_ndcg@10 |
|:------:|:-----:|:-------------:|:--------------:|
| 0.0432 | 500 | 0.5169 | 0.7365 |
| 0.0863 | 1000 | 0.1341 | 0.7914 |
| 0.1295 | 1500 | 0.0784 | 0.7992 |
| 0.1726 | 2000 | 0.0782 | 0.8058 |
| 0.2158 | 2500 | 0.0596 | 0.8012 |
| 0.2590 | 3000 | 0.0570 | 0.8079 |
| 0.3021 | 3500 | 0.0785 | 0.8086 |
| 0.3453 | 4000 | 0.0423 | 0.8010 |
| 0.3884 | 4500 | 0.0586 | 0.8075 |
| 0.4316 | 5000 | 0.0508 | 0.8008 |
| 0.4748 | 5500 | 0.0764 | 0.7934 |
| 0.5179 | 6000 | 0.0583 | 0.8068 |
| 0.5611 | 6500 | 0.0663 | 0.8008 |
| 0.6042 | 7000 | 0.0344 | 0.8083 |
| 0.6474 | 7500 | 0.0506 | 0.8104 |
| 0.6905 | 8000 | 0.0478 | 0.8089 |
| 0.7337 | 8500 | 0.0509 | 0.8034 |
| 0.7769 | 9000 | 0.0426 | 0.8114 |
| 0.8200 | 9500 | 0.0603 | 0.8097 |
| 0.8632 | 10000 | 0.0360 | 0.8142 |
| 0.9063 | 10500 | 0.0581 | 0.8081 |
| 0.9495 | 11000 | 0.0351 | 0.8018 |
| 0.9927 | 11500 | 0.0358 | 0.8082 |
| 1.0 | 11585 | - | 0.8076 |
| 1.0358 | 12000 | 0.0398 | 0.8093 |
| 1.0790 | 12500 | 0.0197 | 0.8023 |
| 1.1221 | 13000 | 0.0376 | 0.8137 |
| 1.1653 | 13500 | 0.0287 | 0.8136 |
| 1.2085 | 14000 | 0.0269 | 0.8146 |
| 1.2516 | 14500 | 0.0089 | 0.8161 |
| 1.2948 | 15000 | 0.0149 | 0.8126 |
| 1.3379 | 15500 | 0.0457 | 0.8138 |
| 1.3811 | 16000 | 0.0119 | 0.8171 |
| 1.4243 | 16500 | 0.0107 | 0.8105 |
| 1.4674 | 17000 | 0.0150 | 0.8171 |
| 1.5106 | 17500 | 0.0208 | 0.8153 |
| 1.5537 | 18000 | 0.0168 | 0.8111 |
| 1.5969 | 18500 | 0.0114 | 0.8171 |
| 1.6401 | 19000 | 0.0188 | 0.8239 |
| 1.6832 | 19500 | 0.0100 | 0.8182 |
| 1.7264 | 20000 | 0.0158 | 0.8125 |
| 1.7695 | 20500 | 0.0155 | 0.8201 |
| 1.8127 | 21000 | 0.0276 | 0.8182 |
| 1.8558 | 21500 | 0.0245 | 0.8123 |
| 1.8990 | 22000 | 0.0135 | 0.8223 |
| 1.9422 | 22500 | 0.0334 | 0.8182 |
| 1.9853 | 23000 | 0.0111 | 0.8200 |
| 2.0 | 23170 | - | 0.8221 |
| 2.0285 | 23500 | 0.0139 | 0.8225 |
| 2.0716 | 24000 | 0.0113 | 0.8237 |
| 2.1148 | 24500 | 0.0072 | 0.8223 |
| 2.1580 | 25000 | 0.0138 | 0.8218 |
| 2.2011 | 25500 | 0.0071 | 0.8200 |
| 2.2443 | 26000 | 0.0091 | 0.8240 |
| 2.2874 | 26500 | 0.0130 | 0.8224 |
| 2.3306 | 27000 | 0.0080 | 0.8248 |
| 2.3738 | 27500 | 0.0084 | 0.8203 |
| 2.4169 | 28000 | 0.0147 | 0.8255 |
| 2.4601 | 28500 | 0.0067 | 0.8268 |
| 2.5032 | 29000 | 0.0028 | 0.8219 |
| 2.5464 | 29500 | 0.0124 | 0.8234 |
| 2.5896 | 30000 | 0.0051 | 0.8237 |
| 2.6327 | 30500 | 0.0151 | 0.8256 |
| 2.6759 | 31000 | 0.0051 | 0.8207 |
| 2.7190 | 31500 | 0.0086 | 0.8250 |
| 2.7622 | 32000 | 0.0152 | 0.8265 |
| 2.8054 | 32500 | 0.0085 | 0.8297 |
| 2.8485 | 33000 | 0.0097 | 0.8316 |
| 2.8917 | 33500 | 0.0269 | 0.8284 |
| 2.9348 | 34000 | 0.0080 | 0.8305 |
| 2.9780 | 34500 | 0.0146 | 0.8309 |
| 3.0 | 34755 | - | 0.8301 |
| 3.0211 | 35000 | 0.0218 | 0.8326 |
| 3.0643 | 35500 | 0.0152 | 0.8301 |
| 3.1075 | 36000 | 0.0072 | 0.8290 |
| 3.1506 | 36500 | 0.0077 | 0.8270 |
| 3.1938 | 37000 | 0.0155 | 0.8299 |
| 3.2369 | 37500 | 0.0069 | 0.8328 |
| 3.2801 | 38000 | 0.0103 | 0.8364 |
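The `cosine_ndcg@10` column reports normalized discounted cumulative gain over the top 10 results ranked by cosine similarity. As a rough illustration of how such a score is formed, here is a minimal sketch for a single query. It assumes the input list holds a relevance label for every candidate in ranked order; the function names are hypothetical and this is not the evaluator used to produce the table.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: each hit is discounted by log2 of its rank + 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant candidate at all
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# One relevant passage per query, as in a question/passage retrieval setup.
print(ndcg_at_k([1, 0, 0, 0]))  # relevant passage ranked first
print(ndcg_at_k([0, 1, 0, 0]))  # relevant passage ranked second
```

A reported value is the mean of this per-query score over the evaluation queries, so a score near 0.83 means the relevant passage is usually, though not always, at or near the top of the ranking.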
### Framework Versions
- Python: 3.10.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.1
- PyTorch: 2.4.0+cu121
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
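To compare a local environment against the versions listed above, a small check along the following lines can help. It is a sketch using only the standard library; the distribution names are the usual PyPI names for these packages, and the helper functions are hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "3.4.1",
    "transformers": "4.48.1",
    "torch": "2.4.0+cu121",
    "accelerate": "1.4.0",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=CARD_VERSIONS):
    """Map each package to (expected version, installed version, exact match)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, matches) in version_report().items():
    status = "ok" if matches else "differs"
    print(f"{name}: card={wanted} installed={found} ({status})")
```

An exact match is not necessarily required to load the model; the report only highlights where the local setup differs from the training environment.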
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```