Committed by fjavigv
Commit 0e47369 · verified · 1 parent: f5818fa

Upload 12 files

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
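The pooling config above enables only `pooling_mode_cls_token`. Each text is therefore represented by the final hidden state of its first ([CLS]) token, not by an average over all tokens. The pure-Python sketch below shows the difference on toy vectors. `cls_pool` and `mean_pool` are hypothetical helpers written for illustration. They are not the Sentence Transformers `Pooling` implementation.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling (enabled in this config): keep only the first token's vector."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Mean pooling (disabled in this config): average the non-padding token vectors."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: 3 tokens ([CLS], one word, one padding token), 2 dimensions each.
tokens = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(cls_pool(tokens))         # [1.0, 0.0]
print(mean_pool(tokens, mask))  # [2.0, 2.0]
```

With CLS pooling, the padding token and the word tokens have no direct effect on the output vector. They influence it only through the encoder's attention.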
README.md CHANGED
@@ -1,3 +1,871 @@
- ---
- license: mit
- ---
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:46338
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-m-v1.5
+ widget:
+ - source_sentence: What are the chemical names and corresponding identifiers for octabromo
+ derivate and 2-Methoxyethanol, including their CAS numbers and EC numbers?
+ sentences:
+ - 'octabromo derivate 602-094-00-4 251-087-9 32536-52-0 2-Methoxyethanol; ethylene
+ glycol monomethyl ether; methylglycol 603-011-00-4 203-713-7 109-86-4 2-Ethoxyethanol;
+ ethylene glycol monoethyl ether; ethylglycol 603-012-00-X 203-804-1 110-80-5 [▼M61](./../../../legal-content/EN/AUTO/?uri=celex:32020R2096
+ "32020R2096: INSERTED") Ethylene oxide; oxirane 603-023-00-X 200-849-9 75-21-8
+ [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29 "32006R1907R(01):
+ REPLACED") 1,2-Dimethoxyethane ethylene glycol dimethyl ether EGDME 603-031-00-3
+ 203-794-9 110-71-4 [▼M45](./../../../legal-content/EN/AUTO/?uri=celex:32017R1510
+ "32017R1510: INSERTED") Tetrahydro-2-furyl-methanol; tetrahydrofurfuryl alcohol
+ 603-061-00-7 202-625-6 97-99-4'
+ - hydrocarbons produced as the residual fraction from the distillation of heavy
+ coker gas oil and vacuum gas oil. It predominantly consists of hydrocarbons having
+ carbon numbers predominantly greater than C13 and boiling above approximately
+ 230 °C.) 649-026-00-X 270-796-4 68478-17-1 Residues (petroleum), heavy coker and
+ light vacuum; Heavy fuel oil (A complex combination of hydrocarbons produced as
+ the residual fraction from the distillation of heavy coker gas oil and light vacuum
+ gas oil. It consists predominantly of hydrocarbons having carbon numbers predominantly
+ greater than C13 and boiling above approximately 230 °C.) 649-027-00-5 270-983-0
+ 68512-61-8 Residues (petroleum), light vacuum; Heavy fuel oil (A complex residuum
+ from the vacuum distillation of the residuum from the atmospheric distillation
+ of crude oil. It consists of hydrocarbons having carbon numbers predominantly
+ greater than C13 and boiling above approximately 230 °C.) 649-028-00-0 270-984-6
+ 68512-62-9 Residues (petroleum), steam-cracked light; Heavy fuel oil (A complex
+ residuum from the distillation of the products from a steam-cracking process.
+ It consists predominantly of aromatic and unsaturated hydrocarbons having carbon
+ numbers greater than C7 and boiling in the range of approximately 101 to 555 °C.)
+ 649-029-00-6 271-013-9 68513-69-9 Fuel oil, No 6; Heavy fuel oil (A distillate
+ oil having a minimum viscosity of 197 10-6 m2s-1 at 37,7 °C to a maximum of 197
+ 10-5 m2s-1 at 37,7 °C.) 649-030-00-1 271-384-7 68553-00-4 Residues (petroleum),
+ topping plant, low-sulfur; Heavy fuel oil (A low-sulfur complex combination of
+ hydrocarbons produced as the residual fraction from the topping plant distillation
+ of crude oil. It is the residuum after the straight-run gasoline cut, kerosene
+ cut and gas oil cut have been removed.) 649-031-00-7 271-763-7 68607-30-7 Gas
+ oils (petroleum), heavy atmospheric; Heavy fuel oil (A complex combination of
+ hydrocarbons obtained by the distillation of crude oil. It consists of hydrocarbons
+ having carbon numbers predominantly in the range of C7 through C35 and boiling
+ in the range of approximately 121 to 510 °C.) 649-032-00-2 272-184-2 68783-08-4
+ Residues (petroleum), coker scrubber, Condensed-ring-arom.-contg.; Heavy fuel
+ - '(e)
+
+
+ where applicable, how the undertaking assesses the effectiveness of its engagement
+ with its own workforce, including, where relevant, any agreements or outcomes
+ that result.
+
+
+ Where applicable, the undertaking shall disclose the steps it takes to gain insight
+ into the perspectives of people in its own workforce who may be particularly vulnerable
+ to impacts and/or marginalised (for example, women, migrants, people with disabilities).
+
+
+ If the undertaking cannot disclose the above required information because it has
+ not adopted a general process to engage with its own workforce , it shall disclose
+ this to be the case. It may disclose a timeframe in which it aims to have such
+ a process in place.'
+ - source_sentence: Under what circumstances can the suspension or removal of a financial
+ instrument or derivative from trading be exempted, despite infringing Articles
+ 7 and 17 of Regulation (EU) No 596/2014?
+ sentences:
+ - '(15) Directive 2010/75/EU of the European Parliament and of the Council of 24
+ November 2010 on industrial emissions (integrated pollution prevention and control)
+ (recast) (OJ L 334, 17.12.2010, p. 17).
+
+
+ (16) Directive 2011/92/EU of the European Parliament and of the Council of 13
+ December 2011 on the assessment of the effects of certain public and private projects
+ on the environment (OJ L 26, 28.1.2012, p. 1).
+
+
+ (17) Directive 2012/18/EU of the European Parliament and of the Council of 4 July
+ 2012 on the control of major-accident hazards involving dangerous substances,
+ amending and subsequently repealing Council Directive 96/82/EC (OJ L 197, 24.7.2012,
+ p. 1).'
+ - '3.
+
+
+ Where the competent authority of the host Member State of a regulated market,
+ an MTF or OTF has clear and demonstrable grounds for believing that such regulated
+ market, MTF or OTF infringes the obligations arising from the provisions adopted
+ pursuant to this Directive, it shall refer those findings to the competent authority
+ of the home Member State of the regulated market or the MTF or OTF.'
+ - The notified competent authorities of the other Member States shall require that
+ regulated markets, other MTFs, other OTFs and systematic internalisers, which
+ fall under their jurisdiction and trade the same financial instrument or derivatives
+ referred to in points (4) to (10) of Section C of Annex I that relate or are referenced
+ to that financial instrument, also suspend or remove that financial instrument
+ or derivatives from trading, where the suspension or removal is due to suspected
+ market abuse, a take-over bid or the non- disclosure of inside information about
+ the issuer or financial instrument infringing Articles 7 and 17 of Regulation
+ (EU) No 596/2014 except where such suspension or removal could cause significant
+ damage to the
+ - source_sentence: How can the limitation period for the Commission's powers be interrupted
+ according to Article 38?
+ sentences:
+ - '2.
+
+
+ That third-country dialogue shall not prevent the Commission from taking action
+ under this Regulation. Individual measures adopted pursuant to this Regulation
+ shall not be addressed within that dialogue.
+
+
+ Article 38
+
+
+ Limitation periods
+
+
+ 1.
+
+
+ The powers of the Commission under Articles 10 and 11 shall be subject to a limitation
+ period of 10 years, starting on the day on which a foreign subsidy is granted
+ to an undertaking. Any action taken by the Commission under Article 10, 13, 14
+ or 15 with respect to a foreign subsidy shall interrupt the limitation period.
+ After each interruption, the limitation period of 10 years shall start to run
+ afresh.
+
+
+ 2.'
+ - (36) Member States should promote energy efficient means of mobility, including
+ in their public procurement practices, such as rail, cycling, walking or shared
+ mobility, by renewing and decarbonising fleets, encouraging a modal shift and
+ including those modes in urban mobility planning.
+ - air oxidation of petrolatum.) 649-255-00-5 265-206-7 64743-01-7 N Petrolatum (petroleum),
+ alumina-treated; Petrolatum (A complex combination of hydrocarbons obtained when
+ petrolatum is treated with Al2O3 to remove polar components and impurities. It
+ consists predominantly of saturated, crystalline, and liquid hydrocarbons having
+ carbon numbers predominantly greater than C25.) 649-256-00-0 285-098-5 85029-74-9
+ N Petrolatum (petroleum), hydrotreated; Petrolatum (A complex combination of hydrocarbons
+ obtained as a semi-solid from dewaxed paraffinic residual oil treated with hydrogen
+ in the presence of a catalyst. It consists predominantly of saturated, microcrystalline,
+ and liquid hydrocarbons having carbon numbers predominantly greater than
+ - source_sentence: What specific sections and points of Annex VIII are included in
+ the registration for high-risk AI systems in the areas of law enforcement, migration,
+ asylum, and border control management?
+ sentences:
+ - '▼M15
+
+
+ Article 18b
+
+
+ Assistance from the Commission, EMSA and other relevant organisations
+
+
+ 1.
+
+
+ For the purposes of carrying out its obligations under Article 3c(4) and Articles
+ 3g, 3gd, 3ge, 3gf, 3gg and 18a, the Commission, the administering Member State
+ and administering authorities in respect of a shipping company may request the
+ assistance of EMSA or another relevant organisation and may conclude to that effect
+ any appropriate agreements with those organisations.
+
+
+ 2.
+
+
+ The Commission, assisted by EMSA, shall endeavour to develop appropriate tools
+ and guidance to facilitate and coordinate verification and enforcement activities
+ related to the application of this Directive to maritime transport. As far as
+ practicable, such guidance and tools shall be made available to the Member States
+ and the verifiers for information-sharing purposes and in order to better ensure
+ robust enforcement of the national measures transposing this Directive.
+
+
+ ▼B
+
+
+ Article 19
+
+
+ Registries
+
+
+ ▼M4
+
+
+ 1.
+
+
+ Allowances issued from 1 January 2012 onwards shall be held in the ►M9 Union ◄
+ registry for the execution of processes pertaining to the maintenance of the holding
+ accounts opened in the Member State and the allocation, surrender and cancellation
+ of allowances under the Commission ►M9 Acts ◄ referred to in paragraph 3.
+
+
+ Each Member State shall be able to fulfil the execution of authorised operations
+ under the UNFCCC or the Kyoto Protocol.
+
+
+ ▼B
+
+
+ 2.
+
+
+ Any person may hold allowances. The registry shall be accessible to the public
+ and shall contain separate accounts to record the allowances held by each person
+ to whom and from whom allowances are issued or transferred.
+
+
+ ▼M9
+
+
+ 3.'
+ - '(35)
+
+
+ ‘recycled carbon fuels’ means liquid and gaseous fuels that are produced from
+ liquid or solid waste streams of non-renewable origin which are not suitable for
+ material recovery in accordance with Article 4 of Directive 2008/98/EC, or from
+ waste processing gas and exhaust gas of non-renewable origin which are produced
+ as an unavoidable and unintentional consequence of the production process in industrial
+ installations;
+
+
+ ▼M2
+
+
+ (36)
+
+
+ ‘renewable fuels of non-biological origin’ means liquid and gaseous fuels the
+ energy content of which is derived from renewable sources other than biomass;
+
+
+ ▼B
+
+
+ (37)'
+ - '4. For high-risk AI systems referred to in points 1, 6 and 7 of Annex III, in
+ the areas of law enforcement, migration, asylum and border control management,
+ the registration referred to in paragraphs 1, 2 and 3 of this Article shall be
+ in a secure non-public section of the EU database referred to in Article 71 and
+ shall include only the following information, as applicable, referred to in:
+
+
+ (a) Section A, points 1 to 10, of Annex VIII, with the exception of points 6,
+ 8 and 9; (b) Section B, points 1 to 5, and points 8 and 9 of Annex VIII; --- ---
+ (c) Section C, points 1 to 3, of Annex VIII; --- --- (d) points 1, 2, 3 and 5,
+ of Annex IX. --- ---'
+ - source_sentence: The document outlines various chemical substances classified as
+ carcinogenic or toxic for reproduction, detailing their respective categories
+ and regulatory dates. Specific compounds such as diarsenic trioxide, lead chromate,
+ and chromium trioxide are highlighted, indicating their potential health risks
+ and the timeline for their regulation.
+ sentences:
+ - '57(f) – human health) (a) 21 August 2013 (*) (b) By way of derogation from point
+ (a): 14 June 2023 for uses in mixtures containing DIBP at or above 0,1 % and below
+ 0,3 % weight by weight. (a) 21 February 2015 (**) (b) By way of derogation from
+ point (a): 14 December 2024 for uses in mixtures containing DIBP at or above 0,1
+ % and below 0,3 % weight by weight. - [▼M15](./../../../legal-content/EN/AUTO/?uri=celex:32012R0125
+ "32012R0125: INSERTED") 8. Diarsenic trioxide EC No: 215-481-4 CAS No: 1327-53-3
+ Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 9. Diarsenic pentaoxide
+ EC No: 215-116-9 CAS No: 1303-28-2 Carcinogenic (category 1A) 21 November 2013
+ 21 May 2015 — 10. Lead chromate EC No: 231-846-0 CAS No: 7758-97-6 Carcinogenic
+ (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1)
+ ◄ 21 May 2015 ►M43 (*2) ◄ — 11. Lead sulfochromate yellow (C.I. Pigment Yellow
+ 34) EC No: 215-693-7 CAS No: 1344-37-2 Carcinogenic (category 1B) Toxic for reproduction
+ (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 12. Lead
+ chromate molybdate sulphate red (C.I. Pigment Red 104) EC No: 235-759-9 CAS No:
+ 12656-85-8 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21
+ November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ 13. Tris (2-chloroethyl) phosphate
+ (TCEP) EC No: 204-118-5 CAS No: 115-96-8 Toxic for reproduction (category 1B)
+ 21 February 2014 21 August 2015 14. 2,4-Dinitrotoluene (2,4-DNT) EC No: 204-450-0
+ CAS No: 121-14-2 Carcinogenic (category 1B) 21 February 2014 ►M43 (*1) ◄ 21 August
+ 2015 ►M43 (*2) ◄ [▼M22](./../../../legal-content/EN/AUTO/?uri=celex:32013R0348
+ "32013R0348: INSERTED") 15. Trichloroethylene EC No: 201-167-4 CAS No: 79-01-6
+ Carcinogenic (category 1B) 21 October 2014 ►M43 (*1) ◄ 21 April 2016 ►M43 (*2)
+ ◄ — 16. Chromium trioxide EC No: 215-607-8 CAS No: 1333-82-0 Carcinogenic (category
+ 1A) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2)
+ ◄ — 17. Acids generated from chromium trioxide and their oligomers Group containing:
+ Chromic acid EC No: 231-801-5 CAS No: 7738-94-5 Dichromic acid EC No: 236-881-5
+ CAS No: 13530-68-2 Oligomers of chromic acid and dichromic acid EC No: not yet
+ assigned CAS No: not yet assigned Carcinogenic (category 1B) 21 March 2016 ►M43
+ (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 18. Sodium dichromate EC No: 234-190-3
+ CAS No: 7789-12-0 10588-01-9 Carcinogenic (category 1B) Mutagenic (category 1B)
+ Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017
+ ►M43 (*2) ◄ — 19. Potassium dichromate EC No: 231-906-6 CAS No: 7778-50-9 Carcinogenic
+ (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21
+ March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 20. Ammonium dichromate
+ EC No: 232-143-1 CAS No: 7789-09-5 Carcinogenic (category 1B) Mutagenic (category
+ 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September
+ 2017 ►M43 (*2) ◄ 21. Potassium chromate EC No: 232-140-5 CAS No: 7789-00-6 Carcinogenic
+ (category 1B) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017
+ ►M43 (*2) ◄ 22. Sodium chromate EC No: 231-889-5 CAS No: 7775-11-3 Carcinogenic
+ (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21
+ March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ [▼M28](./../../../legal-content/EN/AUTO/?uri=celex:32014R0895
+ "32014R0895: INSERTED") 23. Formaldehyde, oligomeric reaction products with aniline
+ (technical MDA) EC No: 500-036-1 CAS No: 25214-70-4 Carcinogenic (category 1B)
+ 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 24. Arsenic acid EC
+ No: 231-901-9 CAS No: 7778-39-4 Carcinogenic (category 1A) 22 February 2016 22
+ August 2017 — 25. Bis(2-methoxyethyl) ether (diglyme) EC No: 203-924-4 CAS No:
+ 111-96-6 Toxic for reproduction (category 1B) 22 February 2016 ►M43 (*1) ◄ 22
+ August 2017 ►M43 (*2) ◄ — 26. 1,2-dichloroethane (EDC) EC No: 203-458-1 CAS No:
+ 107-06-2 Carcinogenic (category 1B) 22 May 2016 22 November 2017 — 27. 2,2′-dichloro-4,4′-methylenedianiline
+ (MOCA) EC No: 202-918-9 CAS No: 101-14-4 Carcinogenic (category 1B) 22 May 2016
+ ►M43 (*1) ◄ 22 November 2017 ►M43 (*2) ◄ — 28. Dichromium tris(chromate) EC No:
+ 246-356-2 CAS No: 24613-89-6 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1)
+ ◄ 22 January 2019 ►M43 (*2) ◄ — 29. Strontium chromate EC No: 232-142-6 CAS No:
+ 7789-06-2 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019
+ ►M43 (*2) ◄ — 30. Potassium hydroxyoctaoxodizincatedichromate EC'
+ - '(c)
+
+
+ the financial soundness of the proposed acquirer, in particular in relation to
+ the type of business pursued and envisaged in the investment firm in which the
+ acquisition is proposed;
+
+
+ (d)
+
+
+ whether the investment firm will be able to comply and continue to comply with
+ the prudential requirements based on this Directive and, where applicable, other
+ Directives, in particular Directives 2002/87/EC and 2013/36/EU, in particular,
+ whether the group of which it will become a part has a structure that makes it
+ possible to exercise effective supervision, effectively exchange information among
+ the competent authorities and determine the allocation of responsibilities among
+ the competent authorities;
+
+
+ (e)'
+ - No administrative costs or fees related to the implementation of financing and
+ investment operations under the EU guarantee shall be due to the implementing
+ partner by the Commission unless the nature of the policy objectives targeted
+ by the financial product to be implemented and the affordability for the targeted
+ final recipients or the type of financing provided allow the implementing partner
+ to duly justify to the Commission the need for an exception. The coverage of such
+ costs by the Union budget shall be limited to the amount strictly required to
+ implement the relevant financing and investment operations, and shall be provided
+ only to the extent to which the costs are not covered by revenues received by
+ the implementing partners from
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: Unknown
+ type: unknown
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6777144829967202
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8972898325565337
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.9390643880545486
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9691006387018816
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6777144829967202
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2990966108521779
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.18781287761090967
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09691006387018813
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.6777144829967202
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8972898325565337
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.9390643880545486
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9691006387018816
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8364282304724784
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7924261355385132
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7938274567816883
+ name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) <!-- at revision 8e4eaca09c27ad3d501908636ec7c8bc3561b6de -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
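The `loss:MatryoshkaLoss` tag shows the model was trained with `MatryoshkaLoss`, which aims to keep the leading dimensions of the 768-dimensional output useful on their own. This excerpt of the card does not list the truncation sizes used during training, so any particular target size (for example 256) is an assumption. Recent Sentence Transformers releases accept a `truncate_dim` argument on `SentenceTransformer` for this purpose. The sketch below shows the underlying idea by hand. `truncate_and_normalize` is a hypothetical helper: it keeps the first `dim` values and rescales them to unit length, so a dot product between truncated vectors still equals their cosine similarity.

```python
import math
from typing import List


def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` values of a Matryoshka-style embedding and rescale to unit L2 norm."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


# Toy 4-d vector standing in for a 768-d embedding; keep only the first 2 dimensions.
full = [3.0, 4.0, 1.0, 2.0]
print(truncate_and_normalize(full, 2))  # [0.6, 0.8]
```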
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
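The pipeline ends in `Normalize()`, so every output vector has unit L2 norm. The cosine similarity used by this model then equals a plain dot product between embeddings. A small pure-Python check of that identity follows. The helpers are illustrative only, not library functions.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))                           # 0.96
print(dot(l2_normalize(a), l2_normalize(b)))  # same value, up to floating-point rounding
```

Because of this identity, the normalized embeddings can be stored in a vector index that scores by inner product without changing the ranking.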
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+ 'The document outlines various chemical substances classified as carcinogenic or toxic for reproduction, detailing their respective categories and regulatory dates. Specific compounds such as diarsenic trioxide, lead chromate, and chromium trioxide are highlighted, indicating their potential health risks and the timeline for their regulation.',
+ '57(f) – human health) (a) 21 August 2013 (*) (b) By way of derogation from point (a): 14 June 2023 for uses in mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by weight. (a) 21 February 2015 (**) (b) By way of derogation from point (a): 14 December 2024 for uses in mixtures containing DIBP at or above 0,1 % and below 0,3 % weight by weight. - [▼M15](./../../../legal-content/EN/AUTO/?uri=celex:32012R0125 "32012R0125: INSERTED") 8. Diarsenic trioxide EC No: 215-481-4 CAS No: 1327-53-3 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 9. Diarsenic pentaoxide EC No: 215-116-9 CAS No: 1303-28-2 Carcinogenic (category 1A) 21 November 2013 21 May 2015 — 10. Lead chromate EC No: 231-846-0 CAS No: 7758-97-6 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 11. Lead sulfochromate yellow (C.I. Pigment Yellow 34) EC No: 215-693-7 CAS No: 1344-37-2 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ — 12. Lead chromate molybdate sulphate red (C.I. Pigment Red 104) EC No: 235-759-9 CAS No: 12656-85-8 Carcinogenic (category 1B) Toxic for reproduction (category 1A) 21 November 2013 ►M43 (*1) ◄ 21 May 2015 ►M43 (*2) ◄ 13. Tris (2-chloroethyl) phosphate (TCEP) EC No: 204-118-5 CAS No: 115-96-8 Toxic for reproduction (category 1B) 21 February 2014 21 August 2015 14. 2,4-Dinitrotoluene (2,4-DNT) EC No: 204-450-0 CAS No: 121-14-2 Carcinogenic (category 1B) 21 February 2014 ►M43 (*1) ◄ 21 August 2015 ►M43 (*2) ◄ [▼M22](./../../../legal-content/EN/AUTO/?uri=celex:32013R0348 "32013R0348: INSERTED") 15. Trichloroethylene EC No: 201-167-4 CAS No: 79-01-6 Carcinogenic (category 1B) 21 October 2014 ►M43 (*1) ◄ 21 April 2016 ►M43 (*2) ◄ — 16. Chromium trioxide EC No: 215-607-8 CAS No: 1333-82-0 Carcinogenic (category 1A) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 17. Acids generated from chromium trioxide and their oligomers Group containing: Chromic acid EC No: 231-801-5 CAS No: 7738-94-5 Dichromic acid EC No: 236-881-5 CAS No: 13530-68-2 Oligomers of chromic acid and dichromic acid EC No: not yet assigned CAS No: not yet assigned Carcinogenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 18. Sodium dichromate EC No: 234-190-3 CAS No: 7789-12-0 10588-01-9 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 19. Potassium dichromate EC No: 231-906-6 CAS No: 7778-50-9 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ — 20. Ammonium dichromate EC No: 232-143-1 CAS No: 7789-09-5 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 21. Potassium chromate EC No: 232-140-5 CAS No: 7789-00-6 Carcinogenic (category 1B) Mutagenic (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ 22. Sodium chromate EC No: 231-889-5 CAS No: 7775-11-3 Carcinogenic (category 1B) Mutagenic (category 1B) Toxic for reproduction (category 1B) 21 March 2016 ►M43 (*1) ◄ 21 September 2017 ►M43 (*2) ◄ [▼M28](./../../../legal-content/EN/AUTO/?uri=celex:32014R0895 "32014R0895: INSERTED") 23. Formaldehyde, oligomeric reaction products with aniline (technical MDA) EC No: 500-036-1 CAS No: 25214-70-4 Carcinogenic (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 24. Arsenic acid EC No: 231-901-9 CAS No: 7778-39-4 Carcinogenic (category 1A) 22 February 2016 22 August 2017 — 25. Bis(2-methoxyethyl) ether (diglyme) EC No: 203-924-4 CAS No: 111-96-6 Toxic for reproduction (category 1B) 22 February 2016 ►M43 (*1) ◄ 22 August 2017 ►M43 (*2) ◄ — 26. 1,2-dichloroethane (EDC) EC No: 203-458-1 CAS No: 107-06-2 Carcinogenic (category 1B) 22 May 2016 22 November 2017 — 27. 2,2′-dichloro-4,4′-methylenedianiline (MOCA) EC No: 202-918-9 CAS No: 101-14-4 Carcinogenic (category 1B) 22 May 2016 ►M43 (*1) ◄ 22 November 2017 ►M43 (*2) ◄ — 28. Dichromium tris(chromate) EC No: 246-356-2 CAS No: 24613-89-6 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 29. Strontium chromate EC No: 232-142-6 CAS No: 7789-06-2 Carcinogenic (category 1B) 22 July 2017 ►M43 (*1) ◄ 22 January 2019 ►M43 (*2) ◄ — 30. Potassium hydroxyoctaoxodizincatedichromate EC',
+ '(c)\n\nthe financial soundness of the proposed acquirer, in particular in relation to the type of business pursued and envisaged in the investment firm in which the acquisition is proposed;\n\n(d)\n\nwhether the investment firm will be able to comply and continue to comply with the prudential requirements based on this Directive and, where applicable, other Directives, in particular Directives 2002/87/EC and 2013/36/EU, in particular, whether the group of which it will become a part has a structure that makes it possible to exercise effective supervision, effectively exchange information among the competent authorities and determine the allocation of responsibilities among the competent authorities;\n\n(e)',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
487
+
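The usage snippet prints a `[3, 3]` shape because `model.similarity` scores every embedding against every other one. In this commit, `config_sentence_transformers.json` sets `similarity_fn_name` to `"cosine"`. The stdlib-only sketch below reproduces that pairwise cosine computation on three made-up 3-dimensional vectors. The values are illustrative, not model outputs, and the function name is our own.

```python
import math

def cosine_similarity_matrix(embeddings):
    """Return an n x n matrix of cosine similarities between all pairs of rows."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in embeddings]
    matrix = []
    for row_a, norm_a in zip(embeddings, norms):
        matrix.append([
            sum(a * b for a, b in zip(row_a, row_b)) / (norm_a * norm_b)
            for row_b, norm_b in zip(embeddings, norms)
        ])
    return matrix

# Toy 3-dimensional vectors standing in for three 768-dimensional embeddings.
toy = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 2.0]]
sims = cosine_similarity_matrix(toy)
print(len(sims), len(sims[0]))  # 3 3
```

The diagonal is always 1 and the matrix is symmetric, which is why the real call returns one row and one column per input sentence.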
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6777 |
+ | cosine_accuracy@3 | 0.8973 |
+ | cosine_accuracy@5 | 0.9391 |
+ | cosine_accuracy@10 | 0.9691 |
+ | cosine_precision@1 | 0.6777 |
+ | cosine_precision@3 | 0.2991 |
+ | cosine_precision@5 | 0.1878 |
+ | cosine_precision@10 | 0.0969 |
+ | cosine_recall@1 | 0.6777 |
+ | cosine_recall@3 | 0.8973 |
+ | cosine_recall@5 | 0.9391 |
+ | cosine_recall@10 | 0.9691 |
+ | **cosine_ndcg@10** | **0.8364** |
+ | cosine_mrr@10 | 0.7924 |
+ | cosine_map@100 | 0.7938 |
+
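In the table above, accuracy@k and recall@k coincide, and precision@k is accuracy@k divided by k (for example 0.8973 / 3 ≈ 0.2991). That is what these definitions give when each query has exactly one relevant passage. This reading is our inference from the numbers; the card does not state it. The sketch below computes the same family of metrics with binary relevance on toy rankings. It is a simplified illustration, not the `InformationRetrievalEvaluator` implementation, whose details may differ.

```python
import math

def retrieval_metrics(rankings, relevant, k):
    """Average accuracy@k, precision@k, recall@k, MRR@k and NDCG@k over queries.

    rankings: one ranked list of document ids per query (best first).
    relevant: one set of relevant document ids per query.
    """
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "mrr": 0.0, "ndcg": 0.0}
    for ranked, rel in zip(rankings, relevant):
        hits = [doc in rel for doc in ranked[:k]]
        totals["accuracy"] += 1.0 if any(hits) else 0.0
        totals["precision"] += sum(hits) / k
        totals["recall"] += sum(hits) / len(rel)
        # Reciprocal rank of the first relevant document (0 if none is in the top k).
        first = next((i for i, hit in enumerate(hits) if hit), None)
        totals["mrr"] += 0.0 if first is None else 1.0 / (first + 1)
        # Binary-relevance DCG, normalised by the best achievable DCG.
        dcg = sum(1.0 / math.log2(i + 2) for i, hit in enumerate(hits) if hit)
        ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(rel), k)))
        totals["ndcg"] += dcg / ideal
    n = len(rankings)
    return {name: value / n for name, value in totals.items()}

# Two toy queries, each with a single relevant passage (ids are made up).
rankings = [["d1", "d2", "d3"], ["d5", "d4", "d6"]]
relevant = [{"d1"}, {"d4"}]
m3 = retrieval_metrics(rankings, relevant, k=3)
m1 = retrieval_metrics(rankings, relevant, k=1)
print(m3)
print(m1)
```

With one relevant passage per query, `m1` and `m3` show recall@k equal to accuracy@k and precision@k equal to accuracy@k / k, matching the pattern in the model card's table.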
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 46,338 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence_0 | sentence_1 |
+ |:--------|:-----------|:-----------|
+ | type | string | string |
+ | details | <ul><li>min: 11 tokens</li><li>mean: 35.09 tokens</li><li>max: 214 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 202.2 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 |
+ |:-----------|:-----------|
+ | <code>How do the Academies support education and training providers in maintaining and ensuring the quality of the training offered?</code> | <code>to in Chapter IV of this Regulation; (b) promoting the voluntary use of the learning programmes, content and materials by education and training providers in the Member States; --- --- (c) offering support to the education and training providers that use the learning programmes, content and materials produced by the Academies to uphold the quality of the training offered and to develop mechanisms to ensure the quality of the training offered; --- --- (d) developing credentials, including, if appropriate, micro-credentials, for voluntary use by Member States and education and training providers on their territories, in order to facilitate the identification of skills and, where appropriate, the recognition of qualifications, to enhance the</code> |
+ | <code>The text provides a comprehensive list of various nickel compounds, including their chemical names and associated identifiers. It covers a range of nickel salts, oxides, and other derivatives, highlighting their diverse applications and chemical properties. The compounds mentioned include nickel arsenate, nickel oxalate, and nickel dichromate, among others, indicating their significance in industrial and chemical processes.</code> | <code>[5] 235-688-3 [5] 12519-85-6 [5] Dinickel hexacyanoferrate 028-037-00-8 238-946-3 14874-78-3 Trinickel bis(arsenate); Nickel (II) arsenate 028-038-00-3 236-771-7 13477-70-8 Nickel oxalate; [1] 028-039-00-9 208-933-7 [1] 547-67-1 [1] Oxalic acid, nickel salt; [2] 243-867-2 [2] 20543-06-0 [2] Nickel telluride 028-040-00-4 235-260-6 12142-88-0 Trinickel tetrasulfide 028-041-00-X — 12137-12-1 Trinickel bis(arsenite) 028-042-00-5 — 74646-29-0 Cobalt nickel gray periclase; 028-043-00-0 C.I. Pigment Black 25; C.I. 77332; [1] 269-051-6 [1] 68186-89-0 [1] Cobalt nickel dioxide; [2] 261-346-8 [2] 58591-45-0 [2] Cobalt nickel oxide; [3] - [3] 12737-30-3 [3] Nickel tin trioxide; Nickel stannate 028-044-00-6 234-824-9 12035-38-0 Nickel triuranium decaoxide 028-045-00-1 239-876-6 15780-33-3 Nickel dithiocyanate 028-046-00-7 237-205-1 13689-92-4 Nickel dichromate 028-047-00-2 239-646-5 15586-38-6 Nickel (II) selenite 028-048-00-8 233-263-7 10101-96-9 Nickel selenide 028-049-00-3 215-216-2 1314-05-2 S...</code> |
+ | <code>What is the definition of 'Union airport managing body' and how does it relate to the management of centralized infrastructures for fuel distribution systems?</code> | <code>(2)<br><br>‘Union airport managing body’ means, in respect of a Union airport, the ‘airport managing body’ as defined in Article 2, point (2), of Directive 2009/12/EC or, where the Member State concerned has reserved the management of the centralised infrastructures for fuel distribution systems for another body pursuant to Article 8(1) of Council Directive 96/67/EC ( 2 ), that other body;<br><br>(3)<br><br>‘aircraft operator’ means a person that operated at least 500 commercial passenger air transport flights, or 52 commercial all-cargo air transport flights departing from Union airports in the previous reporting period or, where it is not possible for that person to be identified, the owner of the aircraft;<br><br>(4)</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+ ```json
+ {
+ "loss": "MultipleNegativesRankingLoss",
+ "matryoshka_dims": [
+ 768,
+ 512,
+ 256,
+ 128,
+ 64
+ ],
+ "matryoshka_weights": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "n_dims_per_step": -1
+ }
+ ```
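With these parameters, `MatryoshkaLoss` applies the inner `MultipleNegativesRankingLoss` to the embeddings truncated to each size in `matryoshka_dims`. It then sums the results, weighted by `matryoshka_weights` (all 1 here). Because `n_dims_per_step` is -1, every listed size is used at every step.

The sketch below illustrates the idea in pure Python on toy 4-dimensional vectors, using sizes 4 and 2 in place of 768 down to 64. The similarity scale of 20 is, to our knowledge, the library default for `MultipleNegativesRankingLoss`; treat it as an assumption. The real losses also operate on batched tensors rather than Python lists.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch negatives: each anchor should score its own positive above all others."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, pos) for pos in positives]
        # Cross-entropy with the matching positive (index i) as the target class.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss on embeddings truncated to each size."""
    return sum(
        w * mnrl_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Two toy (anchor, positive) pairs; values are made up.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training on every prefix length is what lets the first 64 to 512 dimensions be used on their own as smaller embeddings.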
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `num_train_epochs`: 4
+ - `multi_dataset_batch_sampler`: round_robin
+
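These non-default settings determine the step counts that appear in the Training Logs and in the evaluation CSV. The arithmetic below assumes a single device and no gradient accumulation (`gradient_accumulation_steps` is 1 in the full list).

```python
import math

train_samples = 46_338      # "Size" in the Training Dataset section
batch_size = 4              # per_device_train_batch_size (single device assumed)
epochs = 4                  # num_train_epochs

# dataloader_drop_last is False, so the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 11585 46340

# The Epoch column of the training logs is step / steps_per_epoch, rounded to 4 decimals.
print(round(500 / steps_per_epoch, 4), round(38000 / steps_per_epoch, 4))  # 0.0432 3.2801
```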
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
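With `lr_scheduler_type: linear`, `warmup_steps: 0` and `learning_rate: 5e-05`, the learning rate should fall linearly from 5e-05 to 0 over the run. The sketch below models that schedule. It assumes 46,340 total optimizer steps, the final step count recorded in the evaluation CSV. The function is our own, written from the linear warmup-then-decay rule rather than copied from Transformers.

```python
def linear_lr(step, base_lr=5e-05, total_steps=46_340, warmup_steps=0):
    """Learning rate under a linear warmup-then-decay schedule."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (not used here: warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly to 0 at total_steps, never going negative.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start and at each epoch boundary.
for step in (0, 11_585, 23_170, 34_755, 46_340):
    print(step, linear_lr(step))
```

Under this assumption, the model finished epoch 1 at three quarters of the initial learning rate and epoch 3 at one quarter.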
+ ### Training Logs
+ | Epoch | Step | Training Loss | cosine_ndcg@10 |
+ |:------:|:-----:|:-------------:|:--------------:|
+ | 0.0432 | 500 | 0.5169 | 0.7365 |
+ | 0.0863 | 1000 | 0.1341 | 0.7914 |
+ | 0.1295 | 1500 | 0.0784 | 0.7992 |
+ | 0.1726 | 2000 | 0.0782 | 0.8058 |
+ | 0.2158 | 2500 | 0.0596 | 0.8012 |
+ | 0.2590 | 3000 | 0.057 | 0.8079 |
+ | 0.3021 | 3500 | 0.0785 | 0.8086 |
+ | 0.3453 | 4000 | 0.0423 | 0.8010 |
+ | 0.3884 | 4500 | 0.0586 | 0.8075 |
+ | 0.4316 | 5000 | 0.0508 | 0.8008 |
+ | 0.4748 | 5500 | 0.0764 | 0.7934 |
+ | 0.5179 | 6000 | 0.0583 | 0.8068 |
+ | 0.5611 | 6500 | 0.0663 | 0.8008 |
+ | 0.6042 | 7000 | 0.0344 | 0.8083 |
+ | 0.6474 | 7500 | 0.0506 | 0.8104 |
+ | 0.6905 | 8000 | 0.0478 | 0.8089 |
+ | 0.7337 | 8500 | 0.0509 | 0.8034 |
+ | 0.7769 | 9000 | 0.0426 | 0.8114 |
+ | 0.8200 | 9500 | 0.0603 | 0.8097 |
+ | 0.8632 | 10000 | 0.036 | 0.8142 |
+ | 0.9063 | 10500 | 0.0581 | 0.8081 |
+ | 0.9495 | 11000 | 0.0351 | 0.8018 |
+ | 0.9927 | 11500 | 0.0358 | 0.8082 |
+ | 1.0 | 11585 | - | 0.8076 |
+ | 1.0358 | 12000 | 0.0398 | 0.8093 |
+ | 1.0790 | 12500 | 0.0197 | 0.8023 |
+ | 1.1221 | 13000 | 0.0376 | 0.8137 |
+ | 1.1653 | 13500 | 0.0287 | 0.8136 |
+ | 1.2085 | 14000 | 0.0269 | 0.8146 |
+ | 1.2516 | 14500 | 0.0089 | 0.8161 |
+ | 1.2948 | 15000 | 0.0149 | 0.8126 |
+ | 1.3379 | 15500 | 0.0457 | 0.8138 |
+ | 1.3811 | 16000 | 0.0119 | 0.8171 |
+ | 1.4243 | 16500 | 0.0107 | 0.8105 |
+ | 1.4674 | 17000 | 0.015 | 0.8171 |
+ | 1.5106 | 17500 | 0.0208 | 0.8153 |
+ | 1.5537 | 18000 | 0.0168 | 0.8111 |
+ | 1.5969 | 18500 | 0.0114 | 0.8171 |
+ | 1.6401 | 19000 | 0.0188 | 0.8239 |
+ | 1.6832 | 19500 | 0.01 | 0.8182 |
+ | 1.7264 | 20000 | 0.0158 | 0.8125 |
+ | 1.7695 | 20500 | 0.0155 | 0.8201 |
+ | 1.8127 | 21000 | 0.0276 | 0.8182 |
+ | 1.8558 | 21500 | 0.0245 | 0.8123 |
+ | 1.8990 | 22000 | 0.0135 | 0.8223 |
+ | 1.9422 | 22500 | 0.0334 | 0.8182 |
+ | 1.9853 | 23000 | 0.0111 | 0.8200 |
+ | 2.0 | 23170 | - | 0.8221 |
+ | 2.0285 | 23500 | 0.0139 | 0.8225 |
+ | 2.0716 | 24000 | 0.0113 | 0.8237 |
+ | 2.1148 | 24500 | 0.0072 | 0.8223 |
+ | 2.1580 | 25000 | 0.0138 | 0.8218 |
+ | 2.2011 | 25500 | 0.0071 | 0.8200 |
+ | 2.2443 | 26000 | 0.0091 | 0.8240 |
+ | 2.2874 | 26500 | 0.013 | 0.8224 |
+ | 2.3306 | 27000 | 0.008 | 0.8248 |
+ | 2.3738 | 27500 | 0.0084 | 0.8203 |
+ | 2.4169 | 28000 | 0.0147 | 0.8255 |
+ | 2.4601 | 28500 | 0.0067 | 0.8268 |
+ | 2.5032 | 29000 | 0.0028 | 0.8219 |
+ | 2.5464 | 29500 | 0.0124 | 0.8234 |
+ | 2.5896 | 30000 | 0.0051 | 0.8237 |
+ | 2.6327 | 30500 | 0.0151 | 0.8256 |
+ | 2.6759 | 31000 | 0.0051 | 0.8207 |
+ | 2.7190 | 31500 | 0.0086 | 0.8250 |
+ | 2.7622 | 32000 | 0.0152 | 0.8265 |
+ | 2.8054 | 32500 | 0.0085 | 0.8297 |
+ | 2.8485 | 33000 | 0.0097 | 0.8316 |
+ | 2.8917 | 33500 | 0.0269 | 0.8284 |
+ | 2.9348 | 34000 | 0.008 | 0.8305 |
+ | 2.9780 | 34500 | 0.0146 | 0.8309 |
+ | 3.0 | 34755 | - | 0.8301 |
+ | 3.0211 | 35000 | 0.0218 | 0.8326 |
+ | 3.0643 | 35500 | 0.0152 | 0.8301 |
+ | 3.1075 | 36000 | 0.0072 | 0.8290 |
+ | 3.1506 | 36500 | 0.0077 | 0.8270 |
+ | 3.1938 | 37000 | 0.0155 | 0.8299 |
+ | 3.2369 | 37500 | 0.0069 | 0.8328 |
+ | 3.2801 | 38000 | 0.0103 | 0.8364 |
+
+
+ ### Framework Versions
+ - Python: 3.10.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.1
+ - PyTorch: 2.4.0+cu121
+ - Accelerate: 1.4.0
+ - Datasets: 3.3.2
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "_name_or_path": "Snowflake/snowflake-arctic-embed-m-v1.5",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.48.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
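As a sanity check on `config.json`, the encoder's parameter count follows from `vocab_size`, `hidden_size`, `num_hidden_layers`, `intermediate_size`, `max_position_embeddings` and `type_vocab_size`. The sketch below counts embeddings and encoder layers only; leaving out a pooler head is our assumption.

At 4 bytes per float32 weight (`torch_dtype: float32`), the total comes to 435,566,592 bytes. That is just under the 435,588,776-byte `size` in the `model.safetensors` LFS pointer, and the small remainder is consistent with a safetensors header.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate, max_positions, type_vocab):
    """Count weights of a BERT encoder (embeddings + layers), excluding any pooler head."""
    layer_norm = 2 * hidden                                   # gamma and beta
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)                # query, key, value, output
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm     # two LayerNorms per layer
    return embeddings + layers * per_layer

params = bert_param_count(30522, 768, 12, 3072, 512, 2)
print(params, params * 4)  # 108891648 435566592
```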
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.4.1",
+ "transformers": "4.48.1",
+ "pytorch": "2.4.0+cu121"
+ },
+ "prompts": {
+ "query": "Represent this sentence for searching relevant passages: "
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
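`config_sentence_transformers.json` registers one prompt, `query`, and leaves `default_prompt_name` null, so no text is prepended unless a caller asks for it. The hypothetical helper below illustrates that behavior; the names `apply_prompt`, `PROMPTS` and `DEFAULT_PROMPT_NAME` are ours. With the library itself, the usual route is, as far as we know, `model.encode(queries, prompt_name="query")`, with passages encoded without a prompt.

```python
# Values copied from config_sentence_transformers.json above.
PROMPTS = {"query": "Represent this sentence for searching relevant passages: "}
DEFAULT_PROMPT_NAME = None

def apply_prompt(texts, prompt_name=None):
    """Prepend the named prompt to each text; with no name and no default, return texts as-is."""
    name = prompt_name if prompt_name is not None else DEFAULT_PROMPT_NAME
    if name is None:
        return list(texts)
    return [PROMPTS[name] + text for text in texts]

# Queries get the retrieval prompt; passages are left untouched.
queries = apply_prompt(["What is technical MDA?"], prompt_name="query")
passages = apply_prompt(["Formaldehyde, oligomeric reaction products with aniline (technical MDA)"])
print(queries)
print(passages)
```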
eval/Information-Retrieval_evaluation_results.csv ADDED
@@ -0,0 +1,5 @@
+ epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
+ 1.0,11585,0.6407733471431037,0.8641463835663732,0.913688934921457,0.9535646469877438,0.6407733471431037,0.6407733471431037,0.2880487945221244,0.8641463835663732,0.18273778698429138,0.913688934921457,0.09535646469877437,0.9535646469877438,0.7595666910529687,0.807555539832259,0.7615541387765282
+ 2.0,23170,0.6606248921111687,0.8817538408423959,0.9254272397721388,0.9601242879337131,0.6606248921111687,0.6606248921111687,0.29391794694746537,0.8817538408423959,0.18508544795442772,0.9254272397721388,0.09601242879337128,0.9601242879337131,0.7765838765450377,0.8221377062027736,0.7785590851486591
+ 3.0,34755,0.6658035560158813,0.8953909891248057,0.9352667011910927,0.9675470395304678,0.6658035560158813,0.6658035560158813,0.29846366304160193,0.8953909891248057,0.1870533402382185,0.9352667011910927,0.09675470395304678,0.9675470395304678,0.7845114108708103,0.8300831579250323,0.786055972684675
+ 4.0,46340,0.6759882616951494,0.8984981874676333,0.9373381667529778,0.9696185050923528,0.6759882616951494,0.6759882616951494,0.2994993958225444,0.8984981874676333,0.18746763335059555,0.9373381667529778,0.09696185050923527,0.9696185050923528,0.7910710518167795,0.8355178184076232,0.7924944010948121
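To pick the strongest checkpoint from these evaluation results, one option is to read the CSV and take the row with the highest `cosine-NDCG@10`, the metric highlighted in the model card. The sketch embeds a trimmed copy of the file (three of its columns, values unchanged) so that it runs standalone; reading the real file would use the same `csv.DictReader` call on an open file handle.

```python
import csv
import io

# Trimmed copy of eval/Information-Retrieval_evaluation_results.csv (3 of its 17 columns).
CSV_TEXT = """epoch,steps,cosine-NDCG@10
1.0,11585,0.807555539832259
2.0,23170,0.8221377062027736
3.0,34755,0.8300831579250323
4.0,46340,0.8355178184076232
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
best = max(rows, key=lambda row: float(row["cosine-NDCG@10"]))
print(best["epoch"], best["steps"], round(float(best["cosine-NDCG@10"]), 4))  # 4.0 46340 0.8355
```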
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e7c32a0a3c6de3cd1b80ec5d0819e76d0f8094c331a236d4e55e04e3407a4042
+ size 435588776
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
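`modules.json` chains three modules: the BERT `Transformer`, a `Pooling` module configured in `1_Pooling/config.json` with `pooling_mode_cls_token: true`, and `Normalize`. The toy sketch below covers the last two steps on made-up 2-dimensional token vectors. CLS pooling keeps the first token's vector, and L2 normalization rescales it to unit length, so a plain dot product between two such embeddings equals their cosine similarity. The function names are ours, not the library's.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep the vector at position 0 (the [CLS] token)."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def embed(token_embeddings):
    # Mirrors the module order in modules.json: Transformer output -> Pooling -> Normalize.
    return l2_normalize(cls_pool(token_embeddings))

# Toy per-token outputs for one sentence: [CLS], two word pieces, [SEP].
tokens = [[3.0, 4.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vec = embed(tokens)
print(vec)  # [0.6, 0.8]
```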
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
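The tokenizer files define the special-token ids (`[PAD]` 0, `[CLS]` 101, `[SEP]` 102) and a 512-token limit with right-side truncation and padding. That limit is consistent with the 512-token maximum reported for `sentence_1` in the training statistics. The sketch below shows how word-piece ids could be framed and truncated under those settings. It skips WordPiece splitting and lowercasing, which `BertTokenizer` performs first, and the input id range and the `build_input` name are hypothetical.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0   # ids from added_tokens_decoder above
MAX_LENGTH = 512                        # model_max_length / max_seq_length

def build_input(token_ids, max_length=MAX_LENGTH, pad_to=None):
    """Wrap word-piece ids in [CLS] ... [SEP], truncating on the right to fit max_length."""
    body = token_ids[: max_length - 2]          # reserve two slots for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None:
        padding = max(0, pad_to - len(input_ids))   # padding_side is "right"
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding
    return input_ids, attention_mask

ids, mask = build_input(list(range(1000, 1800)))   # 800 hypothetical word-piece ids
print(len(ids), ids[0], ids[-1])                   # 512 101 102
```

Anything past the first 510 word pieces of a passage is dropped, so long legal passages are effectively embedded from their opening text only.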
vocab.txt ADDED
The diff for this file is too large to render.