LG-AI-EXAONE committed
Commit 27c3a76 · 1 Parent(s): 5aa6275

Update README.md and config files

.gitattributes CHANGED
@@ -33,6 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-assets/K-EXAONE_Symbol_3d.png filter=lfs diff=lfs merge=lfs -text
+assets/K-EXAONE_logo_gray.png filter=lfs diff=lfs merge=lfs -text
 assets/main_figure.png filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
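The `.gitattributes` patterns above route matching files through Git LFS. A rough sanity check of which paths the updated pattern set covers, using Python's `fnmatch` as an approximation of gitattributes globbing (git's real matcher handles `/` and `**` differently, so treat this as illustrative only):

```python
from fnmatch import fnmatch

# Patterns touched by this commit (simplified gitattributes globs).
patterns = ["*.zip", "*.zst", "*tfevents*", "assets/K-EXAONE_logo_gray.png"]

def tracked_by_lfs(path: str) -> bool:
    # Approximation: gitattributes matches basenames for patterns without '/',
    # and full paths for patterns that contain '/'.
    return any(
        fnmatch(path, p) if "/" in p else fnmatch(path.rsplit("/", 1)[-1], p)
        for p in patterns
    )

print(tracked_by_lfs("assets/K-EXAONE_logo_gray.png"))  # True: exact path pattern
print(tracked_by_lfs("logs/events.out.tfevents.123"))   # True: matches *tfevents*
print(tracked_by_lfs("README.md"))                      # False
```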
README.md CHANGED
@@ -20,7 +20,7 @@ library_name: transformers
 <br>
 <br>
 <p align="center">
-<img src="assets/K-EXAONE_Symbol_3d.png" width="400">
+<img src="assets/K-EXAONE_logo_gray.png" width="400">
 <br>
 <br>
 <br>
@@ -51,6 +51,8 @@ library_name: transformers
 
 <br>
 
+# K-EXAONE-236B-A23B
+
 ## Introduction
 
 We introduce **K-EXAONE**, a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features **236 billion total** parameters, with **23 billion active** during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.
@@ -382,22 +384,22 @@ Until the libraries officially support K-EXAONE, you need to install the require
 
 #### Transformers
 
-You can install the latest version of Transformers with support for EXAONE-MoE architecture from [this repository](https://github.com/Aim-Highest/transformers).
+You can install the latest version of Transformers with support for EXAONE-MoE architecture from [this repository](https://github.com/nuxlear/transformers/tree/add-exaone-moe).
 The base version of Transformers is `5.0.0rc1`, so it might be helpful to check [the migration guide](https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md) from the Transformers library.
 
 #### vLLM
 
 You should install both Transformers and vLLM to use K-EXAONE model on vLLM server.
-You can install the latest version of vLLM with support for EXAONE-MoE architecture from [this repository](https://github.com/Aim-Highest/vllm/tree/add-exaone-moe).
+You can install the latest version of vLLM with support for EXAONE-MoE architecture from [this repository](https://github.com/lkm2835/vllm/tree/add-exaone-moe).
 
 #### SGLang
 
 You should install both Transformers and SGLang to use K-EXAONE model on SGLang server.
-You can install the latest version of SGLang with support for EXAONE-MoE architecture from [this repository](https://github.com/Aim-Highest/sglang).
+You can install the latest version of SGLang with support for EXAONE-MoE architecture from [this repository](https://github.com/xvyaward/sglang/tree/exaone_moe_official).
 
 #### llama.cpp
 
-You can install the latest version of llama.cpp with support for EXAONE-MoE architecture from [this repository](https://github.com/Aim-Highest/llama.cpp).
+You can install the latest version of llama.cpp with support for EXAONE-MoE architecture from [this repository](https://github.com/nuxlear/llama.cpp/tree/add-exaone-moe).
 Please refer to the [official build guide](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md) for details.
@@ -438,6 +440,7 @@ generated_ids = model.generate(
     max_new_tokens=16384,
     temperature=1.0,
     top_p=0.95,
+    do_sample=True,
 )
 output_ids = generated_ids[0][input_ids['input_ids'].shape[-1]:]
 print(tokenizer.decode(output_ids, skip_special_tokens=True))
@@ -465,6 +468,7 @@ generated_ids = model.generate(
     max_new_tokens=1024,
     temperature=1.0,
     top_p=0.95,
+    do_sample=True,
 )
 output_ids = generated_ids[0][input_ids['input_ids'].shape[-1]:]
 print(tokenizer.decode(output_ids, skip_special_tokens=True))
@@ -510,6 +514,7 @@ generated_ids = model.generate(
     max_new_tokens=16384,
     temperature=1.0,
     top_p=0.95,
+    do_sample=True,
 )
 output_ids = generated_ids[0][input_ids['input_ids'].shape[-1]:]
 print(tokenizer.decode(output_ids, skip_special_tokens=True))
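The `do_sample=True` lines added to each `model.generate(...)` call above are what actually enable sampling in Transformers: without that flag, generation falls back to greedy decoding and the `temperature`/`top_p` settings have no effect. A toy, stdlib-only sketch of what temperature scaling plus top-p (nucleus) filtering does to a next-token distribution; this is illustrative, not the Transformers implementation:

```python
import math

def top_p_filter(logits, temperature=1.0, top_p=0.95):
    """Return the nucleus: the smallest set of token ids (highest probability
    first) whose cumulative probability mass reaches top_p."""
    scaled = [l / temperature for l in logits]
    z = sum(math.exp(s) for s in scaled)
    ranked = sorted(((math.exp(s) / z, i) for i, s in enumerate(scaled)),
                    reverse=True)
    kept, mass = [], 0.0
    for p, i in ranked:
        kept.append(i)
        mass += p
        if mass >= top_p:
            break
    return kept  # a sampler would then draw only from these token ids

# A toy 4-token vocabulary: one dominant token plus a tail.
print(top_p_filter([5.0, 2.0, 1.0, -3.0], top_p=0.95))  # → [0, 1]
```

With `top_p=0.95` the two most likely tokens already cover the required mass, so the unlikely tail is never sampled; greedy decoding would instead always emit token 0.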
assets/{K-EXAONE_Symbol_3d.png → K-EXAONE_logo_gray.png} RENAMED
File without changes
config.json CHANGED
@@ -6,7 +6,7 @@
6
  "bos_token_id": 1,
7
  "dtype": "bfloat16",
8
  "eos_token_id": 53,
9
- "first_last_k_dense_replace": 1,
10
  "head_dim": 128,
11
  "hidden_act": "silu",
12
  "hidden_size": 6144,
@@ -130,13 +130,11 @@
130
  "rope_type": "default"
131
  },
132
  "routed_scaling_factor": 2.5,
133
- "scoring_func": "sigmoid",
134
  "sliding_window": 128,
135
  "sliding_window_pattern": "LLLG",
136
  "tie_word_embeddings": false,
137
  "tokenizer_class": "GPT2Tokenizer",
138
  "topk_group": 1,
139
- "topk_method": "noaux_tc",
140
  "transformers_version": "5.0.0.dev0",
141
  "use_cache": true,
142
  "vocab_size": 153600
 
6
  "bos_token_id": 1,
7
  "dtype": "bfloat16",
8
  "eos_token_id": 53,
9
+ "first_k_dense_replace": 1,
10
  "head_dim": 128,
11
  "hidden_act": "silu",
12
  "hidden_size": 6144,
 
130
  "rope_type": "default"
131
  },
132
  "routed_scaling_factor": 2.5,
 
133
  "sliding_window": 128,
134
  "sliding_window_pattern": "LLLG",
135
  "tie_word_embeddings": false,
136
  "tokenizer_class": "GPT2Tokenizer",
137
  "topk_group": 1,
 
138
  "transformers_version": "5.0.0.dev0",
139
  "use_cache": true,
140
  "vocab_size": 153600
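The rename from `first_last_k_dense_replace` to `first_k_dense_replace` matches the DeepSeek-style MoE config convention, where the value is the number of leading decoder layers that keep a dense FFN instead of an MoE block (my reading of that convention; the K-EXAONE modeling code is authoritative). A minimal sketch of how such a key is typically consumed, with a made-up small layer count for illustration:

```python
import json

# Fragment reconstructed from the diff above; all other keys omitted.
# NOTE: num_hidden_layers=6 is a hypothetical small value for this sketch,
# not the real model's depth.
config = json.loads('{"first_k_dense_replace": 1, "num_hidden_layers": 6}')

def layer_kinds(cfg):
    """Label each decoder layer: the first k layers stay dense, the rest are MoE."""
    k = cfg["first_k_dense_replace"]
    return ["dense" if i < k else "moe" for i in range(cfg["num_hidden_layers"])]

print(layer_kinds(config))  # → ['dense', 'moe', 'moe', 'moe', 'moe', 'moe']
```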
generation_config.json CHANGED
@@ -1,6 +1,7 @@
 {
   "_from_model_config": true,
   "bos_token_id": 1,
+  "do_sample": true,
   "eos_token_id": 53,
   "pad_token_id": 0,
   "presence_penalty": 0.0,
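Adding `"do_sample": true` to `generation_config.json` makes `model.generate()` sample by default, consistent with the `do_sample=True` arguments added to the README examples. A quick check that the updated fragment is valid JSON and carries the intended defaults (fragment reconstructed from the diff, closed off here for parsing; the real file has more keys):

```python
import json

fragment = """
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 53,
  "pad_token_id": 0,
  "presence_penalty": 0.0
}
"""
cfg = json.loads(fragment)
print(cfg["do_sample"])     # True: sampling enabled by default
print(cfg["eos_token_id"])  # 53, matching config.json above
```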