janakhpon/mon-lm-qwen2.5-1.5b

@@ -1,26 +1,26 @@
 ---
 language:
-- mnw
 license: mit
 base_model: Qwen/Qwen2.5-1.5B
 tags:
-- mon
-- mnw
-- qwen
-- qwen2.5
-- cpt
-- continual-pretraining
-- tokenizer-expansion
 datasets:
-- janakhpon/mon-corpus-collection
 model-index:
-- name: Mon-LM-Qwen2.5-1.5B
-  results: []
 ---
 # Mon-LM (Qwen2.5-1.5B)
-**Mon-LM** is a production-grade Large Language Model for the **Mon language (mnw)**. It is based on **Qwen2.5-1.5B** and has undergone **Continual Pre-Training (CPT)** on a high-quality Mon language corpus.
 ## Model Details
@@ -32,11 +32,11 @@ model-index:
 ## Vocabulary Expansion
-The base Qwen2.5 tokenizer was expanded to better handle the Mon script. We injected the top-performing Mon subwords into the embedding layer, significantly improving the compression ratio and linguistic atomicity for Mon text.
 ## Usage
-You can use this model directly with the Hugging Face `transformers` library:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -55,4 +55,4 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ## Acknowledgments
-This model was trained as part of the Mon Language AI initiative. Special thanks to the Mon community for the corpus collection efforts.

 ---
 language:
+  - mnw
 license: mit
 base_model: Qwen/Qwen2.5-1.5B
 tags:
+  - mon
+  - mnw
+  - qwen
+  - qwen2.5
+  - cpt
+  - continual-pretraining
+  - tokenizer-expansion
 datasets:
+  - janakhpon/mon-corpus-collection
 model-index:
+  - name: Mon-LM-Qwen2.5-1.5B
+    results: []
 ---
 # Mon-LM (Qwen2.5-1.5B)
+Mon-LM is a large language model for the Mon language (mnw). It is based on Qwen2.5-1.5B and has undergone continual pre-training (CPT) on a Mon language corpus.
 ## Model Details
 ## Vocabulary Expansion
+The base Qwen2.5 tokenizer was expanded to cover the Mon script. Mon subwords were added to the vocabulary and the embedding layer so that Mon text compresses into fewer tokens whose boundaries align more closely with Mon linguistic units.
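As a rough illustration of the compression-ratio effect described above, the sketch below uses a toy greedy longest-match tokenizer in plain Python. It is not the Qwen2.5 tokenizer (which is byte-level BPE) and not the procedure used for this model. The sample string, both vocabularies, and the helper names `greedy_tokenize` and `chars_per_token` are assumptions made for the example.

```python
def greedy_tokenize(text, vocab):
    """Toy tokenizer: greedy longest match against `vocab`.

    Any character not covered by a vocabulary entry falls back to a
    single-character token, so the output always reassembles to `text`.
    """
    max_len = max((len(entry) for entry in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def chars_per_token(text, vocab):
    """Compression ratio as characters per token (higher means fewer tokens)."""
    return len(text) / len(greedy_tokenize(text, vocab))


# Hypothetical data: a short Mon-script string and two made-up vocabularies.
sample = "ဘာသာမန်"
base_vocab = set()                          # no Mon subwords: one token per character
expanded_vocab = {sample[:4], sample[4:]}   # two illustrative multi-character subwords

before = chars_per_token(sample, base_vocab)
after = chars_per_token(sample, expanded_vocab)
print("tokens before:", greedy_tokenize(sample, base_vocab))
print("tokens after: ", greedy_tokenize(sample, expanded_vocab))
print("chars/token before vs after:", before, after)
```

With `transformers`, this kind of expansion is commonly done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The model card does not state which calls were actually used here.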
 ## Usage
+Use this model with the Hugging Face `transformers` library:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 ## Acknowledgments
+This model was trained as part of the Mon Language AI initiative. Credit goes to the Mon community for its corpus collection efforts.