ragunath-ravi committed on
Commit 87ecbc0 · verified · 1 Parent(s): 316fbfb

v2: fix merge ratio 85/15 — preserve instruction following
README.md CHANGED
@@ -1,76 +1,39 @@
  ---
- license: apache-2.0
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- - Qwen/Qwen2.5-Coder-7B-Instruct
+ base_model: []
+ library_name: transformers
  tags:
- - merge
- - linear
- - qwen2.5
- - coding
- - chat
- - mergekit
- language:
- - en
- pipeline_tag: text-generation
- ---
-
- # Qwen2.5-7B-ChatCoder
-
- A linearly merged model combining **Qwen2.5-7B-Instruct** and **Qwen2.5-Coder-7B-Instruct** (60% chat / 40% coder).
-
- ## Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- import torch
-
- model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     dtype=torch.bfloat16,
-     device_map="auto",
- )
- model.eval()
-
- messages = [
-     {"role": "system", "content": "You are a helpful coding assistant."},
-     {"role": "user", "content": "Write a binary search in Python."},
- ]
- text = tokenizer.apply_chat_template(
-     messages, tokenize=False, add_generation_prompt=True
- )
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
-
- with torch.no_grad():
-     out = model.generate(
-         **inputs,
-         max_new_tokens=512,
-         do_sample=False,
-         temperature=None,
-         top_p=None,
-         repetition_penalty=1.1,
-         eos_token_id=[151645, 151643],  # <|im_end|> and <|endoftext|>
-         pad_token_id=151645,
-     )
- print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
- ```
-
- ## Merge Details
-
- | Property | Value |
- |---|---|
- | Method | Linear (weighted average) |
- | Chat instruct weight | 0.6 |
- | Coder weight | 0.4 |
- | dtype | bfloat16 |
-
- ## Hardware
-
- | Precision | VRAM |
- |---|---|
- | bfloat16 | ~16 GB |
- | 4-bit (bnb) | ~5 GB |
-
- Created by [ragunath-ravi](https://huggingface.co/ragunath-ravi) using [mergekit](https://github.com/arcee-ai/mergekit).
+ - mergekit
+ - merge
+ ---
+ # merged-model-v2
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the [Linear](https://arxiv.org/abs/2203.05482) merge method.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * /kaggle/input/models/qwen-lm/qwen2.5/transformers/7b-instruct/1
+ * /kaggle/input/models/qwen-lm/qwen2.5-coder/transformers/7b/1
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ models:
+   - model: /kaggle/input/models/qwen-lm/qwen2.5/transformers/7b-instruct/1
+     parameters:
+       weight: 0.85
+   - model: /kaggle/input/models/qwen-lm/qwen2.5-coder/transformers/7b/1
+     parameters:
+       weight: 0.15
+
+ merge_method: linear
+ dtype: bfloat16
+ ```
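For context on what this commit changes: the linear merge method named in the README is just a parameter-wise weighted average, so the v2 ratio means every tensor in the merged checkpoint becomes 0.85 × the instruct tensor plus 0.15 × the coder tensor. A minimal sketch of that arithmetic, with plain Python floats standing in for model tensors (`linear_merge` is an illustrative helper, not mergekit's actual API):

```python
def linear_merge(state_dicts, weights):
    """Parameter-wise weighted average over aligned state dicts.

    Every parameter with the same name is combined as
    sum(weight_i * tensor_i); plain floats stand in for tensors here.
    """
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for sd, w in zip(state_dicts, weights))
    return merged

instruct = {"layer.weight": 1.0}  # toy "tensor" from the instruct model
coder = {"layer.weight": 3.0}     # toy "tensor" from the coder model

# v2 ratio from the config: 85% instruct, 15% coder
merged = linear_merge([instruct, coder], [0.85, 0.15])
print(merged["layer.weight"])  # 0.85*1.0 + 0.15*3.0, i.e. about 1.3
```

Because the weights sum to 1.0, tensor shapes and overall parameter scale are preserved, which is consistent with the shard sizes below being unchanged by this commit.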
mergekit_config.yml CHANGED
@@ -1,10 +1,10 @@
1
  models:
2
  - model: /kaggle/input/models/qwen-lm/qwen2.5/transformers/7b-instruct/1
3
  parameters:
4
- weight: 0.6
5
  - model: /kaggle/input/models/qwen-lm/qwen2.5-coder/transformers/7b/1
6
  parameters:
7
- weight: 0.4
8
 
9
  merge_method: linear
10
  dtype: bfloat16
 
1
  models:
2
  - model: /kaggle/input/models/qwen-lm/qwen2.5/transformers/7b-instruct/1
3
  parameters:
4
+ weight: 0.85
5
  - model: /kaggle/input/models/qwen-lm/qwen2.5-coder/transformers/7b/1
6
  parameters:
7
+ weight: 0.15
8
 
9
  merge_method: linear
10
  dtype: bfloat16
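To reproduce a merge like this one, mergekit consumes such a YAML file through its `mergekit-yaml` entry point. A hedged sketch of the invocation (the output directory name is a placeholder, and the `/kaggle/input/...` model paths in the config only exist inside that Kaggle environment):

```shell
# install mergekit, then run the merge described by the config
pip install mergekit
mergekit-yaml mergekit_config.yml ./merged-model-v2
```

The merged safetensors shards and tokenizer files are written to the output directory, which is what this commit's updated shard checksums reflect.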
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0c334a6fb2b7f9ff76222392c4e0f889f6b4bb04728829d828eb1953b4612dc6
+ oid sha256:71bbe7642888659d56f9aa27450fb82d72041d077b869d09d1d6a0f1e6afc282
  size 4976698776
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:84d2a06ed8a18c5259fe2415b5150359a4a8c0f3e4208d602e313b25ae1602a1
+ oid sha256:f3f2a5df171f8505fb611a2a2b11da62dada95f0fef07bba5b6b01e080257dbf
  size 4932751032
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:40bd46ed157269fa8fa817b2beb7876d6c6e810a03108f38d780f0e951d917b5
+ oid sha256:5bd2f7ecee402267f702f2be3fbec5765a47a503cd77a4100537ce987d976f78
  size 4991495808
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:14636fe9efd734a900818b1ce848377327f7ba652e3b7f936fc6866f9e37791f
+ oid sha256:e44bdabe1c577f1a3e689c7b8486c8f0a5a258e83d3815848852d2cca23d71f6
  size 330326240
vocab.json ADDED
The diff for this file is too large to render. See raw diff