fromthesky committed on
Commit 637fb33 · 1 Parent(s): 2aed2e6

Updated readme

Updated transformers version in generation_config.json

Files changed (2)
  1. README.md +10 -8
  2. generation_config.json +1 -1
README.md CHANGED
@@ -52,9 +52,11 @@ pipeline = pipeline(
  trust_remote_code=True
  )
 
- prompt=('One time they had a drumming contest, and I didn’t do very well: '
- 'They said my drumming was "too intellectual"; theirs was much more pulsing.')
- output=pipeline(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True, use_cache=True, max_new_tokens=100)
+ prompt="The quick brown fox jumps over the lazy dog."
+
+ output=pipeline(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True,
+ tokenizer_encode_kwargs={"add_special_tokens":False},
+ use_cache=True, max_new_tokens=100)
  print(output[0]["generated_text"])
  ```
 
@@ -71,9 +73,9 @@ tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path="fromthesk
  legacy=False,
  trust_remote_code=True
  )
-
- prompt=('One time they had a drumming contest, and I didn’t do very well: '
- 'They said my drumming was "too intellectual"; theirs was much more pulsing.')
+
+ prompt="The quick brown fox jumps over the lazy dog."
+
  inputs = tokenizer([prompt], return_tensors="pt").to(device=device)
  generated_ids = model.generate(**inputs,
  max_new_tokens=100,
@@ -85,7 +87,6 @@ generated_ids = model.generate(**inputs,
  )
  print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
  ```
- <sup>\*</sup> `prompt` string is a quote from Richard Feynman in Surely You're Joking, Mr. Feynman! Adventures of a Curious Character.
 
  #### PLDR-LLM specific configurations:
  - `custom_G_type`: `None` for learned G values during pretraining, `'identity'` for LLM with SDPA equivalent, `'random'` for G values from a random normal distribution, `'external'` for custom G values that can be assigned after model initialization. This setting is more important for training purposes, for inference it is set in the model config.json file.
@@ -109,7 +110,8 @@ See config.json for other model configuration details.
  pip install -e ".[dev]"
  ```
  - Static cache is not supported for models with `custom_G_type=None`.
- - When `add_bos_token=False` and `add_eos_token=False` are set for the tokenizer model, prompt `""` is an invalid input for single batch inference as it doesn't contain any tokens. When padding is enabled, batched inference with prompt `""` as one of the samples causes its `input_ids` to be pad tokens and `attention_mask` to be all zeros. This edge case is handled differently for `_attn_implementation='eager'` and `'sdpa'`, resulting in different generation outputs for this prompt. Setting `add_bos_token=True`, `add_eos_token=True` or explicitly providing prompt as `"[PAD]"`, `"[START]"`, or `"[END]"` gives same output for either implementation. This issue does not affect KV-cache and G-cache.
+ - PLDR-LLM uses EOS token `"[END]"` during pretraining to indicate end of a sequence. For text generation, we do not need to add the EOS token to the prompt. To achieve this, `add_eos_token=False` can be set in `tokenizer_config.json` file or while initializing the tokenizer model. For text generation `pipeline` call method, `tokenizer_encode_kwargs={"add_special_tokens":False}` can be used.
+ - When `add_bos_token=False` and `add_eos_token=False` are set for the tokenizer model, prompt `""` is an invalid input for single batch inference as it doesn't contain any tokens. When padding is enabled, batched inference with prompt `""` as one of the samples causes its `input_ids` to be pad tokens and `attention_mask` to be all zeros. This edge case is handled differently for `_attn_implementation='eager'` and `'sdpa'`, resulting in different generation outputs for this prompt. Setting `add_bos_token=True`, `add_eos_token=True` or explicitly providing prompt as `"[PAD]"`, `"[START]"`, or `"[END]"` gives same output for either implementation. This issue does not affect KV-cache and G-cache.
 
  ### Via Original Implementation
 
generation_config.json CHANGED
@@ -3,5 +3,5 @@
  "bos_token_id": 2,
  "eos_token_id": 3,
  "pad_token_id": 0,
- "transformers_version": "4.55.2"
+ "transformers_version": "4.56.1"
  }
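
The empty-prompt note in the README.md changes of this commit can be illustrated without loading the model. The sketch below is a toy stand-in, not the PLDR-LLM tokenizer: `toy_encode` and `toy_pad` are hypothetical helpers, the word ids are made up, and left padding is an assumption. Only the special-token ids (`pad_token_id=0`, `bos_token_id=2`, `eos_token_id=3`) are taken from generation_config.json.

```python
# Toy stand-in for a tokenizer; NOT the real PLDR-LLM tokenizer.
# Special-token ids follow generation_config.json in this commit.
PAD_ID, BOS_ID, EOS_ID = 0, 2, 3


def toy_encode(prompt, add_bos_token=False, add_eos_token=False):
    """Map each whitespace-separated word to a made-up id (10, 11, ...)."""
    ids = [10 + i for i, _ in enumerate(prompt.split())]
    if add_bos_token:
        ids = [BOS_ID] + ids
    if add_eos_token:
        ids = ids + [EOS_ID]
    return ids


def toy_pad(batch):
    """Left-pad (assumed) every sequence to the longest one and build masks."""
    width = max(len(ids) for ids in batch)
    input_ids = [[PAD_ID] * (width - len(ids)) + ids for ids in batch]
    attention_mask = [[0] * (width - len(ids)) + [1] * len(ids) for ids in batch]
    return input_ids, attention_mask


prompts = ["The quick brown fox", ""]

# No special tokens: the "" row has nothing but padding.
input_ids, attention_mask = toy_pad([toy_encode(p) for p in prompts])
print(input_ids[1], attention_mask[1])        # [0, 0, 0, 0] [0, 0, 0, 0]

# With a BOS token the "" row keeps one attended position.
with_bos_ids, with_bos_mask = toy_pad(
    [toy_encode(p, add_bos_token=True) for p in prompts]
)
print(with_bos_ids[1], with_bos_mask[1])      # [0, 0, 0, 0, 2] [0, 0, 0, 0, 1]
```

With no special tokens, the `""` row consists only of pad ids with an all-zero mask, which is the case the README says `'eager'` and `'sdpa'` handle differently. Adding a BOS token leaves one attended position in that row.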