fromthesky/PLDR-LLM-v51-110M-1
Commit 637fb33
Parent(s): 2aed2e6
Updated readme
Updated transformers version in generation_config.json
- README.md +10 -8
- generation_config.json +1 -1
README.md
CHANGED
@@ -52,9 +52,11 @@ pipeline = pipeline(
 trust_remote_code=True
 )
 
-prompt=
-
-output=pipeline(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True,
+prompt="The quick brown fox jumps over the lazy dog."
+
+output=pipeline(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True,
+tokenizer_encode_kwargs={"add_special_tokens":False},
+use_cache=True, max_new_tokens=100)
 print(output[0]["generated_text"])
 ```
 
@@ -71,9 +73,9 @@ tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path="fromthesk
 legacy=False,
 trust_remote_code=True
 )
-
-prompt=
-
+
+prompt="The quick brown fox jumps over the lazy dog."
+
 inputs = tokenizer([prompt], return_tensors="pt").to(device=device)
 generated_ids = model.generate(**inputs,
 max_new_tokens=100,
@@ -85,7 +87,6 @@ generated_ids = model.generate(**inputs,
 )
 print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 ```
-<sup>\*</sup> `prompt` string is a quote from Richard Feynman in Surely You're Joking, Mr. Feynman! Adventures of a Curious Character.
 
 #### PLDR-LLM specific configurations:
 - `custom_G_type`: `None` for learned G values during pretraining, `'identity'` for LLM with SDPA equivalent, `'random'` for G values from a random normal distribution, `'external'` for custom G values that can be assigned after model initialization. This setting is more important for training purposes, for inference it is set in the model config.json file.
@@ -109,7 +110,8 @@ See config.json for other model configuration details.
 pip install -e ".[dev]"
 ```
 - Static cache is not supported for models with `custom_G_type=None`.
--
+- PLDR-LLM uses EOS token `"[END]"` during pretraining to indicate end of a sequence. For text generation, we do not need to add the EOS token to the prompt. To achieve this, `add_eos_token=False` can be set in `tokenizer_config.json` file or while initializing the tokenizer model. For text generation `pipeline` call method, `tokenizer_encode_kwargs={"add_special_tokens":False}` can be used.
+- When `add_bos_token=False` and `add_eos_token=False` are set for the tokenizer model, prompt `""` is an invalid input for single batch inference as it doesn't contain any tokens. When padding is enabled, batched inference with prompt `""` as one of the samples causes its `input_ids` to be pad tokens and `attention_mask` to be all zeros. This edge case is handled differently for `_attn_implementation='eager'` and `'sdpa'`, resulting in different generation outputs for this prompt. Setting `add_bos_token=True`, `add_eos_token=True` or explicitly providing prompt as `"[PAD]"`, `"[START]"`, or `"[END]"` gives same output for either implementation. This issue does not affect KV-cache and G-cache.
 
 ### Via Original Implementation
 
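The two README notes added in this commit (no EOS token on the prompt, and the empty-prompt edge case under padding) can be illustrated with a toy sketch. This is not the model's tokenizer: `VOCAB`, `encode`, and `pad_batch` are hypothetical stand-ins written for this illustration, the whitespace split replaces the real subword model, and left padding is an assumption. The special token IDs are the ones listed in `generation_config.json` in this commit.

```python
# Toy illustration only; the real tokenizer is loaded with AutoTokenizer.
PAD_ID, BOS_ID, EOS_ID = 0, 2, 3  # IDs as listed in generation_config.json

# Hypothetical vocabulary for this sketch (not the model's vocabulary).
VOCAB = {"The": 10, "quick": 11, "brown": 12, "fox": 13}


def encode(text, add_bos_token=False, add_eos_token=False):
    """Map whitespace-separated words to IDs, optionally adding BOS/EOS."""
    ids = [VOCAB[word] for word in text.split()]
    if add_bos_token:
        ids = [BOS_ID] + ids
    if add_eos_token:
        ids = ids + [EOS_ID]
    return ids


def pad_batch(sequences, pad_id=PAD_ID):
    """Left-pad every sequence to the longest one (padding side is assumed)."""
    width = max(max(len(seq) for seq in sequences), 1)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# With add_eos_token=True the prompt ends in EOS, which generation does not need.
with_eos = encode("The quick brown fox", add_eos_token=True)
without_eos = encode("The quick brown fox")

# With neither BOS nor EOS added, "" encodes to no tokens at all, so in a
# padded batch its row is all pad tokens with an all-zero attention mask.
batch_ids, batch_mask = pad_batch([encode("The quick brown fox"), encode("")])

print(with_eos)
print(without_eos)
print(batch_ids)
print(batch_mask)
```

In this sketch, enabling `add_bos_token=True` gives the empty prompt one real token, so its mask row is no longer all zeros. That matches the workaround the README note describes, though how the actual `eager` and `sdpa` attention paths treat a fully masked row is not modeled here.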
generation_config.json
CHANGED
@@ -3,5 +3,5 @@
 "bos_token_id": 2,
 "eos_token_id": 3,
 "pad_token_id": 0,
-"transformers_version": "4.
+"transformers_version": "4.56.1"
 }