---
library_name: peft
license: apache-2.0
base_model: Qwen/QwQ-32B-Preview
tags:
- generated_from_trainer
datasets:
- phxdev/creed
model-index:
- name: outputs/heisenberg-crystal
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml
adapter: lora
base_model: Qwen/QwQ-32B-Preview
trust_remote_code: true
bf16: true
dataset_processes: 64
datasets:
- path: phxdev/creed
  type: completion
  field: text
  trust_remote_code: false
streaming: true
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 0.001
lisa_layers_attribute: model.layers
lisa_enabled: true
lisa_layers_fraction: 0.25
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: true
lora_alpha: 128
lora_dropout: 0.15
lora_r: 64
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
lora_fan_in_fan_out: false
modules_to_save:
- embed_tokens
- lm_head
loraplus_lr_embedding: 1.0e-06
loraplus_lr_ratio: 16
lr_scheduler: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr: 0.00001
max_prompt_len: 1024
mean_resizing_embeddings: false
micro_batch_size: 1
num_epochs: 3.0
optimizer: adamw_torch
# optim_args:
#   weight_decay: 0.05
#   betas: [0.9, 0.95]
#   eps: 1.0e-8
output_dir: ./outputs/heisenberg-crystal
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 20000
qlora_sharded_model_loading: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1
resume_from_checkpoint: null
sample_packing: false
sample_packing_bin_size: 200
sample_packing_group_size: 100000
sample_packing_seq_len_multiplier: 1.0
save_only_model: true
save_safetensors: true
save_strategy: steps
save_steps: 100
save_total_limit: 3
eval_strategy: steps
eval_steps: 100
metric_for_best_model: loss
greater_is_better: false
sequence_len: 512
shuffle_merged_datasets: true
skip_prepare_dataset: false
strict: false
train_on_inputs: false
neftune_noise_alpha: 5.0
model_config:
  rope_scaling:
    type: linear
    factor: 1.5
dataloader_prefetch_factor: 4
dataloader_num_workers: 8
dataloader_pin_memory: true
dataloader_persistent_workers: true
max_grad_norm: 1.0
adam_beta2_schedule: cosine
torch_compile: true
torch_compile_backend: inductor
trl:
  log_completions: true
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false
  vllm_device: auto
  vllm_dtype: auto
  vllm_gpu_memory_utilization: 0.9
use_ray: false
val_set_size: 0.05
warmup_steps: 100
warmup_ratio: 0.0
weight_decay: 0.05
flash_attention: true
flash_attn_cross_entropy: true
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: false
ddp_backend: nccl
ddp_broadcast_buffers: false
ddp_find_unused_parameters: false
tf32: true
bf16_full_eval: false
fp16: false
# unfrozen_parameters:
# - lm_head.*
# - embed_tokens.*
# - norm.*
xformers_attention: false
s2_attention: false
sdp_attention: false
pad_to_sequence_len: true
peft_use_dora: false
peft_lora_modules_to_save: null
special_tokens:
  pad_token: <|endoftext|>
deepspeed: null
fsdp: null
fsdp_config: null
# wandb_project: heisenberg-qwen
# wandb_entity: null
# wandb_name: blue-crystal-run
# wandb_log_model: checkpoint
hub_model_id: null
hub_strategy: null
report_to: []
logging_strategy: steps
logging_steps: 10
logging_first_step: true
```

</details>
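For reference, the `lora_*` keys in the config above map roughly onto the following PEFT `LoraConfig`. This is a sketch of the adapter shape Axolotl builds from those keys, not code emitted by Axolotl itself:

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the lora_* keys in the axolotl config above.
lora_config = LoraConfig(
    r=64,                     # lora_r
    lora_alpha=128,           # lora_alpha (effective scaling alpha/r = 2.0)
    lora_dropout=0.15,        # lora_dropout
    target_modules=[          # lora_target_modules
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # trained fully, not as LoRA
    fan_in_fan_out=False,     # lora_fan_in_fan_out
    task_type="CAUSAL_LM",
)
```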

# outputs/heisenberg-crystal

This model is a fine-tuned version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) on the phxdev/creed dataset.
It achieves the following results on the evaluation set:
- Loss: nan

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0013 | 1    | nan             |
| 7.8286        | 0.1259 | 100  | nan             |
| 7.2486        | 0.2519 | 200  | nan             |
| 7.2601        | 0.3778 | 300  | nan             |
| 8.2142        | 0.5038 | 400  | nan             |
| 7.1902        | 0.6297 | 500  | nan             |
| 6.3799        | 0.7557 | 600  | nan             |
| 6.7115        | 0.8816 | 700  | nan             |
| 6.0414        | 1.0076 | 800  | nan             |
| 6.428         | 1.1335 | 900  | nan             |
| 6.3167        | 1.2594 | 1000 | nan             |
| 6.0359        | 1.3854 | 1100 | nan             |
| 6.3701        | 1.5113 | 1200 | nan             |
| 6.9225        | 1.6373 | 1300 | nan             |
| 6.5807        | 1.7632 | 1400 | nan             |
| 6.8649        | 1.8892 | 1500 | nan             |
| 6.1397        | 2.0151 | 1600 | nan             |
| 5.7675        | 2.1411 | 1700 | nan             |
| 6.2605        | 2.2670 | 1800 | nan             |
| 5.8788        | 2.3929 | 1900 | nan             |
| 6.0279        | 2.5189 | 2000 | nan             |
| 6.3911        | 2.6448 | 2100 | nan             |
| 6.0412        | 2.7708 | 2200 | nan             |
| 6.0862        | 2.8967 | 2300 | nan             |

### Framework versions

- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
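## Usage example

A minimal inference sketch follows. It assumes the LoRA adapter weights are available at `./outputs/heisenberg-crystal` (the `output_dir` from the config above); substitute a Hub repo id if the adapter has been pushed to the Hugging Face Hub. The prompt and adapter path are illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/QwQ-32B-Preview"
adapter_path = "./outputs/heisenberg-crystal"  # local output_dir; swap in a Hub id if pushed

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
    trust_remote_code=True,
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

inputs = tokenizer("Say my name.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```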