---
library_name: peft
license: apache-2.0
base_model: Qwen/QwQ-32B-Preview
tags:
- generated_from_trainer
datasets:
- phxdev/creed
model-index:
- name: outputs/heisenberg-crystal
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml
adapter: lora
base_model: Qwen/QwQ-32B-Preview
trust_remote_code: true
bf16: true
dataset_processes: 64
datasets:
- path: phxdev/creed
  type: completion
  field: text
  trust_remote_code: false
streaming: true
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 0.001
lisa_layers_attribute: model.layers
lisa_enabled: true
lisa_layers_fraction: 0.25
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: true
lora_alpha: 128
lora_dropout: 0.15
lora_r: 64
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
lora_fan_in_fan_out: false
modules_to_save:
- embed_tokens
- lm_head
loraplus_lr_embedding: 1.0e-06
loraplus_lr_ratio: 16
lr_scheduler: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr: 0.00001
max_prompt_len: 1024
mean_resizing_embeddings: false
micro_batch_size: 1
num_epochs: 3.0
optimizer: adamw_torch
# optim_args:
#   weight_decay: 0.05
#   betas: [0.9, 0.95]
#   eps: 1.0e-8
output_dir: ./outputs/heisenberg-crystal
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 20000
qlora_sharded_model_loading: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1
resume_from_checkpoint: null
sample_packing: false
sample_packing_bin_size: 200
sample_packing_group_size: 100000
sample_packing_seq_len_multiplier: 1.0
save_only_model: true
save_safetensors: true
save_strategy: steps
save_steps: 100
save_total_limit: 3
eval_strategy: steps
eval_steps: 100
metric_for_best_model: loss
greater_is_better: false
sequence_len: 512
shuffle_merged_datasets: true
skip_prepare_dataset: false
strict: false
train_on_inputs: false
neftune_noise_alpha: 5.0
model_config:
  rope_scaling:
    type: linear
    factor: 1.5
dataloader_prefetch_factor: 4
dataloader_num_workers: 8
dataloader_pin_memory: true
dataloader_persistent_workers: true
max_grad_norm: 1.0
adam_beta2_schedule: cosine
torch_compile: true
torch_compile_backend: inductor
trl:
  log_completions: true
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false
  vllm_device: auto
  vllm_dtype: auto
  vllm_gpu_memory_utilization: 0.9
use_ray: false
val_set_size: 0.05
warmup_steps: 100
warmup_ratio: 0.0
weight_decay: 0.05
flash_attention: true
flash_attn_cross_entropy: true
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: false
ddp_backend: nccl
ddp_broadcast_buffers: false
ddp_find_unused_parameters: false
tf32: true
bf16_full_eval: false
fp16: false
# unfrozen_parameters:
# - lm_head.*
# - embed_tokens.*
# - norm.*
xformers_attention: false
s2_attention: false
sdp_attention: false
pad_to_sequence_len: true
peft_use_dora: false
peft_lora_modules_to_save: null
special_tokens:
  pad_token: <|endoftext|>
deepspeed: null
fsdp: null
fsdp_config: null
# wandb_project: heisenberg-qwen
# wandb_entity: null
# wandb_name: blue-crystal-run
# wandb_log_model: checkpoint
hub_model_id: null
hub_strategy: null
report_to: []
logging_strategy: steps
logging_steps: 10
logging_first_step: true
```

</details>
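For reference, the `lora_*` keys in the config above map roughly onto the following PEFT `LoraConfig`. This is a sketch of the adapter shape Axolotl builds from those keys, not code emitted by Axolotl itself:

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the lora_* keys in the axolotl config above.
lora_config = LoraConfig(
    r=64,                     # lora_r
    lora_alpha=128,           # lora_alpha (effective scaling alpha/r = 2.0)
    lora_dropout=0.15,        # lora_dropout
    target_modules=[          # lora_target_modules
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # trained fully, not as LoRA
    fan_in_fan_out=False,     # lora_fan_in_fan_out
    task_type="CAUSAL_LM",
)
```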

# outputs/heisenberg-crystal

This model is a fine-tuned version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) on the phxdev/creed dataset.
It achieves the following results on the evaluation set:
- Loss: nan

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0013 | 1    | nan             |
| 7.8286        | 0.1259 | 100  | nan             |
| 7.2486        | 0.2519 | 200  | nan             |
| 7.2601        | 0.3778 | 300  | nan             |
| 8.2142        | 0.5038 | 400  | nan             |
| 7.1902        | 0.6297 | 500  | nan             |
| 6.3799        | 0.7557 | 600  | nan             |
| 6.7115        | 0.8816 | 700  | nan             |
| 6.0414        | 1.0076 | 800  | nan             |
| 6.428         | 1.1335 | 900  | nan             |
| 6.3167        | 1.2594 | 1000 | nan             |
| 6.0359        | 1.3854 | 1100 | nan             |
| 6.3701        | 1.5113 | 1200 | nan             |
| 6.9225        | 1.6373 | 1300 | nan             |
| 6.5807        | 1.7632 | 1400 | nan             |
| 6.8649        | 1.8892 | 1500 | nan             |
| 6.1397        | 2.0151 | 1600 | nan             |
| 5.7675        | 2.1411 | 1700 | nan             |
| 6.2605        | 2.2670 | 1800 | nan             |
| 5.8788        | 2.3929 | 1900 | nan             |
| 6.0279        | 2.5189 | 2000 | nan             |
| 6.3911        | 2.6448 | 2100 | nan             |
| 6.0412        | 2.7708 | 2200 | nan             |
| 6.0862        | 2.8967 | 2300 | nan             |

### Framework versions

- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
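## Usage example

A minimal inference sketch follows. It assumes the LoRA adapter weights are available at `./outputs/heisenberg-crystal` (the `output_dir` from the config above); substitute a Hub repo id if the adapter has been pushed to the Hugging Face Hub. The prompt and adapter path are illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/QwQ-32B-Preview"
adapter_path = "./outputs/heisenberg-crystal"  # local output_dir; swap in a Hub id if pushed

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
    trust_remote_code=True,
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

inputs = tokenizer("Say my name.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```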