Update README.md

README.md (changed)

@@ -16,7 +16,31 @@ datasets:
 - nohurry/Opus-4.6-Reasoning-3000x-filtered
 - Jackrong/Qwen3.5-reasoning-700x
 ---
+# 🌟 Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
+
+🔥 **Update (April 5): To help beginners and enthusiasts better understand and reproduce the fine-tuning process of this model, I have prepared the complete training notebook, codebase, and a comprehensive companion PDF guide! Please check the resource links below.**
+
+> ❤️ Special thanks to the Unsloth open-source library and @KyleHessling1 for their support.
+
+## 📚 Resources & Guides
+
+If you want to dive into how this model was trained, or wish to reproduce the results locally or on Colab, please visit my GitHub repository:
+👉 **[Jackrong-llm-finetuning-guide](https://github.com/R6410418/Jackrong-llm-finetuning-guide)**
+
+### 📥 Core Technical Document Direct Download
+You can click the link below to directly access the complete technical manual for the Qwopus3.5 training:
+
+* **[Qwopus3-5-27b-Colab_complete_guide_to_llm_finetuning.pdf](https://github.com/R6410418/Jackrong-llm-finetuning-guide/raw/main/Qwopus3-5-27b-Colab_complete_guide_to_llm_finetuning.pdf)**
+* Covers the entire workflow, starting with an introduction to Google Colab and Unsloth.
+* Details the complete pipeline with step-by-step explanations: from downloading the base model and normalizing heterogeneous data sources into a unified format, to configuring trainer hyperparameters and finally publishing to Hugging Face.
+* Feedback is highly welcome! If you spot any shortcomings or areas for improvement, please let me know, and I will update it promptly.
 
+> **A Note:**
+> My goal in writing this guide goes beyond merely detailing a single training workflow. I want to convey a broader message: fine-tuning, post-training, and even medium-scale pre-training are not unattainable technical rituals, nor are they the exaggerated hype often packaged by social media. More often than not, all you need is a Google account, a standard laptop, and relentless curiosity.
+>
+> *No one starts as an expert. But every expert was once brave enough to begin.*
+
+---
 # 🌟 Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
 
 > **Build Environment Upgrades:**
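
The guide bullets added above describe a flow that runs from downloading the base model to publishing the merged weights on Hugging Face. For readers skimming this diff, here is a minimal sketch of that flow assuming the Unsloth + TRL stack the guide is built on. Every model ID, column name, LoRA setting, hyperparameter, and repository name below is an illustrative placeholder rather than the recipe actually used for this release, and exact `SFTTrainer` argument names vary across `trl` versions; consult the linked PDF for the real configuration.

```python
# Hedged sketch of the guide's workflow: load base model -> attach LoRA ->
# supervised fine-tuning -> push merged weights to the Hugging Face Hub.
# All names and values are placeholders, not the card's actual recipe.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# 1) Download the base model in 4-bit so it fits on a single consumer GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-27B",  # placeholder; substitute the real base checkpoint
    max_seq_length=4096,
    load_in_4bit=True,
)

# 2) Attach a LoRA adapter so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# 3) Load an already-normalized reasoning dataset; assumes a pre-rendered "text" column.
dataset = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")

# 4) Configure and run SFT (hyperparameters are illustrative only).
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        logging_steps=10,
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()

# 5) Merge the adapter and publish to the Hub (repo name is a placeholder).
model.push_to_hub_merged(
    "your-username/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled",
    tokenizer,
    save_method="merged_16bit",
)
```

Loading in 4-bit and training only LoRA adapters is what keeps a 27B-class fine-tune within reach of a single consumer or prosumer GPU, which is the setting the guide targets.
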
@@ -53,6 +77,8 @@ Let me analyze this request carefully:
 .
 ```
 
+---
+
 ## 🗺️ Training Pipeline Overview
 
 ```text
@@ -73,6 +99,7 @@ Final Model (Claude-4.6-Opus-Reasoning-Distilled, text-only)
 
 > **From the test results, it is clear that different Qwen3.5 quantized models show significant differences in tool-calling capability. Among them, only the 27B model distilled with Claude Opus reasoning demonstrates stable performance.**
 
+---
 
 🔥 **Community-tested advantages** (benchmark tests by user @sudoing on a single RTX 3090):
 
@@ -91,6 +118,7 @@ Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled shows significant advantages in
 
 **Thanks to the community for the in-depth testing and feedback!**
 
+---
 
 ### 🔹 Supervised Fine-Tuning (SFT)
 - **Objective:** To inject high-density reasoning logic and establish a strict problem-solving format in which the model works through an internal thinking stage before producing the final response.
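
The SFT objective above centers on a fixed output contract: an internal thinking segment first, then the visible answer. As a purely illustrative sketch of that contract, one distilled sample could be rendered into a single training string as below; the `<think>` delimiters and role markers are assumptions, since this excerpt of the card does not specify the exact tags or chat template used.

```python
# Illustrative only: renders one distilled reasoning sample into the
# "think first, answer second" layout the SFT objective describes.
# The <think> tags and role markers are assumed, not taken from the model card.
def build_training_text(question: str, reasoning: str, answer: str) -> str:
    return (
        f"<|user|>\n{question}\n"
        f"<|assistant|>\n"
        f"<think>\n{reasoning}\n</think>\n"
        f"{answer}"
    )

sample = build_training_text(
    question="A train travels 180 km in 2.5 hours. What is its average speed?",
    reasoning="Average speed = distance / time = 180 km / 2.5 h = 72 km/h.",
    answer="72 km/h",
)
print(sample)
```
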
@@ -103,7 +131,6 @@ The dataset consists of high-quality, filtered reasoning distillation data:
 | Dataset Name | Description / Purpose |
 |--------------|-----------------------|
 | [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
-| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | Injecting high-intensity, structured reasoning instances. |
 | [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |
 
 ## 🌟 Core Skills & Capabilities
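
Because the sources in the table above arrive with different schemas, the guide's "normalizing heterogeneous data sources into a unified format" step has to happen before training. Here is a minimal sketch using the 🤗 `datasets` library; the unified column names (`prompt`, `reasoning`, `response`) and the per-source field names are assumptions for illustration, not the actual schemas of these datasets.

```python
from datasets import load_dataset, concatenate_datasets

# The two reasoning sources listed in the table above.
# Split and field names are assumptions for illustration.
opus = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")
qwen = load_dataset("Jackrong/Qwen3.5-reasoning-700x", split="train")

def to_unified(example):
    # Map whatever fields each source uses onto one shared schema
    # ("prompt" / "reasoning" / "response" are hypothetical names).
    return {
        "prompt": example.get("prompt") or example.get("instruction", ""),
        "reasoning": example.get("reasoning", ""),
        "response": example.get("response") or example.get("output", ""),
    }

# Re-map each source to the shared schema, drop its original columns,
# then concatenate and shuffle into a single SFT training set.
unified = concatenate_datasets([
    opus.map(to_unified, remove_columns=opus.column_names),
    qwen.map(to_unified, remove_columns=qwen.column_names),
]).shuffle(seed=42)

print(unified)
```
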