YiyingXie committed on
Commit b4340f5 · verified · 1 Parent(s): a14e043

Create README.md

Files changed (1)
  1. README.md +43 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ license: apache-2.0
+ base_model: google/gemma-3-270m
+ tags:
+ - alignment
+ - dpo
+ - triton
+ - weight-analysis
+ ---
+
+ # Gemma-3-270m-IT-DPO (Weight Delta Analysis)
+
+ This model is a version of [Gemma-3-270m](https://huggingface.co/google/gemma-3-270m) fine-tuned with **Direct Preference Optimization (DPO)**. Beyond standard alignment, this repository analyzes which weights DPO changes by generating binary masks from the weight deltas.
+
+ ## 🛠 Technical Features
+
+ ### 1. Weight Delta Masking
+ The repository includes tools that build binary masks from weight delta logs, marking the parameters that changed most during DPO.
+ * **Methods Supported:** `Magnitude`, `Momentum`, and `Fisher`.
+ * **Comparison Logic:** A Jaccard/IoU (Intersection over Union) score compares each generated mask against the default magnitude mask.
+     * **Score 0:** The masks share no selected parameters.
+     * **Score 1:** The masks are identical.
+ * **Storage:** All generated masks are saved as `.pt` files in the `/masks` directory.
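
The comparison described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names `magnitude_mask` and `jaccard`, the `keep_fraction` parameter, and the toy delta values are all assumptions made for the example. It is written in plain Python so it runs without dependencies; the same logic would apply to flattened tensors loaded from the `.pt` files. The score is the usual Jaccard index, |A ∩ B| / |A ∪ B|, over the sets of selected parameters.

```python
def magnitude_mask(deltas, keep_fraction):
    """Mark the keep_fraction of entries with the largest absolute delta.

    Hypothetical helper: the real selection rule and threshold are not
    documented in this README.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(len(deltas) * keep_fraction))
    # Indices sorted by |delta|, largest first; ties keep their original order.
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    selected = set(ranked[:k])
    return [i in selected for i in range(len(deltas))]


def jaccard(mask_a, mask_b):
    """Intersection over union of two equally sized boolean masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(bool(a) and bool(b) for a, b in zip(mask_a, mask_b))
    union = sum(bool(a) or bool(b) for a, b in zip(mask_a, mask_b))
    # Assumption: two empty masks are treated as identical (score 1.0).
    return intersection / union if union else 1.0


# Toy data: weight deltas, plus scores standing in for a second method.
deltas = [0.9, -0.1, 0.05, -0.7, 0.3, 0.0]
other_scores = [0.8, 0.6, -0.02, -0.1, 0.5, 0.01]

mask_a = magnitude_mask(deltas, keep_fraction=0.5)
mask_b = magnitude_mask(other_scores, keep_fraction=0.5)
score = jaccard(mask_a, mask_b)
print(score)  # 0.5: two shared entries out of four selected overall
```

Here both masks select three of six entries and agree on two of them, so the score sits halfway between the two extremes listed above.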
+
+ ### 2. Optimized Training Kernels
+ For efficient training on high-end GPUs, the training environment uses:
+ * **BSR-AdamW Kernel:** A specialized Triton-based optimizer kernel for DPO.
+ * **Hardware Compatibility:** Verified on **NVIDIA H100/H200** GPUs.
+ * **Triton Validation:** Environment readiness can be tested with a short run (50-100 steps), which typically takes only a few minutes on H-series hardware.
+
+ ## 📂 Repository Structure
+
+ * `/final_checkpoint`: The weights of the DPO-tuned model.
+ * `/masks`: Contains `.pt` mask files generated using the methods mentioned above.
+
+ ## 🚀 Reproduction & Debugging
+
+ If you are running the mask generation or training scripts:
+ * **Jaccard Flag:** If the Jaccard/IoU comparison fails during execution, temporarily disable the `--debug/jaccard` flag. A fix is scheduled for the upcoming weekend.
+ * **Environment Check:** Make sure `triton` is installed; the BSR-AdamW kernel requires it.
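
A quick way to run the environment check before starting a job is sketched below. It is an assumption-laden example, not part of the repository's scripts: the helper name `triton_available` is hypothetical, and it uses only the standard-library `importlib.util.find_spec` to see whether the `triton` package is importable.

```python
import importlib.util


def triton_available():
    """Return True if the `triton` package can be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("triton") is not None


status = "found" if triton_available() else "missing"
print(f"triton: {status}")
```

This only confirms that the package is installed. It does not verify that kernels compile or run on the available GPU, so the short 50-100 step run is still the more complete test.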
+
+ ## 📝 Usage Note
+ This model is part of an ongoing research project into how DPO shifts model weights. Results from the Jaccard similarity analysis can be used to interpret which parameters are most "critical" for preference alignment.