docs: refresh 27B model card with latest deployment-backed eval and loss plots
Changed files: README.md (+44 -16), training_loss_vs_progress.svg (+32 -127)

README.md (updated section)
@@ -102,35 +102,63 @@ Use this full fiction passage for held-out testing:
> She stepped forward.
> The mountain exhaled.
## Deployment-backed endpoint check (latest)

Live endpoint:

- `https://dmitriy-kisil--qwen3-5-27b-unslop-api.modal.run`

Latest `/health` (deployment-backed):

- `ok: true`
- `model_id: Oysiyl/qwen3.5-27b-unslop-good-lora-v1`
- `base_model: Qwen/Qwen3.5-27B`
- `model_family: qwen3_5`
- `adapter_mode: false`
- `artifact_mode: merged_full_model`
- `scaledown_window: 240`

Notes:

- This endpoint is currently loading the merged full-model artifact (not PEFT adapter-serving mode).
- The service is up and returning 200s, but rewrite quality is currently corrupted.
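The `/health` fields listed above can be checked mechanically before trusting the endpoint. A minimal sketch, assuming the payload shape documented here; the helper name `health_is_servable` is illustrative and not part of this repo:

```python
# Validate a /health payload of the shape documented above.
# Hypothetical helper; the service only needs to return this JSON.

EXPECTED = {
    "base_model": "Qwen/Qwen3.5-27B",
    "model_family": "qwen3_5",
    "artifact_mode": "merged_full_model",
}


def health_is_servable(payload: dict) -> bool:
    """True if the payload reports a healthy merged-full-model deployment."""
    if not payload.get("ok"):
        return False
    # A merged artifact means adapter-serving mode must be off.
    if payload.get("adapter_mode") is not False:
        return False
    return all(payload.get(k) == v for k, v in EXPECTED.items())


sample = {
    "ok": True,
    "model_id": "Oysiyl/qwen3.5-27b-unslop-good-lora-v1",
    "base_model": "Qwen/Qwen3.5-27B",
    "model_family": "qwen3_5",
    "adapter_mode": False,
    "artifact_mode": "merged_full_model",
    "scaledown_window": 240,
}
print(health_is_servable(sample))  # → True
```

Note that this only gates on deployment metadata; as the samples below show, a green health check says nothing about output coherence.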
## Full observed output on that sample

Short sanity sample (input):

> This feature saves teams hours every week, but the copy still sounds robotic and bland.

Short sanity sample (observed output):

> This can... ...SPOINTF0610! $55) ( ),<#%}·' -/ #1. }:*

Held-out-style rewrite sample (input):

> We built this feature quickly, but now we need a cleaner version that sounds natural without losing any technical detail.

Held-out-style rewrite sample (observed output):

> The *>\n: 7%\n, (⊻�,\n. " " \n*\n: "、\n„ " ;*\n\n\n\n;\n\n; :: 5:17-

Observed behavior note:

- output is not a coherent rewrite right now
- character-level corruption/gibberish is still present
- this is a runtime-quality issue, not just a model-card wording issue
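This failure mode (health green, output gibberish) is exactly what a character-level coherence gate can catch automatically. A rough heuristic sketch; the threshold and the `looks_coherent` function are assumptions for illustration, not the project's actual gate:

```python
def looks_coherent(text: str, min_word_ratio: float = 0.6) -> bool:
    """Rough gate: require that most whitespace-separated tokens are
    mostly letters. The 0.5 and 0.6 thresholds are assumptions."""
    tokens = text.split()
    if not tokens:
        return False
    # A token counts as "word-like" if at least half its chars are letters.
    wordish = sum(
        1 for t in tokens
        if sum(c.isalpha() for c in t) / len(t) >= 0.5
    )
    return wordish / len(tokens) >= min_word_ratio


good = "This feature saves teams hours every week."
bad = "This can... ...SPOINTF0610! $55) ( ),<#%}·' -/ #1. }:*"
print(looks_coherent(good), looks_coherent(bad))  # → True False
```

A gate like this would have flagged both observed outputs above before they reached the model card.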
## Judgment

Blunt judgment: this 27B run is training-successful and deployment-live, but current live rewrite quality is not production-usable.

Why:

- endpoint health is green and requests return 200
- the model artifact is correctly identified as a merged full model (`adapter_mode: false` is expected)
- observed outputs are still corrupted/gibberish under live inference
- therefore quality remains blocked even though the infra is operational
## Comparison vs pilot series

- **0.8B / 2B / 4B / 9B lanes:** have produced coherent rewrite outputs in the documented runs.
- **30B-A3B reference lane:** remains the safer quality-first benchmark for fidelity-sensitive long-form use.
- **Qwen3.5 27B (this run):** strongest training/infrastructure completion so far at this size, but currently failing the deployment-backed coherence gate.
## Conclusion

This repo is a complete 27B post-run artifact with real training metrics, a normalized loss plot, and live deployment checks. Current status is explicit: infra is up and the artifact type is correct (merged model, not adapter), but the latest deployment-backed rewrite outputs are still corrupted. The next milestone is not more training documentation; it is restoring coherent live generation and then re-running this evaluation block.
training_loss_vs_progress.svg (updated; SVG diff not shown)