Oysiyl committed · Commit 65e042e · verified · 1 Parent(s): b17b88d

docs: refresh 27B model card with latest deployment-backed eval and loss plots

Files changed (2):
  1. README.md +44 -16
  2. training_loss_vs_progress.svg +32 -127
README.md CHANGED
@@ -102,35 +102,63 @@ Use this full fiction passage for held-out testing:
  > She stepped forward.
  > The mountain exhaled.

  ## Full observed output on that sample

- Live Modal short sanity sample observed output:

- > This tool saves teams hours every week, yet the copy still sounds mechanical and bland.

- Observed output from held-out eval run:

- > The wind picked up, sudden and violent, as if the mountain had drawn a sharp breath. She stood at the lip of the pass, her cloak whipping behind her like a flag of surrender, looking down into the valley where the last lights were flickering out one by one. Not blown out. Extinguished. As if someone had reached down and snuffed them. Below, the bells started to ring—not the clear peal of morning, but the slow, heavy toll of evening. They already knew. “Don’t,” he said, his voice barely audible over the rising gale. He reached for her, but his hand passed through the air where her shoulder should have been. “If you step over that ridge, you’re dead.” She didn’t look back. The cold bit into her cheeks, but her eyes were steady. “I’ve been dead since I left you,” she said quietly. “I’m just finishing the job now.” Thunder cracked overhead, close enough to shake the ground beneath their feet. For a second, lightning split the sky, illuminating her face—a mask of resolve carved from grief and fury. Not the girl he’d known, but something harder, sharper, forged in fire. “You’ll kill yourself,” he shouted. She turned then, meeting his gaze with a smile that didn’t reach her eyes. “Maybe,” she said. “But I’ll take some of them with me.” And with that, she stepped forward. The mountain roared.
 
  ## Judgment

- Blunt judgment: this 27B run is operationally successful and qualitatively strong, but not yet a strict-fidelity winner.

  Why:
- - output is fluent and coherent end-to-end
- - it preserves core scene structure and emotional trajectory
- - it still injects extra aggression/drama in places (for example, more forceful phrasing and added intent)
- - that means the rewrite can be too interventionist for strict minimal-touch unslop behavior

  ## Comparison vs pilot series

- - **0.6B**: failed badly; became a different story
- - **1.7B**: more fluent than 0.6B, but still invented scenes and structure
- - **4B**: first clearly improved text-only model in the series; mostly kept the scene intact, but still drifted and over-shaped the prose
- - **9B (retrained)**: strong practical baseline in the new family; often cleaner and more controlled on medium-length rewrites
- - **30B-A3B VL Instruct**: still the safest fidelity-first reference on long-form passages
- - **Qwen3.5 27B (this run)**: stronger language quality than smaller dense lanes, but currently trends more stylistic/aggressive than ideal in strict-preservation mode

  ## Conclusion

- This repo is now a completed post-run artifact for Qwen 3.5 27B with real training metrics, normalized loss curve, and held-out sample output. The core result: 27B clearly works and writes well, but current behavior still needs tighter rewrite constraints before it can replace the 30B reference lane for fidelity-sensitive unslop. It is a serious positive experiment, not a final default.

  > She stepped forward.
  > The mountain exhaled.

+ ## Deployment-backed endpoint check (latest)
+
+ Live endpoint:
+ - `https://dmitriy-kisil--qwen3-5-27b-unslop-api.modal.run`
+
+ Latest `/health` (deployment-backed):
+ - `ok: true`
+ - `model_id: Oysiyl/qwen3.5-27b-unslop-good-lora-v1`
+ - `base_model: Qwen/Qwen3.5-27B`
+ - `model_family: qwen3_5`
+ - `adapter_mode: false`
+ - `artifact_mode: merged_full_model`
+ - `scaledown_window: 240`
+
+ Notes:
+ - This endpoint is currently loading the merged full-model artifact (not PEFT adapter-serving mode).
+ - The service is up and returning 200s, but rewrite quality is currently corrupted.
+
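The health fields above can be gated mechanically before any rewrite-quality eval. A minimal sketch, assuming the endpoint returns exactly the JSON fields listed above; the payload is hard-coded here from the documented response rather than fetched live, and `health_gate` is a hypothetical helper, not part of the deployed service:

```python
import json

# Hard-coded from the documented /health response above; a live check
# would instead GET
# https://dmitriy-kisil--qwen3-5-27b-unslop-api.modal.run/health
HEALTH_PAYLOAD = json.loads("""
{
  "ok": true,
  "model_id": "Oysiyl/qwen3.5-27b-unslop-good-lora-v1",
  "base_model": "Qwen/Qwen3.5-27B",
  "model_family": "qwen3_5",
  "adapter_mode": false,
  "artifact_mode": "merged_full_model",
  "scaledown_window": 240
}
""")


def health_gate(payload: dict) -> bool:
    """Pass only when the service reports healthy AND is serving the
    merged full-model artifact (so adapter_mode must be false)."""
    return (
        payload.get("ok") is True
        and payload.get("artifact_mode") == "merged_full_model"
        and payload.get("adapter_mode") is False
    )


print(health_gate(HEALTH_PAYLOAD))  # prints: True
```

As the notes say, a green gate here only proves the infra and artifact wiring are correct; it says nothing about whether the live rewrites are coherent.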
  ## Full observed output on that sample

+ Short sanity sample (input):
+
+ > This feature saves teams hours every week, but the copy still sounds robotic and bland.
+
+ Short sanity sample (observed output):
+
+ > This can... ...SPOINTF0610! $55) ( ),<#%}·' -/ #1. }:*
+
+ Held-out-style rewrite sample (input):
+
+ > We built this feature quickly, but now we need a cleaner version that sounds natural without losing any technical detail.
+
+ Held-out-style rewrite sample (observed output):
+
+ > The *>\n: 7%\n, (⊻�,\n. " " \n*\n: "、\n„ " ;*\n\n\n\n;\n\n; :: 5:17-
+
+ Observed behavior note:
+ - output is not a coherent rewrite right now
+ - character-level corruption/gibberish is still present
+ - this is a runtime-quality issue, not just a model-card wording issue
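A deployment-backed coherence gate would flag both corrupted outputs above automatically. A minimal sketch of such a check; `looks_coherent` and its thresholds are illustrative assumptions, not the harness actually used for these runs:

```python
import string

# Characters we expect in clean English rewrite output (plus curly
# quotes, which legitimate prose samples in this card do use).
ALLOWED = set(
    string.ascii_letters + string.digits + string.punctuation
    + " \n\t\u2018\u2019\u201c\u201d"
)


def looks_coherent(text: str, min_clean: float = 0.9, min_alpha: float = 0.5) -> bool:
    """Heuristic gate: mostly expected characters, and mostly letters.

    Symbol soup like the corrupted samples above keeps a high
    printable-character ratio, so the alphabetic fraction is the
    check that actually catches it.
    """
    if not text:
        return False
    clean = sum(ch in ALLOWED for ch in text) / len(text)
    alpha = sum(ch.isascii() and ch.isalpha() for ch in text) / len(text)
    return clean >= min_clean and alpha >= min_alpha


# The short sanity input passes; the observed corrupted output fails.
print(looks_coherent("This feature saves teams hours every week."))  # prints: True
print(looks_coherent("This can... ...SPOINTF0610! $55) ( ),<#%}\u00b7' -/ #1. }:*"))  # prints: False
```

Wiring a gate like this into the endpoint smoke test would turn "rewrite quality is currently corrupted" from a manual observation into a hard CI failure.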
 
  ## Judgment

+ Blunt judgment: this 27B run is training-successful and deployment-live, but current live rewrite quality is not production-usable.

  Why:
+ - endpoint health is green and requests return 200
+ - model artifact is correctly identified as merged full model (`adapter_mode: false` is expected)
+ - observed outputs are still corrupted / gibberish under live inference
+ - therefore quality remains blocked even though infra is operational

  ## Comparison vs pilot series

+ - **0.8B / 2B / 4B / 9B lanes:** have produced coherent rewrite outputs in the documented runs.
+ - **30B-A3B reference lane:** remains the safer quality-first benchmark for fidelity-sensitive long-form use.
+ - **Qwen3.5 27B (this run):** strongest training/infrastructure completion so far at this size, but currently failing the deployment-backed coherence gate.

  ## Conclusion

+ This repo is a complete 27B post-run artifact with real training metrics, a normalized loss plot, and live deployment checks. Current status is explicit: infra is up and the artifact type is correct (merged model, not adapter), but the latest deployment-backed rewrite outputs are still corrupted. The next milestone is not more training documentation; it is restoring coherent live generation and then re-running this evaluation block.
training_loss_vs_progress.svg CHANGED