Oysiyl committed · Commit 65e042e · verified · 1 Parent(s): b17b88d

docs: refresh 27B model card with latest deployment-backed eval and loss plots

Files changed (2):
  1. README.md +44 -16
  2. training_loss_vs_progress.svg +32 -127
README.md CHANGED
@@ -102,35 +102,63 @@ Use this full fiction passage for held-out testing:
  > She stepped forward.
  > The mountain exhaled.

  ## Full observed output on that sample

- Live Modal short sanity sample observed output:

- > This tool saves teams hours every week, yet the copy still sounds mechanical and bland.

- Observed output from held-out eval run:

- > The wind picked up, sudden and violent, as if the mountain had drawn a sharp breath. She stood at the lip of the pass, her cloak whipping behind her like a flag of surrender, looking down into the valley where the last lights were flickering out one by one. Not blown out. Extinguished. As if someone had reached down and snuffed them. Below, the bells started to ring—not the clear peal of morning, but the slow, heavy toll of evening. They already knew. “Don’t,” he said, his voice barely audible over the rising gale. He reached for her, but his hand passed through the air where her shoulder should have been. “If you step over that ridge, you’re dead.” She didn’t look back. The cold bit into her cheeks, but her eyes were steady. “I’ve been dead since I left you,” she said quietly. “I’m just finishing the job now.” Thunder cracked overhead, close enough to shake the ground beneath their feet. For a second, lightning split the sky, illuminating her face—a mask of resolve carved from grief and fury. Not the girl he’d known, but something harder, sharper, forged in fire. “You’ll kill yourself,” he shouted. She turned then, meeting his gaze with a smile that didn’t reach her eyes. “Maybe,” she said. “But I’ll take some of them with me.” And with that, she stepped forward. The mountain roared.
 
  ## Judgment

- Blunt judgment: this 27B run is operationally successful and qualitatively strong, but not yet a strict-fidelity winner.

  Why:
- - output is fluent and coherent end-to-end
- - it preserves core scene structure and emotional trajectory
- - it still injects extra aggression/drama in places (for example, more forceful phrasing and added intent)
- - that means the rewrite can be too interventionist for strict minimal-touch unslop behavior

  ## Comparison vs pilot series

- - **0.6B**: failed badly; became a different story
- - **1.7B**: more fluent than 0.6B, but still invented scenes and structure
- - **4B**: first clearly improved text-only model in the series; mostly kept the scene intact, but still drifted and over-shaped the prose
- - **9B (retrained)**: strong practical baseline in the new family; often cleaner and more controlled on medium-length rewrites
- - **30B-A3B VL Instruct**: still the safest fidelity-first reference on long-form passages
- - **Qwen3.5 27B (this run)**: stronger language quality than smaller dense lanes, but currently trends more stylistic/aggressive than ideal in strict-preservation mode

  ## Conclusion

- This repo is now a completed post-run artifact for Qwen 3.5 27B with real training metrics, normalized loss curve, and held-out sample output. The core result: 27B clearly works and writes well, but current behavior still needs tighter rewrite constraints before it can replace the 30B reference lane for fidelity-sensitive unslop. It is a serious positive experiment, not a final default.

  > She stepped forward.
  > The mountain exhaled.

+ ## Deployment-backed endpoint check (latest)
+
+ Live endpoint:
+ - `https://dmitriy-kisil--qwen3-5-27b-unslop-api.modal.run`
+
+ Latest `/health` (deployment-backed):
+ - `ok: true`
+ - `model_id: Oysiyl/qwen3.5-27b-unslop-good-lora-v1`
+ - `base_model: Qwen/Qwen3.5-27B`
+ - `model_family: qwen3_5`
+ - `adapter_mode: false`
+ - `artifact_mode: merged_full_model`
+ - `scaledown_window: 240`
+
+ Notes:
+ - This endpoint is currently loading the merged full-model artifact (not PEFT adapter-serving mode).
+ - The service is up and returning 200s, but rewrite quality is currently corrupted.
+
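The health fields above can be gated mechanically before any rewrite-quality eval. A minimal sketch, assuming the endpoint returns exactly the JSON fields listed above; the payload is hard-coded here from the documented response rather than fetched live, and `health_gate` is a hypothetical helper, not part of the deployed service:

```python
import json

# Hard-coded from the documented /health response above; a live check
# would instead GET
# https://dmitriy-kisil--qwen3-5-27b-unslop-api.modal.run/health
HEALTH_PAYLOAD = json.loads("""
{
  "ok": true,
  "model_id": "Oysiyl/qwen3.5-27b-unslop-good-lora-v1",
  "base_model": "Qwen/Qwen3.5-27B",
  "model_family": "qwen3_5",
  "adapter_mode": false,
  "artifact_mode": "merged_full_model",
  "scaledown_window": 240
}
""")


def health_gate(payload: dict) -> bool:
    """Pass only when the service reports healthy AND is serving the
    merged full-model artifact (so adapter_mode must be false)."""
    return (
        payload.get("ok") is True
        and payload.get("artifact_mode") == "merged_full_model"
        and payload.get("adapter_mode") is False
    )


print(health_gate(HEALTH_PAYLOAD))  # prints: True
```

As the notes say, a green gate here only proves the infra and artifact wiring are correct; it says nothing about whether the live rewrites are coherent.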
  ## Full observed output on that sample

+ Short sanity sample (input):
+
+ > This feature saves teams hours every week, but the copy still sounds robotic and bland.
+
+ Short sanity sample (observed output):
+
+ > This can... ...SPOINTF0610! $55) ( ),<#%}·' -/ #1. }:*
+
+ Held-out-style rewrite sample (input):
+
+ > We built this feature quickly, but now we need a cleaner version that sounds natural without losing any technical detail.
+
+ Held-out-style rewrite sample (observed output):
+
+ > The *>\n: 7%\n, (⊻�,\n. " " \n*\n: "、\n„ " ;*\n\n\n\n;\n\n; :: 5:17-
+
+ Observed behavior note:
+ - output is not a coherent rewrite right now
+ - character-level corruption/gibberish is still present
+ - this is a runtime-quality issue, not just a model-card wording issue
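A deployment-backed coherence gate would flag both corrupted outputs above automatically. A minimal sketch of such a check; `looks_coherent` and its thresholds are illustrative assumptions, not the harness actually used for these runs:

```python
import string

# Characters we expect in clean English rewrite output (plus curly
# quotes, which legitimate prose samples in this card do use).
ALLOWED = set(
    string.ascii_letters + string.digits + string.punctuation
    + " \n\t\u2018\u2019\u201c\u201d"
)


def looks_coherent(text: str, min_clean: float = 0.9, min_alpha: float = 0.5) -> bool:
    """Heuristic gate: mostly expected characters, and mostly letters.

    Symbol soup like the corrupted samples above keeps a high
    printable-character ratio, so the alphabetic fraction is the
    check that actually catches it.
    """
    if not text:
        return False
    clean = sum(ch in ALLOWED for ch in text) / len(text)
    alpha = sum(ch.isascii() and ch.isalpha() for ch in text) / len(text)
    return clean >= min_clean and alpha >= min_alpha


# The short sanity input passes; the observed corrupted output fails.
print(looks_coherent("This feature saves teams hours every week."))  # prints: True
print(looks_coherent("This can... ...SPOINTF0610! $55) ( ),<#%}\u00b7' -/ #1. }:*"))  # prints: False
```

Wiring a gate like this into the endpoint smoke test would turn "rewrite quality is currently corrupted" from a manual observation into a hard CI failure.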
 
  ## Judgment

+ Blunt judgment: this 27B run is training-successful and deployment-live, but current live rewrite quality is not production-usable.

  Why:
+ - endpoint health is green and requests return 200
+ - model artifact is correctly identified as merged full model (`adapter_mode: false` is expected)
+ - observed outputs are still corrupted / gibberish under live inference
+ - therefore quality remains blocked even though infra is operational

  ## Comparison vs pilot series

+ - **0.8B / 2B / 4B / 9B lanes:** have produced coherent rewrite outputs in the documented runs.
+ - **30B-A3B reference lane:** remains the safer quality-first benchmark for fidelity-sensitive long-form use.
+ - **Qwen3.5 27B (this run):** strongest training/infrastructure completion so far at this size, but currently failing the deployment-backed coherence gate.

  ## Conclusion

+ This repo is a complete 27B post-run artifact with real training metrics, a normalized loss plot, and live deployment checks. Current status is explicit: infra is up and the artifact type is correct (merged model, not adapter), but the latest deployment-backed rewrite outputs are still corrupted. The next milestone is not more training documentation; it is restoring coherent live generation and then re-running this evaluation block.
training_loss_vs_progress.svg CHANGED