Qwen3-1.7B Backward LoRA Adapter

This repository contains the LoRA adapter trained for Assignment 3, Part 1: a backward model p(x|y) that predicts the original user instruction from an assistant response.

Model Details

Base model: Qwen/Qwen3-1.7B
Adapter type: LoRA (peft)
Task: backward instruction generation (response -> instruction)
Training objective: given an assistant response, generate the user instruction that most likely produced it
Finetuned from: Qwen/Qwen3-1.7B

Intended Use

This adapter is intended for coursework reproduction of the paper Self-Alignment with Instruction Backtranslation. It is designed to be used as the backward model in the assignment pipeline:

1. Train p(x|y) on seed instruction-response data.
2. Use the backward model to infer instructions from LIMA responses.
3. Curate synthetic pairs.
4. Train the final forward instruction-following model.

This model is not intended as a general-purpose assistant.

Training Data

Training data came from timdettmers/openassistant-guanaco.

Because this dataset stores each conversation in a single text field, preprocessing extracted every complete turn pair of the form

### Human: ...
### Assistant: ...

from each conversation and converted it into a backward sample:

input / condition: assistant response y
target: user instruction x
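
The extraction step described above could look roughly like the following. This is a minimal sketch, not the preprocessing script that was actually used: the function name extract_backward_pairs and the regular expression are hypothetical, and it assumes that turns are delimited only by the literal markers "### Human:" and "### Assistant:".

```python
import re

# Assumption: turns are delimited only by these two literal markers.
TURN_RE = re.compile(r"### (Human|Assistant):")


def extract_backward_pairs(text):
    """Return backward samples (response -> instruction) from one conversation."""
    # With one capturing group, re.split yields:
    # [prefix, role_1, content_1, role_2, content_2, ...]
    parts = TURN_RE.split(text)
    turns = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]

    pairs = []
    for (role_a, text_a), (role_b, text_b) in zip(turns, turns[1:]):
        # Keep only complete Human -> Assistant pairs with non-empty text.
        if role_a == "Human" and role_b == "Assistant" and text_a and text_b:
            pairs.append({"input": text_b, "target": text_a})
    return pairs


example = "### Human: Hi### Assistant: Hello!### Human: Bye"
pairs = extract_backward_pairs(example)
```

In this example the trailing "### Human: Bye" turn has no assistant reply, so it is dropped and only one backward sample is produced.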

Final dataset sizes:

Usable backward examples: 13,436
Train split: 13,180
Eval split: 256

Training Procedure

Prompt Format

Each training sample was formatted as a causal LM prompt, with the loss computed only on the instruction tokens:

System: You are a reverse instruction generation model.

Given the following assistant response, generate the original user instruction that most likely produced it.

Assistant response:
<response>

Recovered user instruction:
<instruction>

The loss is applied only to the <instruction> portion.
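
One common way to restrict the loss to the instruction is to set the labels of all prompt tokens to -100, which PyTorch's cross-entropy loss ignores by default. The sketch below illustrates that idea on plain lists of token ids. The helper name build_masked_example, the appended EOS token, and right-side truncation to the maximum sequence length are assumptions, not details confirmed by the training code.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_masked_example(prompt_ids, instruction_ids, eos_id, max_len=1024):
    """Concatenate prompt and instruction ids, masking the prompt in the labels."""
    # Assumption: an EOS token is appended so the model learns where to stop.
    target_ids = list(instruction_ids) + [eos_id]
    input_ids = list(prompt_ids) + target_ids
    # Prompt positions get IGNORE_INDEX, so only the instruction contributes to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    # Assumption: over-long samples are truncated on the right.
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}


example = build_masked_example(prompt_ids=[1, 2, 3], instruction_ids=[4, 5], eos_id=9)
```

Hugging Face causal LM heads shift the labels internally, so input_ids and labels are kept aligned position by position here.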

Hyperparameters

Precision: bf16
Epochs: 2
Max sequence length: 1024
Learning rate: 1e-4
Per-device train batch size: 4
Per-device eval batch size: 4
Gradient accumulation steps: 4
Warmup ratio: 0.03
Weight decay: 0.0
Gradient checkpointing: true

LoRA Configuration

r: 16
alpha: 32
dropout: 0.05
target modules: q_proj, k_proj, v_proj, o_proj
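
For reference, the settings above map onto peft's LoraConfig roughly as follows. The bias and task_type values are assumptions, since the card does not state them, and make_lora_config is a hypothetical helper that imports peft lazily so the dictionary can be inspected without peft installed.

```python
# Values listed in this card, using peft's LoraConfig argument names.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "bias": "none",            # assumption: not stated in the card
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def make_lora_config():
    """Build a peft LoraConfig from the settings above (requires peft)."""
    from peft import LoraConfig

    return LoraConfig(**lora_kwargs)


# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```

With r = 16 and alpha = 32 the update is scaled by a factor of 2, and only the attention projections are adapted.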

Results

Training completed successfully.

Total train steps: 1648
Final train loss: 1.7289
Best observed eval loss region: about 1.6770 - 1.6896
Runtime: about 57.8 minutes
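
The reported step count is consistent with the dataset size and hyperparameters listed above. The arithmetic below is a sanity check under two assumptions: training ran on a single device, and a trailing partial gradient-accumulation window counts as one optimizer step. The warmup step count is derived from the warmup ratio and is not a figure reported by the training run.

```python
import math

usable_examples = 13_436
train_examples = 13_180
eval_examples = 256

per_device_batch = 4
grad_accum = 4
epochs = 2
warmup_ratio = 0.03
num_devices = 1  # assumption: single-GPU training

# The train and eval splits together account for every usable example.
assert train_examples + eval_examples == usable_examples

batches_per_epoch = math.ceil(train_examples / (per_device_batch * num_devices))
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(batches_per_epoch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the effective batch size is 16 sequences per optimizer step, and total_steps matches the 1648 steps reported above.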

These losses suggest that the adapter has learned the backward mapping well enough to generate candidate instructions for the next stage of the assignment pipeline.

Limitations

Output quality depends heavily on the assistant response quality and prompt format.
The model may produce verbose or partially repeated instructions on some inputs.
This adapter is optimized for coursework reproduction rather than production deployment.

How to Load

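import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen3-1.7B"
adapter_path = "path/to/this/adapter"

# The tokenizer is loaded from the adapter directory.
tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)

# Load the frozen base model in bf16. Older transformers releases expect
# `torch_dtype=` instead of `dtype=`.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA weights and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()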

Example Prompt

System: You are a reverse instruction generation model.

Given the following assistant response, generate the original user instruction that most likely produced it.

Assistant response:
<assistant response here>

Recovered user instruction:
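
A minimal sketch of how this prompt might be filled in and passed to the loaded adapter is shown below. The helper names build_prompt and recover_instruction are hypothetical, and the exact whitespace of the template, greedy decoding, and the 128-token generation limit are assumptions rather than settings confirmed by this card.

```python
# Assumption: blank lines and trailing newline mirror the prompt shown above.
PROMPT_TEMPLATE = (
    "System: You are a reverse instruction generation model.\n\n"
    "Given the following assistant response, generate the original user "
    "instruction that most likely produced it.\n\n"
    "Assistant response:\n{response}\n\n"
    "Recovered user instruction:\n"
)


def build_prompt(response):
    """Insert an assistant response into the backward prompt."""
    return PROMPT_TEMPLATE.format(response=response.strip())


def recover_instruction(model, tokenizer, response, max_new_tokens=128):
    """Generate a candidate instruction for one response (needs a loaded model)."""
    prompt = build_prompt(response)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Assumption: greedy decoding; sampling settings are not given in this card.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


prompt = build_prompt("Paris is the capital of France.")
```

With the model and tokenizer from the loading snippet, recover_instruction(model, tokenizer, some_response) would return the generated instruction as a string.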

License

Use of this adapter should follow the license and usage terms of the base model Qwen/Qwen3-1.7B and the underlying training data.
