minimind-63M-full-sft-Junhan

This repository contains a 63.9M-parameter dense MiniMind chat model, converted to a Transformers-compatible checkpoint so it can be loaded directly with the Hugging Face transformers library.

Model Summary

  • Architecture: dense decoder-only causal LM
  • Exported architecture name: Qwen3ForCausalLM
  • Original training codebase: MiniMind
  • Parameters: 63.9M
  • Hidden size: 768
  • Layers: 8
  • Attention heads: 8
  • KV heads: 4
  • Vocab size: 6400
  • Max position embeddings: 32768
  • RoPE theta: 1e6
  • MoE: no
  • Checkpoint type: full-parameter SFT
  • Weights: safetensors, F16
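
The figures above pin down the attention layout: grouped-query attention with 4 KV heads behind 8 query heads, which implies a head dimension of 768 / 8 = 96. As a quick back-of-the-envelope illustration of what that configuration costs at inference time:

```python
# KV-cache size implied by the config above (F16, no quantization).
hidden_size = 768
num_layers = 8
num_attention_heads = 8
num_kv_heads = 4                                # GQA: 4 KV heads serve 8 query heads
head_dim = hidden_size // num_attention_heads   # 96
bytes_per_value = 2                             # F16

# One K and one V vector per KV head, per layer, per cached token.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
print(kv_bytes_per_token)                       # 12288 bytes = 12 KiB per token

# Even at the full 32768-token context the cache stays modest:
print(kv_bytes_per_token * 32768 // 2**20)      # 384 MiB
```

Halving the KV heads relative to the query heads halves this cache compared to standard multi-head attention with the same hidden size.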

This model was trained from a MiniMind pretraining checkpoint and then fully fine-tuned on the MiniMind SFT pipeline. The exported folder was produced from the local full_sft_768.pth checkpoint using scripts/convert_model.py.

Training Notes

  • Base training pipeline: MiniMind
  • SFT training script: trainer/train_full_sft.py
  • SFT data used locally: sft_t2t_mini.jsonl
  • Typical SFT sequence length in this setup: max_seq_len=768
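
The exact schema of sft_t2t_mini.jsonl is not shown here; as an illustrative assumption, each line likely holds one multi-turn conversation, and rendered sequences are clipped to max_seq_len=768 during training. A minimal sketch (the record shape and field names are assumptions, not taken from the actual file):

```python
import json

# Hypothetical shape of one SFT record; field names are assumptions.
line = json.dumps({
    "conversations": [
        {"role": "user", "content": "你好,介绍一下你自己。"},            # "Hello, introduce yourself."
        {"role": "assistant", "content": "你好,我是一个小型聊天模型。"},  # "Hi, I am a small chat model."
    ]
}, ensure_ascii=False)

record = json.loads(line)
roles = [turn["role"] for turn in record["conversations"]]
print(roles)  # ['user', 'assistant']

# During SFT the rendered conversation is tokenized, then clipped to max_seq_len;
# loss is typically computed only on the assistant tokens.
MAX_SEQ_LEN = 768
token_ids = list(range(1000))      # stand-in for real tokenizer output
clipped = token_ids[:MAX_SEQ_LEN]
print(len(clipped))                # 768
```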

The upstream MiniMind SFT data mixes general instruction-following samples with some tool-calling and reasoning-style samples. As a result, this checkpoint is mainly a lightweight chat model, not a specialized tool-use or reasoning model.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "YOUR_USERNAME/minimind-63M-full-sft-Junhan"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # loads the F16 weights as stored
    device_map="auto",
)

messages = [
    {"role": "user", "content": "你好,介绍一下你自己。"}  # "Hello, introduce yourself."
]

# return_dict=True yields both input_ids and attention_mask, so generate()
# does not have to infer the mask itself.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
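
The temperature and top_p arguments passed to generate() control how each next token is drawn from the model's output distribution. A self-contained sketch of temperature plus nucleus (top-p) sampling, using only the standard library (the toy logits below are illustrative, not model output):

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, seed=0):
    """Temperature + nucleus (top-p) sampling over raw logits, stdlib only."""
    rng = random.Random(seed)
    # 1) Temperature-scale the logits, then softmax (max-subtracted for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2) Keep the smallest set of highest-probability tokens whose cumulative
    #    mass reaches top_p (the "nucleus"), then sample within it.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

# With a sharply peaked distribution the nucleus collapses to the argmax.
print(sample_top_p([10.0, -10.0, -10.0, -10.0]))  # prints 0
```

Lower temperature sharpens the distribution (more deterministic output); lower top_p shrinks the candidate set. Real generate() applies the same idea per step over the full 6400-token vocabulary.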

Intended Use

  • Lightweight chat experiments
  • Small-model SFT baselines
  • Educational and debugging purposes
  • Simple local inference and deployment tests

Limitations

  • This is a very small model, so factuality, planning, and reasoning ability are limited.
  • Tool-calling formats may surface in some responses, but they are not robust or reliable.
  • The model is not suitable for high-stakes medical, legal, financial, or safety-critical use.
  • The training mixture includes distilled or synthetic components, so behavior may inherit teacher-model style artifacts.

Source

License

This model card conservatively uses the cc-by-nc-4.0 license because the upstream MiniMind dataset documentation mentions mixed source licenses, including non-commercial terms, in parts of the training pipeline. Review your exact data provenance before using or relicensing this model for commercial scenarios.
