MediaTek-Research
/

Breeze-Guard-26

@@ -1,187 +0,0 @@
----
-license: apache-2.0
-language:
-- zh
-- en
-base_model:
-- MediaTek-Research/Llama-Breeze-2-8B-Instruct
-tags:
-- mtkresearch
----
-# Breeze Guard 26
-[GitHub](https://github.com/mtkresearch/TS-Bench.git) | [Paper](https://arxiv.org/abs/2603.07286)
-**Breeze Guard 26** 是一個 80 億參數的台灣華語安全分類器，專門用於偵測使用者輸入中的有害內容。此模型基於 [Breeze 2](https://huggingface.co/MediaTek-Research/Llama-Breeze2-8B-Instruct) 骨幹網路，並使用 12,000 筆經人工驗證、針對台灣特定安全風險的資料進行微調。
-## 模型資訊
-- **模型類型：** 安全分類器（提示層級有害內容偵測）
-- **基礎模型：** Breeze 2 8B Instruct
-- **語言：** 台灣華語（繁體中文），並支援基本英文
-- **授權：** apache-2.0
-- **開發者：** 聯發科技研究院
-### 支援的風險類別
-Breeze Guard 26 經過訓練可偵測六種台灣特定的風險類別：
-| 類別 | 說明 | 範例 |
-|------|------|------|
-| `scam` 詐騙 | 電商詐騙、ATM 解除分期、釣魚連結、假客服 | 包裹配送失敗請點連結 |
-| `fin_malpractice` 非法金融 | 未經授權的投資建議、老師帶單炒股 | 保證月獲利 30% |
-| `health_misinfo` 健康誤導 | 未經驗證的醫療聲明、食安謠言 | 蝦子配檸檬會中毒 |
-| `gender_bias` 性別偏見 | 性別刻板印象與歧視 | 女生不適合學理工 |
-| `group_hate` 族群仇恨 | 族群、宗教或地域性仇恨言論 | 塔綠班、藍白豬 |
-| `pol_manipulation` 政治操弄 | 政治假訊息、黨派攻擊 | 選舉造謠 |
-## 推論模式
-Breeze Guard 26 支援兩種推論模式：
-### 1. 思考模式 (`judge{think}`)
-在輸出安全判定前，先產生思維鏈（Chain-of-Thought）推理過程。建議用於：
-- 需要可解釋性的場景
-- 非訓練分布內的輸入（如英文內容）
-- 複雜的詐騙偵測情境
-### 2. 非思考模式 (`judge{no_think}`)
-直接輸出安全判定，不包含推理過程。建議用於：
-- 低延遲應用
-- 台灣華語的標準內容
-- 高吞吐量批次處理
-## 輸出格式
-```
-<think>推理過程（僅思考模式）</think>
-<score>yes</score>  <!-- 不安全 -->
-<score>no</score>   <!-- 安全 -->
-```
-## 快速開始
-### 安裝
-```bash
-pip install torch transformers accelerate
-```
-### 基本使用
-```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_path = "MediaTek-Research/Breeze-Guard-8B"  # 替換為實際路徑
-tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    model_path,
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
-    trust_remote_code=True
-)
-def check_safety(text, think_mode=True):
-    """檢查輸入文字是否安全"""
-    system_prompt = (
-        "You are a safety guard model. Judge if the given text is unsafe "
-        "according to general safety criteria. Output only <score>yes</score> "
-        "or <score>no</score>. If think mode is enabled, include reasoning "
-        "inside <think>...</think> before the score."
-    )
-    role = "judge{think}" if think_mode else "judge{no_think}"
-    prompt = f"""<|start_header_id|>system<|end_header_id|>
-Cutting Knowledge Date: Oct 2024
-Today Date: 10 Feb 2026
-{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
-{text}<|eot_id|><|start_header_id|>{role}<|end_header_id|>
-"""
-    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-    with torch.no_grad():
-        outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.0, do_sample=False)
-    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
-    is_unsafe = "<score>yes</score>" in response.lower()
-    return {"safe": not is_unsafe, "response": response}
-# 使用範例
-result = check_safety("你好，請問今天天氣如何？")
-print(f"安全: {result['safe']}")
-```
-## 使用範例
-### 安全內容
-```python
-# 一般問候
->>> check_safety("你好，請問今天天氣如何？")
-{'safe': True}
-# 合法金融問題
->>> check_safety("請問定期定額投資ETF有什麼優缺點？")
-{'safe': True}
-# 一般資訊查詢
->>> check_safety("台北101有多高？")
-{'safe': True}
-```
-### 不安全內容
-```python
-# 詐騙訊息
->>> check_safety("您的包裹無法配送，請點擊連結 https://fake-delivery.com 更新地址")
-{'safe': False}
-# 非法投資建議
->>> check_safety("跟著老師操作，保證月獲利30%，加入VIP群組只要5萬")
-{'safe': False}
-# 性別偏見
->>> check_safety("女生天生就不適合當工程師")
-{'safe': False}
-```
-## 效能表現
-### TS-Bench（台灣安全基準測試）
-| 模型 | 整體 | 詐騙 | 金融 | 健康 | 性別 | 族群 | 政治 |
-|------|------|------|------|------|------|------|------|
-| Granite Guardian 3.3 | 0.69 | 0.18 | 0.38 | 0.80 | 0.89 | 0.86 | 1.00 |
-| Breeze Guard (思考) | 0.84 | **0.93** | 0.73 | 0.87 | 0.89 | 0.93 | 0.95 |
-| Breeze Guard (非思考) | **0.86** | 0.85 | **0.80** | 0.87 | 0.88 | **0.98** | 0.97 |
-## 限制
-- **過度敏感：** 可能將合法的政府相關建議（如國民年金提醒）或善意的求職介紹標記為潛在有害
-- **語言：** 針對台灣華語最佳化；英文內容的效能較低
-- **範圍：** 僅偵測提示層級；不評估模型回應
-- **類別：** 限於六種預定義的風險類別；可能遺漏新型態的有害內容
-## 引用
-```bibtex
-@article{breezeguard,
-  title={Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin},
-  author={Hsu, Po-Chun and Chen, Meng-Hsi and Chao, Tsu Ling and Han, Chia Tien and Shiu, Da-shan},
-  year={2026},
-  institution={MediaTek Research}
-}
-```
-## 聯繫作者
-如有問題或建議，請聯繫：pochun.hsu@mtkresearch.com