🚀 Gemma-4-E4B-it-PARL (Autonomous Research Agent)
This model is a fine-tuned version of google/gemma-4-E4B-it, trained specifically for autonomous multi-hop reasoning and deep web research. It was developed during a hackathon hosted by lablab.ai and sponsored by AMD.
🧠 Model Description
We used Group Relative Policy Optimization (GRPO) and a Parallel-Agent Reinforcement Learning (PARL) architecture to turn the base Gemma-4 model into an autonomous agent that can solve complex, multi-step tasks.
- Developed by: Pimnara Adulchantarasorn, Phanida Toaluea, Nattanant Vonghan, Rapeepong
- Base Model: google/gemma-4-E4B-it (multimodal)
- Training Infrastructure: AMD MI300X (192 GB VRAM) via AMD Developer Cloud
- License: Gemma License
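The card does not spell out the reward function or the exact loss. The core idea of GRPO can still be sketched briefly. For each prompt, the policy samples a group of K rollouts. Each rollout's advantage is its reward normalized by the group's mean and standard deviation, so no separate value model is needed. The sketch below uses the group size K = 16 reported in this card. The binary rewards are made up for illustration, and using the population standard deviation is a choice, since some implementations use the sample standard deviation instead.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward against the other rollouts for the same prompt.

    Rollouts that score above the group mean get a positive advantage and those
    below get a negative one. `eps` avoids division by zero when all rewards tie.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std over the K rollouts (a choice; some use sample std)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for K = 16 rollouts of one prompt (1.0 = task solved, 0.0 = not).
K = 16
rewards = [1.0 if i % 4 == 0 else 0.0 for i in range(K)]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages[:4]])
```

In standard GRPO, these advantages then weight a PPO-style clipped policy-gradient objective, usually with a KL penalty toward a reference model.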
⚡ Key Technical Highlights
- Long-Context Fine-Tuning (60k+ tokens): The model is trained on contexts of more than 60k tokens, so it can process and retain the large amounts of information retrieved through live web scraping.
- PARL (Parallel-Agent Reinforcement Learning): Trained to orchestrate hierarchical agent workflows, allowing it to delegate tasks, execute Python code, and synthesize findings into comprehensive HTML reports.
- Multimodal Preservation: The native vision encoder was intentionally frozen throughout the GRPO training pipeline. This is meant to keep the agent's original vision-language capabilities intact while its text reasoning is aggressively optimized.
- High-Throughput RL: The 192 GB of VRAM on the AMD MI300X let us scale parallel generation rollouts to K = 16 per prompt, which significantly accelerated reward convergence.
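The card does not say how the vision encoder was frozen. A common way to do this in PyTorch is to set `requires_grad = False` on the encoder's parameters before building the optimizer. The sketch below assumes the encoder's parameter names contain the substring `"vision"`. That naming convention is a hypothesis, so check `model.named_parameters()` for the real module names. A stand-in parameter class lets the sketch run without PyTorch.

```python
from dataclasses import dataclass
from typing import Iterable


def freeze_by_keyword(named_params: Iterable, keywords: tuple = ("vision",)) -> list:
    """Set requires_grad = False on every parameter whose name contains a keyword.

    `named_params` is any iterable of (name, param) pairs, such as
    torch.nn.Module.named_parameters(). Returns the names that were frozen.
    """
    frozen = []
    for name, param in named_params:
        if any(keyword in name for keyword in keywords):
            param.requires_grad = False  # no gradient is computed for this tensor
            frozen.append(name)
    return frozen


@dataclass
class StandInParam:
    """Minimal stand-in for torch.nn.Parameter so the sketch runs without PyTorch."""
    requires_grad: bool = True


# Hypothetical parameter names; inspect model.named_parameters() for the real ones.
params = {
    "model.vision_tower.encoder.layers.0.weight": StandInParam(),
    "model.language_model.layers.0.mlp.weight": StandInParam(),
}
frozen = freeze_by_keyword(params.items())
trainable = [name for name, p in params.items() if p.requires_grad]
print("frozen:", frozen)
print("trainable:", trainable)

# With a real model (assumption: vision parameters contain "vision" in their names):
#   freeze_by_keyword(model.named_parameters())
#   optimizer = torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)
```

Matching on name substrings is coarse. Inspecting the returned list confirms exactly which tensors were frozen.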
💻 How to Use
Because this model preserves Gemma-4's multimodal architecture, you must use AutoProcessor alongside AutoModelForCausalLM.
```python
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "Phonsiri/Gemma-4-E4B-it-PARL"

# Load the processor and the model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example chat template with native thinking enabled
messages = [
    {"role": "system", "content": "<|think|> You are a highly capable autonomous research agent."},
    {"role": "user", "content": "Write a detailed report on the evolution of AMD's ROCm ecosystem."},
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = processor(text=text, return_tensors="pt").to(model.device)

# Generate a response (do_sample=True is needed for temperature to take effect)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens; keep special tokens so the thinking tags stay visible
input_len = inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
print(response)
```
Vision
```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM, TextStreamer

model_id = "Phonsiri/Gemma-4-E4B-it-PARL"

# 1. Load the processor and the model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# 2. Load the image
image_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg"
image = Image.open(requests.get(image_url, stream=True, timeout=30).raw).convert("RGB")

# 3. Prepare the chat template; {"type": "image"} is a placeholder for the image passed to the processor
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "<|think|> You are a highly capable autonomous research agent."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Analyze this image in detail"},
        ],
    },
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
# Move inputs to the model's device; dtype only casts floating-point tensors (pixel values) to bf16
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

# 4. Set up the streamer
# skip_prompt=True hides the prompt (input) so it is not printed again
# skip_special_tokens=False keeps special tags (e.g. <|think|>) visible while streaming
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=False)

print("\n--- Generating response (streaming) ---\n")

# 5. Generate the response, passing the streamer to generate()
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )
# Note: TextStreamer prints the text to the screen automatically,
# so there is no need to call processor.decode() and print the output again at the end.
```
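The vision example wraps each message's content in a list of typed parts. The hypothetical helper below is not part of any library and builds that structure for prompts with or without images. It rests on three assumptions. Images are placed before the text. With several images, they are passed to `processor(images=...)` in the same order as their placeholders. The chat template also accepts this list-of-parts format for text-only prompts.

```python
DEFAULT_SYSTEM_PROMPT = "<|think|> You are a highly capable autonomous research agent."


def build_messages(user_text: str, num_images: int = 0,
                   system_text: str = DEFAULT_SYSTEM_PROMPT) -> list:
    """Build chat messages in the list-of-parts format used by the vision example.

    Each {"type": "image"} entry is only a placeholder; the PIL images themselves
    are passed separately to processor(text=..., images=...).
    """
    user_parts = [{"type": "image"} for _ in range(num_images)]
    user_parts.append({"type": "text", "text": user_text})
    return [
        {"role": "system", "content": [{"type": "text", "text": system_text}]},
        {"role": "user", "content": user_parts},
    ]


# Reproduces the messages list from the vision example.
messages = build_messages("Analyze this image in detail", num_images=1)
print(messages[1]["content"])
```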