🚀 Gemma-4-E4B-it-PARL (Autonomous Research Agent)

This model is a fine-tuned version of google/gemma-4-E4B-it, trained specifically for autonomous multi-hop reasoning and deep web research. It was developed for a hackathon hosted by lablab.ai and sponsored by AMD.

🧠 Model Description

We used Group Relative Policy Optimization (GRPO) and a Parallel-Agent Reinforcement Learning (PARL) architecture to turn the base Gemma-4 model into an autonomous agent capable of solving complex, multi-step tasks.
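
To make the GRPO step concrete: GRPO scores each rollout relative to the other rollouts sampled for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that normalization, not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, and the reward values are illustrative assumptions, and implementations differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar reward per rollout, all sampled
    from the same prompt (the "group").
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for a group of 16 rollouts of a single prompt.
rewards = [0.0, 1.0, 0.5, 0.0, 1.0, 1.0, 0.0, 0.5,
           0.0, 0.0, 1.0, 0.5, 1.0, 0.0, 0.5, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat the group mean receive a positive advantage and the rest a negative one, so a larger group gives a less noisy baseline for each prompt.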

Developed by: Pimnara Adulchantarasorn, Phanida Toaluea, Nattanant Vonghan, Rapeepong
Base Model: google/gemma-4-E4B-it (Multimodal)
Training Infrastructure: AMD MI300X (192GB VRAM) via AMD Developer Cloud
License: Gemma License

⚡ Key Technical Highlights

Long-Context Fine-Tuning (60k+ Tokens): The model is trained to process and retain large volumes of text retrieved from live web scraping, with contexts of more than 60k tokens.
PARL (Parallel-Agent Reinforcement Learning): The model is trained to orchestrate hierarchical agent workflows: it delegates tasks, executes Python code, and synthesizes findings into comprehensive HTML reports.
Multimodal Preservation: The native vision encoder was intentionally frozen throughout the GRPO training pipeline, so the agent keeps its full vision-language capabilities while its text-reasoning skills are optimized.
High-Throughput RL: The 192GB of VRAM on the AMD MI300X allowed parallel generation rollouts to scale to K=16, which significantly accelerated reward convergence.
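
As an illustration of the frozen vision encoder mentioned above, the sketch below disables gradients for every parameter whose name contains a given keyword. It is not the authors' training code. The helper `freeze_matching`, the `"vision"` keyword, and the parameter names in `TinyModel` are assumptions; `TinyModel` is only a stand-in so the example runs without PyTorch, and with a real model you would check the actual module names via `model.named_parameters()` first.

```python
from types import SimpleNamespace


def freeze_matching(model, keywords=("vision",)):
    """Set requires_grad=False on parameters whose name contains a keyword.

    Works with any object exposing named_parameters() in the style of
    torch.nn.Module. Returns the names of the frozen parameters.
    """
    frozen = []
    for name, param in model.named_parameters():
        if any(keyword in name for keyword in keywords):
            param.requires_grad = False
            frozen.append(name)
    return frozen


class TinyModel:
    """Stand-in for a multimodal model (hypothetical parameter names)."""

    def __init__(self):
        self._params = {
            "vision_tower.layer0.weight": SimpleNamespace(requires_grad=True),
            "language_model.layer0.weight": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return self._params.items()


model = TinyModel()
frozen = freeze_matching(model)
```

Only the language-model parameters remain trainable afterwards, which is what lets an RL stage update text reasoning without touching the vision weights.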

💻 How to Use

Because this model preserves Gemma-4's multimodal architecture, you must use AutoProcessor alongside AutoModelForCausalLM.

import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "Phonsiri/Gemma-4-E4B-it-PARL"

# Load the processor and the model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example Chat Template with Native Thinking Enabled
messages = [
    {"role": "system", "content": "<|think|> You are a highly capable autonomous research agent."},
    {"role": "user", "content": "Write a detailed report on the evolution of AMD's ROCm ecosystem."}
]

text = processor.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True, 
    enable_thinking=True
)

inputs = processor(text=text, return_tensors="pt").to(model.device)

# Generate response (temperature only takes effect when sampling is enabled)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)

input_len = inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

print(response)

Vision

import torch
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM, TextStreamer  # TextStreamer prints tokens as they are generated

model_id = "Phonsiri/Gemma-4-E4B-it-PARL"

# 1. Load the processor and the model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# 2. Load the image
image_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# 3. Prepare the chat template
messages = [
    {
        "role": "system", 
        "content": [{"type": "text", "text": "<|think|> You are a highly capable autonomous research agent."}]
    },
    {
        "role": "user", 
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Analyze this image in detail"}
        ]
    }
]

text = processor.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True,
    enable_thinking=True 
)

inputs = processor(
    text=text,
    images=image,
    return_tensors="pt"
).to(model.device)

# 4. Configure the streamer
# skip_prompt=True hides the prompt (input) so it is not echoed back
# skip_special_tokens=False keeps special tags (e.g. <|think|>) visible while streaming
streamer = TextStreamer(processor, skip_prompt=True, skip_special_tokens=False)

print("\n--- Generating response (streaming) ---\n")

# 5. Generate the response, passing the streamer to generate()
with torch.no_grad():
    outputs = model.generate(
        **inputs, 
        max_new_tokens=4096, 
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.7,
        streamer=streamer  # print tokens as they are generated
    )

# Note: TextStreamer prints the generated text to the screen automatically,
# so there is no need to call processor.decode() and print it again at the end.

Model size: 8B params (Safetensors)
Tensor type: BF16