# Gemma 4 E2B IT q4f16_1 MLC
A clean, validated MLC/WebLLM packaging of `google/gemma-4-E2B-it`, quantized to `q4f16_1`, for browser-local WebGPU and MLC-LLM runtimes.
This repository is built from a local mlc-llm / TVM fork and reflects the cleaned baseline validated on 2026-04-13.
## Release note (2026-04-14)

- Public HF artifact now matches the validated clean baseline.
- Bug 9b fix is included; the `prefill_chunk_size=16` workaround is no longer needed.
- Debug instrumentation and experiment branches have been removed from the active path.
- The public HF smoke test passes from the HF URL, including the canonical France/Paris case.
## Status

- Text path: validated
- Quantization: `q4f16_1`
- Runtime target: `webgpu`
- Model type: `gemma4`
- Conversation template: `gemma_instruction`
- Prefill chunk size: 1024
- Debug instrumentation: removed
- Known chunk16 workaround: not required
## Files

- `libs/gemma-4-E2B-it-q4f16_1-MLC-webgpu.wasm`: validated WebGPU model library
- `mlc-chat-config.json`: MLC runtime configuration
- `params_shard_*.bin`: quantized parameter shards
- `tensor-cache.json`: tensor metadata cache
- `tokenizer.json`, `tokenizer_config.json`: tokenizer assets
- `release-manifest.json`: file inventory with SHA-256 hashes
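Since `release-manifest.json` ships SHA-256 hashes, downloads can be checked for integrity before use. A minimal Node.js sketch is below; it assumes the manifest is a flat JSON object mapping file paths to hex digests, which is an assumption about the schema, so adjust the parsing to match the actual manifest layout.

```typescript
// Sketch: verify downloaded files against release-manifest.json.
// ASSUMPTION: manifest shape is { "<relative path>": "<sha256 hex>" }.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Compute the lowercase hex SHA-256 digest of a buffer or string.
export function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Return the list of files whose digest does not match the manifest;
// an empty array means every listed file verified cleanly.
export function verifyManifest(manifestPath: string): string[] {
  const manifest: Record<string, string> = JSON.parse(
    readFileSync(manifestPath, "utf8"),
  );
  const mismatches: string[] = [];
  for (const [file, expected] of Object.entries(manifest)) {
    if (sha256Hex(readFileSync(file)) !== expected.toLowerCase()) {
      mismatches.push(file);
    }
  }
  return mismatches;
}
```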
## Usage

### Chat

```shell
mlc_llm chat HF://welcoma/gemma-4-E2B-it-q4f16_1-MLC
```
### WebLLM Integration

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const repo = "https://huggingface.co/welcoma/gemma-4-E2B-it-q4f16_1-MLC";
const appConfig = {
  model_list: [
    {
      model: repo,
      model_id: "gemma-4-E2B-it-q4f16_1-MLC",
      model_lib: `${repo}/resolve/main/libs/gemma-4-E2B-it-q4f16_1-MLC-webgpu.wasm`,
      required_features: ["shader-f16"],
    },
  ],
};

const engine = await CreateMLCEngine("gemma-4-E2B-it-q4f16_1-MLC", {
  appConfig,
});
```
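Once the engine is created, requests go through WebLLM's OpenAI-style chat API. The sketch below mirrors the France/Paris smoke-test case from the release note; it must run in a browser context with WebGPU available, so there is no expected-output comment here.

```typescript
// Sketch: issue one chat completion against the engine created above.
// Requires a WebGPU-capable browser; `engine` is the MLCEngine from
// the integration snippet.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is the capital of France?" }],
});
console.log(reply.choices[0].message.content);
```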
## Notes

This is a custom MLC/WebLLM artifact, not an official mlc-ai release. The validated scope is Gemma 4 E2B text generation on WebGPU.
## Model tree for welcoma/gemma-4-E2B-it-q4f16_1-MLC

Base model: `google/gemma-4-E2B-it`