Gemma 4 E2B IT q4f16_1 MLC

A clean, validated MLC/WebLLM packaging of google/gemma-4-E2B-it, quantized as q4f16_1, for browser-local WebGPU and MLC-LLM runtimes.

This repository is built from a local mlc-llm / TVM fork and reflects the cleaned baseline validated on 2026-04-13.

Release note: 2026-04-14

  • The public HF artifact now matches the validated clean baseline.
  • The Bug 9b fix is included; the prefill_chunk_size=16 workaround is no longer needed.
  • Debug instrumentation and experiment branches have been removed from the active path.
  • The public HF smoke test passes when loading from the HF URL, including the canonical France/Paris case.

Status

  • Text path: validated
  • Quantization: q4f16_1
  • Runtime target: webgpu
  • Model type: gemma4
  • Conversation template: gemma_instruction
  • Prefill chunk size: 1024
  • Debug instrumentation: removed
  • Known chunk16 workaround: not required

Files

  • libs/gemma-4-E2B-it-q4f16_1-MLC-webgpu.wasm: validated WebGPU model library
  • mlc-chat-config.json: MLC runtime configuration
  • params_shard_*.bin: quantized parameter shards
  • tensor-cache.json: tensor metadata cache
  • tokenizer.json, tokenizer_config.json: tokenizer assets
  • release-manifest.json: file inventory with SHA-256 hashes

Usage

Chat

mlc_llm chat HF://welcoma/gemma-4-E2B-it-q4f16_1-MLC

WebLLM Integration

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const repo = "https://huggingface.co/welcoma/gemma-4-E2B-it-q4f16_1-MLC";

const appConfig = {
  model_list: [
    {
      model: repo,
      model_id: "gemma-4-E2B-it-q4f16_1-MLC",
      model_lib: `${repo}/resolve/main/libs/gemma-4-E2B-it-q4f16_1-MLC-webgpu.wasm`,
      required_features: ["shader-f16"],
    },
  ],
};

const engine = await CreateMLCEngine("gemma-4-E2B-it-q4f16_1-MLC", {
  appConfig,
});
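Once the engine resolves, WebLLM exposes an OpenAI-compatible chat API. A minimal request sketch (the prompt text is illustrative; the actual call runs only in a WebGPU-capable browser):

```javascript
// Assumes `engine` from the snippet above. The prompt is illustrative.
const request = {
  messages: [{ role: "user", content: "What is the capital of France?" }],
};
// In the browser, pass the request to the OpenAI-compatible endpoint:
// const reply = await engine.chat.completions.create(request);
// console.log(reply.choices[0].message.content);
```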

Notes

This is a custom MLC/WebLLM artifact, not an official mlc-ai release. The validated scope is Gemma 4 E2B text generation on WebGPU.
