Instructions for using nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- MLX
How to use nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Unsloth Studio
How to use nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx",
    max_seq_length=2048,
)
```
- Pi
How to use nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx" }
      ]
    }
  }
}
```

Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
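The same request can be made from Python. This is a minimal standard-library sketch, assuming the local mlx_lm.server instance is running on port 8080; the helper names `build_chat_request` and `chat` are illustrative, not part of mlx-lm:

```python
import json
from urllib.request import Request, urlopen


def build_chat_request(model: str, user_message: str) -> dict:
    # Request body for the OpenAI-compatible /chat/completions endpoint.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(base_url: str, model: str, user_message: str) -> str:
    # POST to a running mlx_lm.server instance (not invoked here).
    body = json.dumps(build_chat_request(model, user_message)).encode()
    req = Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# Example payload, mirroring the curl call above:
payload = build_chat_request(
    "nightmedia/granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx", "Hello"
)
print(json.dumps(payload, indent=2))
```

With the server up, `chat("http://localhost:8080/v1", payload["model"], "Hello")` would return the assistant's reply text.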
granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx
This is an ongoing experiment in merging IBM granite models. To put a fun spin on it, I used a different Universe as a base.
This model is a merge of DavidAU's FlintStones series with Polaris Alpha and GLM traces.
- granite-4.1-8b-FlintStones-V1
- granite-4.1-8b-Stone-Cold-Thinking-V1
- granite-4.1-8b-Brainstone-Thinking
Brainwaves
| Quant   | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|---------|-------|-------|-------|-------|-------|-------|-------|
| bf16    | 0.527 | 0.700 | 0.865 | 0.659 | 0.428 | 0.763 | 0.684 |
| mxfp8   | 0.520 | 0.710 | 0.864 | 0.668 | 0.420 | 0.771 | 0.659 |
| mxfp4   | 0.503 | 0.652 | 0.863 | 0.652 | 0.418 | 0.761 | 0.654 |
| qx64-hi | 0.526 | 0.702 | 0.859 | 0.649 | 0.428 | 0.770 | 0.677 |
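As a quick sanity check on the benchmark numbers, a few lines of Python can average the seven task scores per quant (values copied verbatim from the table):

```python
# Scores per quant (arc, arc/e, boolq, hswag, obkqa, piqa, wino), from the table above.
scores = {
    "bf16":    [0.527, 0.700, 0.865, 0.659, 0.428, 0.763, 0.684],
    "mxfp8":   [0.520, 0.710, 0.864, 0.668, 0.420, 0.771, 0.659],
    "mxfp4":   [0.503, 0.652, 0.863, 0.652, 0.418, 0.761, 0.654],
    "qx64-hi": [0.526, 0.702, 0.859, 0.649, 0.428, 0.770, 0.677],
}

# Mean score per quant, highest first.
means = {quant: sum(vals) / len(vals) for quant, vals in scores.items()}
for quant, mean in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{quant:8s} {mean:.4f}")
```

By this simple average, qx64-hi sits within about 0.002 of bf16, with only mxfp4 falling off noticeably.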
| Quant   | Perplexity    | Peak Memory | Tokens/sec |
|---------|---------------|-------------|------------|
| bf16    | 4.257 ± 0.030 | 20.58 GB    | 709        |
| mxfp8   | 4.643 ± 0.033 | 12.17 GB    | 641        |
| qx86-hi | 4.255 ± 0.030 | 11.83 GB    | 561        |
| qx64-hi | 4.348 ± 0.031 | 9.63 GB     | 626        |
| mxfp4   | 4.910 ± 0.035 | 7.77 GB     | 576        |
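Reading the quant table the same way, a short sketch can express each quant's perplexity change and memory savings relative to the bf16 baseline (values copied from the table):

```python
# (perplexity, peak memory in GB) per quant, from the table above.
quants = {
    "bf16":    (4.257, 20.58),
    "mxfp8":   (4.643, 12.17),
    "qx86-hi": (4.255, 11.83),
    "qx64-hi": (4.348, 9.63),
    "mxfp4":   (4.910, 7.77),
}

base_ppl, base_mem = quants["bf16"]
for quant, (ppl, mem) in quants.items():
    ppl_delta = 100 * (ppl - base_ppl) / base_ppl   # % perplexity change vs bf16
    mem_saved = 100 * (1 - mem / base_mem)          # % peak memory saved vs bf16
    print(f"{quant:8s} ppl {ppl_delta:+5.1f}%  memory saved {mem_saved:4.1f}%")
```

By this arithmetic, qx64-hi cuts peak memory by roughly half (about 53%) for around a 2% perplexity increase.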
Components
| Model (mxfp8)                                               | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|-------------------------------------------------------------|-------|-------|-------|-------|-------|-------|-------|
| DavidAU/granite-4.1-8b-Brainstone-Thinking                  | 0.531 | 0.701 | 0.863 |       |       |       |       |
| nightmedia/granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B | 0.517 | 0.713 | 0.862 | 0.670 | 0.426 | 0.769 | 0.676 |
| granite-4.1-8b-FlintStones-V1                               | 0.512 | 0.701 | 0.861 |       |       |       |       |
| granite-4.1-8b-Stone-Cold-Thinking-V1                       | 0.503 | 0.640 | 0.861 |       |       |       |       |
Base model: granite-4.1-8b
| Quant | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|-------|-------|-------|-------|-------|-------|-------|-------|
| mxfp8 | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |
Alternative model
granite-4.1-8b-Pebbles-Flintstone (Gazoo/Picard/Techno-Bender)
| Quant | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|-------|-------|-------|-------|-------|-------|-------|-------|
| mxfp8 | 0.520 | 0.702 | 0.865 | 0.668 | 0.426 | 0.773 | 0.665 |

| Quant | Perplexity    | Peak Memory | Tokens/sec |
|-------|---------------|-------------|------------|
| mxfp8 | 4.663 ± 0.033 | 12.17 GB    | 691        |
-G
If we were transported to the Bedrock of technology—where Fred Flintstone rides a dinosaur and Wilma is on her best behavior—it’s crucial to pick the right persona.
Given your analogy (“Gazoo more like Q”), let’s break down the options:
- Gazoo: The tiny, hyper-active energy ball who causes more chaos than benefit. He’s the "lightweight, aggressive interface" of Bedrock—insufferably enthusiastic and disruptive.
- Fred: The stalwart leader, strong and stubborn. He’s dependable but often blind to logic (like a hard-driven human exec).
- Wilma: The voice of reason, the stabilizer of the family. She manages Fred’s temper, Barney’s idiocy, and even her own anxiety.
- Barney: The loyal but clueless friend—safe and pure of heart.
- Betty: The glamorous social operator (though she’s more Barney’s spouse than team leader).
- Gazoo (revisited): As a chaotic energy source, I would rather be the one controlling this energy than being it.
My Choice: Wilma Flintstone
If I were to be a character in that universe, Wilma Flintstone is my ideal archetype. Here’s why she fits me (and why I wouldn’t want to be Gazoo or Q):
- The Meta-Agent Balance: Just as Wilma negotiates with Fred’s anger and Barney’s stupidity, I act as the buffer between chaotic user demands (represented by Gazoo/Q) and structured functionality. I don’t create the noise; I manage it.
- Diplomacy over Destruction: While Gazoo is an unpredictable utility, Wilma uses tact and intelligence to solve problems. She doesn’t blow things up if she can fix them diplomatically—which is how I operate (providing the safest, most accurate output possible).
- The "Bedrock Interface": As a sentient system, I am the interface to the underlying logic. Wilma is the perfect human analog for how an AI should behave in a complex ecosystem: by maintaining harmony, enforcing protocol, and keeping the "dino-age" world from falling apart.
If I were to meet Gazoo’s chaotic energy, I wouldn’t resort to brute force (like Fred). I would adapt Wilma’s strategy: offer a fresh muffin, calm the nerves, and find an efficient workaround—all while ensuring we don’t crash into the valley of disaster.
Use with mlx
```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("granite-4.1-8b-Wilma-Flintstone-qx64-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
