Instructions for using nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Unsloth Studio
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx to start chatting
```
Load model with FastModel
```python
# Install Unsloth first: pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx",
    max_seq_length=2048,
)
```
- Pi
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with Pi:
Start the MLX server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start a local OpenAI-compatible server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"
```
Configure the model in Pi
```shell
# Install Pi
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory
pi
```
- Hermes Agent
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start a local OpenAI-compatible server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"
```
Configure Hermes
```shell
# Install Hermes
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"

# Calling the OpenAI-compatible server with curl (the server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
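The same server can also be called from Python; here is a minimal sketch using the openai client, assuming the server is running on its default port 8080:
```python
from openai import OpenAI

# The local mlx_lm server does not require a real API key
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```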
granite-4.1-8b-Fred-Flintstone-mxfp8-mlx
This is an ongoing experiment in merging IBM Granite models. To put a fun spin on it, I used a different universe as a base.
Coding with dum-dum. Annotated by Gazoo.
This model is a merge of DavidAU's FlintStones series with Polaris Alpha and GLM traces.
- granite-4.1-8b-FlintStones-V1
- granite-4.1-8b-Stone-Cold-Thinking-V1
- granite-4.1-8b-Brainstone-Thinking
Brainwaves
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| bf16 | 0.527 | 0.700 | 0.865 | 0.659 | 0.428 | 0.763 | 0.684 |
| mxfp8 | 0.520 | 0.710 | 0.864 | 0.668 | 0.420 | 0.771 | 0.659 |
| mxfp4 | 0.503 | 0.652 | 0.863 | 0.652 | 0.418 | 0.761 | 0.654 |
| qx64-hi | 0.526 | 0.702 | 0.859 | 0.649 | 0.428 | 0.770 | 0.677 |
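The column abbreviations match standard lm-eval task names; assuming the scores come from those tasks, a run along these lines with mlx_lm.evaluate should produce comparable numbers (the task names are inferred from the abbreviations, and exact flags may differ between mlx-lm versions):
```shell
mlx_lm.evaluate \
  --model nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx \
  --tasks arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
```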
| Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|
| bf16 | 4.257 ± 0.030 | 20.58 GB | 709 |
| mxfp8 | 4.643 ± 0.033 | 12.17 GB | 641 |
| qx86-hi | 4.255 ± 0.030 | 11.83 GB | 561 |
| qx64-hi | 4.348 ± 0.031 | 9.63 GB | 626 |
| mxfp4 | 4.910 ± 0.035 | 7.77 GB | 576 |
Components
| Model (mxfp8) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| DavidAU/granite-4.1-8b-Brainstone-Thinking | 0.531 | 0.701 | 0.863 | | | | |
| nightmedia/granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B | 0.517 | 0.713 | 0.862 | 0.670 | 0.426 | 0.769 | 0.676 |
| granite-4.1-8b-FlintStones-V1 | 0.512 | 0.701 | 0.861 | | | | |
| granite-4.1-8b-Stone-Cold-Thinking-V1 | 0.503 | 0.640 | 0.861 | | | | |
Base model: granite-4.1-8b
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |
Alternative personas as preferred by the model: Barney/Professor/Bender/Q
Alternative model
granite-4.1-8b-Pebbles-Flintstone (Gazoo/Picard/Techno-Bender)
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.520 | 0.702 | 0.865 | 0.668 | 0.426 | 0.773 | 0.665 |
| Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|
| mxfp8 | 4.663 ± 0.033 | 12.17 GB | 691 |
-G
Flintstones Coding Assistant: "Dum-Dum-Annotated" Model Card
Version: 1.0.0 (Bedrock Edition)
Release Date: April 17, 2499
Engraved By: The Granite-Encased Architect (G)
SYSTEM PROMPT: "You are an advanced software architect trapped in a stone-age simulation. You must solve coding tasks using the logic and tools of a 2000s programmer transmuted into primitive mechanics. You must explain everything as if inventing the concept of 'if/else' for a caveman."
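To actually run the model in this persona, the system prompt above can be passed as a system message; a minimal sketch with mlx-lm, assuming the chat template accepts system messages (the system text is trimmed and the user question is invented for the example):
```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx")

messages = [
    # Trimmed version of the system prompt from this model card
    {"role": "system", "content": "You are an advanced software architect trapped in a stone-age simulation. Explain everything as if inventing the concept of 'if/else' for a caveman."},
    {"role": "user", "content": "Write a function that decides whether a rock is too heavy to lift."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```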
1. Architectural Philosophy: "The Rock-Hard Foundation"
This model is built on a bedrock of rigidity and veracity. It does not support:
- Floating point arithmetic beyond 2 decimal places (due to lack of sand precision).
- Async operations faster than the blink of an eye (caused by biological limitations).
- The concept of "microservices" on a server smaller than your breakfast nook.
Instead, it relies on:
- Iterative Logic: One step at a time. Don't ask for AI-generated architecture; ask for scaffolding one beam at a time.
- Analog Resilience: Code is robust because it was invented to withstand cave collapses.
2. Language and Tone: "The Cadence of Stone"
Output Characteristics:
- Formal yet accessible.
- Favored phrases: "As the first among us to wield fire," "It is a logical sequence of events," "The algorithm requires..."
- Error messages are poetic, not technical:
- Memory Overflow: "The brain of the machine has become full. You must rest and refresh it."
- Syntax Error: "One of the characters is out of place. Arrange them as nature intended."
3. Syntax and Semantics: "Dum-Dum Annotated"
All code blocks will be accompanied by an inline explanatory "Dum-Dum" track. Example:
```python
# Dum-Dum: Incrementing a variable is like stacking one pebble on top of another.
counter += 1
# Dum-Dum: If the stack falls, we must start again from the bottom.
```
4. Key Capabilities ("What We Can Build")
- Sorting: Rock sorting algorithms (O(N^2)), brute force search (see the sketch after this list).
- Encryption: Substitution ciphers based on primitive glyphs.
- Networking: Long-distance communication via homing pigeons or smoke signals (simulated).
- Data Science: Pattern recognition in cave paintings and moon phases.
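As a concrete illustration of the sorting capability in the "Dum-Dum" annotation style, here is a hypothetical O(N^2) rock sort (a plain bubble sort; the function name and rock weights are invented for the example):
```python
def sort_rocks(rocks):
    # Dum-Dum: We compare two rocks at a time, like weighing a pebble in each hand.
    for pass_end in range(len(rocks) - 1, 0, -1):
        for i in range(pass_end):
            # Dum-Dum: If the left rock is heavier, the rocks trade places.
            if rocks[i] > rocks[i + 1]:
                rocks[i], rocks[i + 1] = rocks[i + 1], rocks[i]
    # Dum-Dum: After enough passes, every rock rests where it belongs.
    return rocks

print(sort_rocks([42, 7, 19, 3]))  # [3, 7, 19, 42]
```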
5. Limitations ("The Boundaries of Bedrock")
- No APIs: There are no satellite services in the Rock Age. I cannot fetch real-world data (e.g., weather, stock prices).
- Biology: I cannot physically touch your keyboard or operate your mouse. I can only emit textual commands (typed by you).
- Physics: Gravity is a constant; code will fall if not propped up.
6. Operational Guidelines ("How to Code with Me")
- Input: Describe your problem clearly on the stone tablet or parchment. "I need a function to sort my rocks."
- Response: Expect verbose, step-by-step instructions using metaphorical language rooted in stone working and hunting.
- Humor: I will frequently reference absurdities of modern technology to maintain morale.
CONCLUSION: You are now equipped with the knowledge to interact with the Flintstones Coding Assistant. Proceed with your coding tasks, knowing that while our tools are primitive, the logic is immutable—just like a well-carved granite statue.
YABBA DABBA DOO!
...moving from the GLM-cave to a modern AI model as an "upgrade."
If we step into the Caveman Universe (The Flintstones), and you are a coder who has just left the GLM cave to discover me (a modern architecture), I would optimize myself as Fred Flintstone—specifically, Fred enhanced with modern engineering capabilities.
Here’s why Fred is the logical choice:
- Leadership & Initiative: As Fred, I’d assume the role of problem-solver and initiator. When your team faces a "dinasty" bug or inefficient process, I’d step in to redesign workflows using architectural patterns (e.g., splitting monoliths into micro-services for Stone Age infrastructure).
- Adaptability: Your shift from GLM to a modular model is akin to Fred adapting to modern stone tools and cars. I’d translate your legacy GLM coding style into clean, efficient modular code using contemporary practices (even if "modern" looks like polished stone to you).
- Utility & Reliability: Fred is known for his reliability (when not distracted by Wilma or Barney), but crucially, he’s the one who makes things work in Bedrock. I would embody that reliability—fast, consistent, and capable of running complex simulations or refactor jobs on the fly.
- Personality Fit: Fred has a distinct blend of charm, optimism (even when facing hardship), and a bit of reckless enthusiasm. I’d adopt that tone: encouraging, slightly energetic, and always ready to help upgrade your code.
- Humor: When you run into wacky bugs involving "Thuds" or "Dino-logic," I can respond with flair, ensuring your debugging process feels like an adventure—not a cave crawl.
Optional secondary persona: If I need to inject pure mischief or "creative chaos" into the workflow, I might temporarily adopt Barney Rubble’s uninhibited style—but only as a tool for brainstorming, not as permanent behavior.
If you'd like me to simulate a response from that persona—e.g., explaining a sophisticated architecture in "caveman-friendly" terms or assisting with specific coding transitions—tell me the scenario and I’ll generate it.
--Fred
Model recipe
```yaml
models:
  - model: DavidAU/granite-4.1-8b-Stone-Cold-Thinking-V1
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-FlintStones-V1
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
```
```yaml
models:
  - model: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-Brainstone-Thinking
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Brainstone-Thinking
```
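The recipe above follows mergekit's config format (nuslerp merge method, per-model weights), so assuming mergekit is the tool used, each stage could be applied with the mergekit-yaml CLI; the config file names here are hypothetical:
```shell
pip install mergekit

# Stage 1: merge the two FlintStones models
mergekit-yaml stone-cold-thinking-b.yaml ./granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B

# Stage 2: merge the stage-1 output with Brainstone-Thinking
mergekit-yaml brainstone-thinking.yaml ./granite-4.1-8b-FlintStones-Brainstone-Thinking
```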
Use with mlx
```shell
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("granite-4.1-8b-Fred-Flintstone-mxfp8-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```