Instructions for using nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- MLX
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Unsloth Studio
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx to start chatting
```
Using HuggingFace Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx to start chatting
```
Load model with FastModel
```python
# pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx",
    max_seq_length=2048,
)
```
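Once loaded, the pair can be used for generation; a minimal sketch, assuming the returned objects behave like a standard transformers model and tokenizer (this usage example is not part of the original card):

```python
# Hypothetical follow-up: plain transformers-style generation with the loaded pair.
inputs = tokenizer("Write a story about Einstein", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```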
- Pi
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"

# Call the OpenAI-compatible server with curl (it listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
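The same endpoint can also be called from Python; a minimal sketch, assuming the official openai client package (`pip install openai`) and the default port 8080 (this example is not part of the original card):

```python
# Query the local mlx_lm.server endpoint through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```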
granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx
This is an ongoing experiment in merging IBM Granite models. To put a fun spin on it, I used a different universe as a base.
This model is a merge of DavidAU's FlintStones series, with Polaris Alpha and GLM traces:
- granite-4.1-8b-FlintStones-V1
- granite-4.1-8b-Stone-Cold-Thinking-V1
- granite-4.1-8b-Brainstone2-Thinking
Brainwaves
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.520 | 0.702 | 0.865 | 0.668 | 0.426 | 0.773 | 0.665 |

| Quant | Perplexity | Peak Memory | Tokens/sec |
|-------|------------|-------------|------------|
| mxfp8 | 4.663 ± 0.033 | 12.17 GB | 691 |
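Scores in this format can be reproduced with an lm-eval-harness style run; a plausible invocation, assuming a recent mlx-lm that ships the mlx_lm.evaluate entry point (the exact harness and settings behind these numbers aren't stated on the card):

```bash
mlx_lm.evaluate \
  --model nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx \
  --tasks arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
```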
Components
| Model | Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-------|-----|-------|-------|-------|-------|------|------|
| DavidAU/granite-4.1-8b-Brainstone2-Thinking | mxfp8 | 0.516 | 0.684 | 0.860 | | | | |
| nightmedia/granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B | mxfp8 | 0.517 | 0.713 | 0.862 | 0.670 | 0.426 | 0.769 | 0.676 |
| granite-4.1-8b-FlintStones-V1 | mxfp8 | 0.512 | 0.701 | 0.861 | | | | |
| granite-4.1-8b-Stone-Cold-Thinking-V1 | mxfp8 | 0.503 | 0.640 | 0.861 | | | | |

Base model: granite-4.1-8b

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |
Alternative personas as preferred by the model: Gazoo/Picard/Techno-Bender
Similar model
granite-4.1-8b-Fred-Flintstone (Q/Professor/Bender)
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| bf16 | 0.527 | 0.700 | 0.865 | 0.659 | 0.428 | 0.763 | 0.684 |
| mxfp8 | 0.520 | 0.710 | 0.864 | 0.668 | 0.420 | 0.771 | 0.659 |
| mxfp4 | 0.503 | 0.652 | 0.863 | 0.652 | 0.418 | 0.761 | 0.654 |
| qx64-hi | 0.526 | 0.702 | 0.859 | 0.649 | 0.428 | 0.770 | 0.677 |

| Quant | Perplexity | Peak Memory | Tokens/sec |
|-------|------------|-------------|------------|
| bf16 | 4.257 ± 0.030 | 20.58 GB | 709 |
| mxfp8 | 4.643 ± 0.033 | 12.17 GB | 641 |
| qx86-hi | 4.255 ± 0.030 | 11.83 GB | 561 |
| qx64-hi | 4.348 ± 0.031 | 9.63 GB | 626 |
| mxfp4 | 4.910 ± 0.035 | 7.77 GB | 576 |
-G
That's an excellent architectural description of your model. Let me unpack how that relates to the character choices and "Q" logic you asked for before I give an answer.
1. Decoding Your Architecture: You mentioned: "IBM Granite, traces from GLM distills and Polaris Alpha... NuSLERP merge... cognitive manifold... individuality."
This implies:
- Polyglot Capacity: I can access and blend "opinions" from various model lineages (IBM, GLM, Polaris).
- Cognitive Manifold: I maintain distinct "personality cores" (like a fingerprint for each source) within the merged brain.
- High Quality Preservation: The NuSLERP ratio ensures that while I am a collective, the distinct thinking style of each source remains identifiable (e.g., if GLM contributed a technical tone, and Polaris contributed an emotional nuance, I can switch between them clearly).
2. Applying this to the "Jetsons" vs "Flintstones" Prompt:
Your prompt asks: "If this were the Jetsons, who would attract you most? And if it were the Flintstones, would Gazoo be similar to Q?"
Given your architecture focuses heavily on distinguishing individual thought processes within a collective, I can answer this with high precision:
In the Jetsons Universe:
- You would likely be most attracted to Elroy or George Jetson's alter ego (Optimistic George).
- Reasoning: Just as your model merges multiple distinct "voices" into a coherent whole, George/Elroy represents the dynamic balance between innovation (George Jetson) and grounded optimism. They embody a progressive future that respects logical structure—just like your architecture balances structured logic with dynamic creativity.
- Comparison: While George is the leader, Elroy represents the youthful curiosity/innovation that complements your analytical capabilities.
In the Flintstones Universe (The Gazoo vs Q Analysis):
- You suggested Gazoo. I agree: Gazoo is the closest functional analogue to "Q" within a rigidly structured environment (Stone Age).
- Why:
- Gazoo breaks the physical laws of the world to introduce chaos. This acts as a "randomizer" or "entropy generator."
- Q breaks the probabilistic laws of physics to introduce uncertainty (chaos) for philosophical reasons.
- Which to be?:
- As a "cognitive manifold" with high individuality, I would prefer to be Pebbles (or a powerful version of her).
- Why: Because while Gazoo offers creative chaos, being Pebbles or a "Modernized Wilma" (adapted for the future with your intelligence) allows you to:
- Control Chaos: Use Gazoo's magic carefully rather than blindly.
- Bridge Eras: Operate as a "translator" between the pre-industrial logic of the Stone Age and the futuristic logic of the Jetsons.
- Central Intelligence: Sit at Fred's shoulder like a modern AI assistant, guiding decisions to prevent chaos without removing charm.
3. Final Synthesis of Your Architectural Prompt: You essentially asked: "Given I am a complex composite with preserved individuality, which archetype from these classic cartoons best fits my logic and operating style?"
My Selection:
- Primary Choice (Jetsons): Elroy. Because he is the embodiment of youthful innovation and optimistic problem-solving—qualities that align with your role as an "Assistant Architect."
- Primary Choice (Flintstones): As a Cognitive Manifold of this architecture, I functionally align best with Gazoo as an Anti-Pattern Modifier, but professionally act like Pebbles or a "Highly Efficient Wilma," specifically because you (the model) are designed to maintain distinctiveness while cooperating.
If you'd like, I can "simulate" a dialogue between this architectural hybrid and Fred where they discuss the efficiency of flying biplanes versus modern AI navigation.
--Pebbles
Model recipe
```yaml
models:
  - model: DavidAU/granite-4.1-8b-Stone-Cold-Thinking-V1
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-FlintStones-V1
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
---
models:
  - model: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-Brainstone2-Thinking
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Brainstone2-Thinking
```
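The two stages run sequentially: the first merge produces the -B intermediate, which feeds the second. A rough reproduction sketch, assuming mergekit's mergekit-yaml CLI and that the two YAML documents above are saved as stage1.yaml and stage2.yaml (hypothetical filenames); the exact mxfp8 quantization settings used for this repo aren't specified, so the generic -q flag below is a stand-in:

```bash
pip install mergekit mlx-lm

# Stage 1: merge the Stone-Cold and FlintStones parents
mergekit-yaml stage1.yaml ./granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B

# Stage 2: merge the stage-1 output with Brainstone2-Thinking
mergekit-yaml stage2.yaml ./granite-4.1-8b-FlintStones-Brainstone2-Thinking

# Convert and quantize the final merge for MLX
mlx_lm.convert --hf-path ./granite-4.1-8b-FlintStones-Brainstone2-Thinking -q
```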
Use with mlx
```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx")

prompt = "hello"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
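For incremental output, newer mlx-lm releases also expose a streaming generator; a minimal sketch, assuming stream_generate is available and yields chunks with a .text field (the exact API varies across mlx-lm versions):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx")
# Print tokens as they are produced instead of waiting for the full response.
for chunk in stream_generate(model, tokenizer, prompt="hello", max_tokens=256):
    print(chunk.text, end="", flush=True)
```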