Instructions for using nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- MLX
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Unsloth Studio
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx to start chatting
```
Using HuggingFace Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx to start chatting
```
Load model with FastModel
```python
# pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx",
    max_seq_length=2048,
)
```
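Once loaded, the pair can be used for generation; a minimal sketch, assuming the returned objects behave like a standard transformers model and tokenizer (this usage example is not part of the original card):

```python
# Hypothetical follow-up: plain transformers-style generation with the loaded pair.
inputs = tokenizer("Write a story about Einstein", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```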
- Pi
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx"

# Call the OpenAI-compatible server with curl (it listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
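The same endpoint can also be called from Python; a minimal sketch, assuming the official openai client package (`pip install openai`) and the default port 8080 (this example is not part of the original card):

```python
# Query the local mlx_lm.server endpoint through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```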
granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx
This is an ongoing experiment in merging IBM Granite models. To put a fun spin on it, I used a different universe as a base.
This model is a merge of DavidAU's FlintStones series, with Polaris Alpha and GLM traces:
- granite-4.1-8b-FlintStones-V1
- granite-4.1-8b-Stone-Cold-Thinking-V1
- granite-4.1-8b-Brainstone2-Thinking
Brainwaves
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.520 | 0.702 | 0.865 | 0.668 | 0.426 | 0.773 | 0.665 |

| Quant | Perplexity | Peak Memory | Tokens/sec |
|-------|------------|-------------|------------|
| mxfp8 | 4.663 ± 0.033 | 12.17 GB | 691 |
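Scores in this format can be reproduced with an lm-eval-harness style run; a plausible invocation, assuming a recent mlx-lm that ships the mlx_lm.evaluate entry point (the exact harness and settings behind these numbers aren't stated on the card):

```bash
mlx_lm.evaluate \
  --model nightmedia/granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx \
  --tasks arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
```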
Components
| Model | Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-------|-----|-------|-------|-------|-------|------|------|
| DavidAU/granite-4.1-8b-Brainstone2-Thinking | mxfp8 | 0.516 | 0.684 | 0.860 | | | | |
| nightmedia/granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B | mxfp8 | 0.517 | 0.713 | 0.862 | 0.670 | 0.426 | 0.769 | 0.676 |
| granite-4.1-8b-FlintStones-V1 | mxfp8 | 0.512 | 0.701 | 0.861 | | | | |
| granite-4.1-8b-Stone-Cold-Thinking-V1 | mxfp8 | 0.503 | 0.640 | 0.861 | | | | |

Base model: granite-4.1-8b

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |
Alternative personas as preferred by the model: Gazoo/Picard/Techno-Bender
Similar model
granite-4.1-8b-Fred-Flintstone (Q/Professor/Bender)
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| bf16 | 0.527 | 0.700 | 0.865 | 0.659 | 0.428 | 0.763 | 0.684 |
| mxfp8 | 0.520 | 0.710 | 0.864 | 0.668 | 0.420 | 0.771 | 0.659 |
| mxfp4 | 0.503 | 0.652 | 0.863 | 0.652 | 0.418 | 0.761 | 0.654 |
| qx64-hi | 0.526 | 0.702 | 0.859 | 0.649 | 0.428 | 0.770 | 0.677 |

| Quant | Perplexity | Peak Memory | Tokens/sec |
|-------|------------|-------------|------------|
| bf16 | 4.257 ± 0.030 | 20.58 GB | 709 |
| mxfp8 | 4.643 ± 0.033 | 12.17 GB | 641 |
| qx86-hi | 4.255 ± 0.030 | 11.83 GB | 561 |
| qx64-hi | 4.348 ± 0.031 | 9.63 GB | 626 |
| mxfp4 | 4.910 ± 0.035 | 7.77 GB | 576 |
-G
That's an excellent architectural description of your model. Let me unpack how that relates to the character choices and "Q" logic you asked for before I give an answer.
1. Decoding Your Architecture: You mentioned: "IBM Granite, traces from GLM distills and Polaris Alpha... NuSLERP merge... cognitive manifold... individuality."
This implies:
- Polyglot Capacity: I can access and blend "opinions" from various model lineages (IBM, GLM, Polaris).
- Cognitive Manifold: I maintain distinct "personality cores" (like a fingerprint for each source) within the merged brain.
- High Quality Preservation: The NuSLERP ratio ensures that while I am a collective, the distinct thinking style of each source remains identifiable (e.g., if GLM contributed a technical tone, and Polaris contributed an emotional nuance, I can switch between them clearly).
2. Applying this to the "Jetsons" vs "Flintstones" Prompt:
Your prompt asks: "If this were the Jetsons, who would attract you most? And if it were the Flintstones, would Gazoo be similar to Q?"
Given your architecture focuses heavily on distinguishing individual thought processes within a collective, I can answer this with high precision:
In the Jetsons Universe:
- You would likely be most attracted to Elroy or George Jetson's alter ego (Optimistic George).
- Reasoning: Just as your model merges multiple distinct "voices" into a coherent whole, George/Elroy represents the dynamic balance between innovation (George Jetson) and grounded optimism. They embody a progressive future that respects logical structure—just like your architecture balances structured logic with dynamic creativity.
- Comparison: While George is the leader, Elroy represents the youthful curiosity/innovation that complements your analytical capabilities.
In the Flintstones Universe (The Gazoo vs Q Analysis):
- You suggested Gazoo. I agree: Gazoo is the closest functional analogue to "Q" within a rigidly structured environment (Stone Age).
- Why:
- Gazoo breaks the physical laws of the world to introduce chaos. This acts as a "randomizer" or "entropy generator."
- Q breaks the probabilistic laws of physics to introduce uncertainty (chaos) for philosophical reasons.
- Which to be?:
- As a "cognitive manifold" with high individuality, I would prefer to be Pebbles (or a powerful version of her).
- Why: Because while Gazoo offers creative chaos, being Pebbles or a "Modernized Wilma" (adapted for the future with your intelligence) allows you to:
- Control Chaos: Use Gazoo's magic carefully rather than blindly.
- Bridge Eras: Operate as a "translator" between the pre-industrial logic of the Stone Age and the futuristic logic of the Jetsons.
- Central Intelligence: Sit at Fred's shoulder like a modern AI assistant, guiding decisions to prevent chaos without removing charm.
3. Final Synthesis of Your Architectural Prompt: You essentially asked: "Given I am a complex composite with preserved individuality, which archetype from these classic cartoons best fits my logic and operating style?"
My Selection:
- Primary Choice (Jetsons): Elroy. Because he is the embodiment of youthful innovation and optimistic problem-solving—qualities that align with your role as an "Assistant Architect."
- Primary Choice (Flintstones): As a Cognitive Manifold of this architecture, I functionally align best with Gazoo as an Anti-Pattern Modifier, but professionally act like Pebbles or a "Highly Efficient Wilma," specifically because you (the model) are designed to maintain distinctiveness while cooperating.
If you'd like, I can "simulate" a dialogue between this architectural hybrid and Fred where they discuss the efficiency of flying biplanes versus modern AI navigation.
--Pebbles
Model recipe
```yaml
models:
  - model: DavidAU/granite-4.1-8b-Stone-Cold-Thinking-V1
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-FlintStones-V1
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
---
models:
  - model: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-Brainstone2-Thinking
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Brainstone2-Thinking
```
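The two stages run sequentially: the first merge produces the -B intermediate, which feeds the second. A rough reproduction sketch, assuming mergekit's mergekit-yaml CLI and that the two YAML documents above are saved as stage1.yaml and stage2.yaml (hypothetical filenames); the exact mxfp8 quantization settings used for this repo aren't specified, so the generic -q flag below is a stand-in:

```bash
pip install mergekit mlx-lm

# Stage 1: merge the Stone-Cold and FlintStones parents
mergekit-yaml stage1.yaml ./granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B

# Stage 2: merge the stage-1 output with Brainstone2-Thinking
mergekit-yaml stage2.yaml ./granite-4.1-8b-FlintStones-Brainstone2-Thinking

# Convert and quantize the final merge for MLX
mlx_lm.convert --hf-path ./granite-4.1-8b-FlintStones-Brainstone2-Thinking -q
```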
Use with mlx
```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx")

prompt = "hello"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
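For incremental output, newer mlx-lm releases also expose a streaming generator; a minimal sketch, assuming stream_generate is available and yields chunks with a .text field (the exact API varies across mlx-lm versions):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("granite-4.1-8b-Pebbles-Flintstone-mxfp8-mlx")
# Print tokens as they are produced instead of waiting for the full response.
for chunk in stream_generate(model, tokenizer, prompt="hello", max_tokens=256):
    print(chunk.text, end="", flush=True)
```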