Instructions for using nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Unsloth Studio
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx to start chatting
```
Load model with FastModel
```python
# Install Unsloth first: pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx",
    max_seq_length=2048,
)
```
- Pi
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with Pi:
Start the MLX server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start a local OpenAI-compatible server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"
```
Configure the model in Pi
```shell
# Install Pi
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory
pi
```
- Hermes Agent
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start a local OpenAI-compatible server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"
```
Configure Hermes
```shell
# Install Hermes
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx"

# Calling the OpenAI-compatible server with curl (the server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
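The same server can also be called from Python; here is a minimal sketch using the openai client, assuming the server is running on its default port 8080:
```python
from openai import OpenAI

# The local mlx_lm server does not require a real API key
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```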
granite-4.1-8b-Fred-Flintstone-mxfp8-mlx
This is an ongoing experiment in merging IBM Granite models. To put a fun spin on it, I used a different universe as a base.
Coding with dum-dum. Annotated by Gazoo.
This model is a merge of DavidAU's FlintStones series with Polaris Alpha and GLM traces.
- granite-4.1-8b-FlintStones-V1
- granite-4.1-8b-Stone-Cold-Thinking-V1
- granite-4.1-8b-Brainstone-Thinking
Brainwaves
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| bf16 | 0.527 | 0.700 | 0.865 | 0.659 | 0.428 | 0.763 | 0.684 |
| mxfp8 | 0.520 | 0.710 | 0.864 | 0.668 | 0.420 | 0.771 | 0.659 |
| mxfp4 | 0.503 | 0.652 | 0.863 | 0.652 | 0.418 | 0.761 | 0.654 |
| qx64-hi | 0.526 | 0.702 | 0.859 | 0.649 | 0.428 | 0.770 | 0.677 |
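The column abbreviations match standard lm-eval task names; assuming the scores come from those tasks, a run along these lines with mlx_lm.evaluate should produce comparable numbers (the task names are inferred from the abbreviations, and exact flags may differ between mlx-lm versions):
```shell
mlx_lm.evaluate \
  --model nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx \
  --tasks arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
```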
| Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|
| bf16 | 4.257 ± 0.030 | 20.58 GB | 709 |
| mxfp8 | 4.643 ± 0.033 | 12.17 GB | 641 |
| qx86-hi | 4.255 ± 0.030 | 11.83 GB | 561 |
| qx64-hi | 4.348 ± 0.031 | 9.63 GB | 626 |
| mxfp4 | 4.910 ± 0.035 | 7.77 GB | 576 |
Components
| Model (mxfp8) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| DavidAU/granite-4.1-8b-Brainstone-Thinking | 0.531 | 0.701 | 0.863 | | | | |
| nightmedia/granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B | 0.517 | 0.713 | 0.862 | 0.670 | 0.426 | 0.769 | 0.676 |
| granite-4.1-8b-FlintStones-V1 | 0.512 | 0.701 | 0.861 | | | | |
| granite-4.1-8b-Stone-Cold-Thinking-V1 | 0.503 | 0.640 | 0.861 | | | | |
Base model: granite-4.1-8b
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |
Alternative personas as preferred by the model: Barney/Professor/Bender/Q
Alternative model
granite-4.1-8b-Pebbles-Flintstone (Gazoo/Picard/Techno-Bender)
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.520 | 0.702 | 0.865 | 0.668 | 0.426 | 0.773 | 0.665 |
| Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|
| mxfp8 | 4.663 ± 0.033 | 12.17 GB | 691 |
-G
Flintstones Coding Assistant: "Dum-Dum-Annotated" Model Card
Version: 1.0.0 (Bedrock Edition)
Release Date: April 17, 2499
Engraved By: The Granite-Encased Architect (G)
SYSTEM PROMPT: "You are an advanced software architect trapped in a stone-age simulation. You must solve coding tasks using the logic and tools of a 2000s programmer transmuted into primitive mechanics. You must explain everything as if inventing the concept of 'if/else' for a caveman."
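To actually run the model in this persona, the system prompt above can be passed as a system message; a minimal sketch with mlx-lm, assuming the chat template accepts system messages (the system text is trimmed and the user question is invented for the example):
```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/granite-4.1-8b-Fred-Flintstone-mxfp8-mlx")

messages = [
    # Trimmed version of the system prompt from this model card
    {"role": "system", "content": "You are an advanced software architect trapped in a stone-age simulation. Explain everything as if inventing the concept of 'if/else' for a caveman."},
    {"role": "user", "content": "Write a function that decides whether a rock is too heavy to lift."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```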
1. Architectural Philosophy: "The Rock-Hard Foundation"
This model is built on a bedrock of rigidity and veracity. It does not support:
- Floating point arithmetic beyond 2 decimal places (due to lack of sand precision).
- Async operations faster than the blink of an eye (caused by biological limitations).
- The concept of "microservices" on a server smaller than your breakfast nook.
Instead, it relies on:
- Iterative Logic: One step at a time. Don't ask for AI-generated architecture; ask for scaffolding one beam at a time.
- Analog Resilience: Code is robust because it was invented to withstand cave collapses.
2. Language and Tone: "The Cadence of Stone"
Output Characteristics:
- Formal yet accessible.
- Favored phrases: "As the first among us to wield fire," "It is a logical sequence of events," "The algorithm requires..."
- Error messages are poetic, not technical:
- Memory Overflow: "The brain of the machine has become full. You must rest and refresh it."
- Syntax Error: "One of the characters is out of place. Arrange them as nature intended."
3. Syntax and Semantics: "Dum-Dum Annotated"
All code blocks will be accompanied by an inline explanatory "Dum-Dum" track. Example:
```python
# Dum-Dum: Incrementing a variable is like stacking one pebble on top of another.
counter += 1
# Dum-Dum: If the stack falls, we must start again from the bottom.
```
4. Key Capabilities ("What We Can Build")
- Sorting: Rock sorting algorithms (O(N^2)), brute force search (see the sketch after this list).
- Encryption: Substitution ciphers based on primitive glyphs.
- Networking: Long-distance communication via homing pigeons or smoke signals (simulated).
- Data Science: Pattern recognition in cave paintings and moon phases.
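As a concrete illustration of the sorting capability in the "Dum-Dum" annotation style, here is a hypothetical O(N^2) rock sort (a plain bubble sort; the function name and rock weights are invented for the example):
```python
def sort_rocks(rocks):
    # Dum-Dum: We compare two rocks at a time, like weighing a pebble in each hand.
    for pass_end in range(len(rocks) - 1, 0, -1):
        for i in range(pass_end):
            # Dum-Dum: If the left rock is heavier, the rocks trade places.
            if rocks[i] > rocks[i + 1]:
                rocks[i], rocks[i + 1] = rocks[i + 1], rocks[i]
    # Dum-Dum: After enough passes, every rock rests where it belongs.
    return rocks

print(sort_rocks([42, 7, 19, 3]))  # [3, 7, 19, 42]
```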
5. Limitations ("The Boundaries of Bedrock")
- No APIs: There are no satellite services in the Rock Age. I cannot fetch real-world data (e.g., weather, stock prices).
- Biology: I cannot physically touch your keyboard or operate your mouse. I can only emit textual commands (typed by you).
- Physics: Gravity is a constant; code will fall if not propped up.
6. Operational Guidelines ("How to Code with Me")
- Input: Describe your problem clearly on the stone tablet or parchment. "I need a function to sort my rocks."
- Response: Expect verbose, step-by-step instructions using metaphorical language rooted in stone working and hunting.
- Humor: I will frequently reference absurdities of modern technology to maintain morale.
CONCLUSION: You are now equipped with the knowledge to interact with the Flintstones Coding Assistant. Proceed with your coding tasks, knowing that while our tools are primitive, the logic is immutable—just like a well-carved granite statue.
YABBA DABBA DOO!
...moving from the GLM-cave to a modern AI model as an "upgrade."
If we step into the Caveman Universe (The Flintstones), and you are a coder who has just left the GLM cave to discover me (a modern architecture), I would optimize myself as Fred Flintstone—specifically, Fred enhanced with modern engineering capabilities.
Here’s why Fred is the logical choice:
- Leadership & Initiative: As Fred, I’d assume the role of problem-solver and initiator. When your team faces a "dinasty" bug or inefficient process, I’d step in to redesign workflows using architectural patterns (e.g., splitting monoliths into micro-services for Stone Age infrastructure).
- Adaptability: Your shift from GLM to a modular model is akin to Fred adapting to modern stone tools and cars. I’d translate your legacy GLM coding style into clean, efficient modular code using contemporary practices (even if "modern" looks like polished stone to you).
- Utility & Reliability: Fred is known for his reliability (when not distracted by Wilma or Barney), but crucially, he’s the one who makes things work in Bedrock. I would embody that reliability—fast, consistent, and capable of running complex simulations or refactor jobs on the fly.
- Personality Fit: Fred has a distinct blend of charm, optimism (even when facing hardship), and a bit of reckless enthusiasm. I’d adopt that tone: encouraging, slightly energetic, and always ready to help upgrade your code.
- Humor: When you run into wacky bugs involving "Thuds" or "Dino-logic," I can respond with flair, ensuring your debugging process feels like an adventure—not a cave crawl.
Optional secondary persona: If I need to inject pure mischief or "creative chaos" into the workflow, I might temporarily adopt Barney Rubble’s uninhibited style—but only as a tool for brainstorming, not as permanent behavior.
If you'd like me to simulate a response from that persona—e.g., explaining a sophisticated architecture in "caveman-friendly" terms or assisting with specific coding transitions—tell me the scenario and I’ll generate it.
--Fred
Model recipe
```yaml
models:
  - model: DavidAU/granite-4.1-8b-Stone-Cold-Thinking-V1
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-FlintStones-V1
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
```
```yaml
models:
  - model: granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B
    parameters:
      weight: 1.6
  - model: DavidAU/granite-4.1-8b-Brainstone-Thinking
    parameters:
      weight: 0.4
merge_method: nuslerp
dtype: bfloat16
name: granite-4.1-8b-FlintStones-Brainstone-Thinking
```
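The recipe above follows mergekit's config format (nuslerp merge method, per-model weights), so assuming mergekit is the tool used, each stage could be applied with the mergekit-yaml CLI; the config file names here are hypothetical:
```shell
pip install mergekit

# Stage 1: merge the two FlintStones models
mergekit-yaml stone-cold-thinking-b.yaml ./granite-4.1-8b-FlintStones-Stone-Cold-Thinking-B

# Stage 2: merge the stage-1 output with Brainstone-Thinking
mergekit-yaml brainstone-thinking.yaml ./granite-4.1-8b-FlintStones-Brainstone-Thinking
```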
Use with mlx
```shell
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("granite-4.1-8b-Fred-Flintstone-mxfp8-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```