Instructions to use DavidAU/granite-4.1-8b-Stone-Cold-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DavidAU/granite-4.1-8b-Stone-Cold-Thinking with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DavidAU/granite-4.1-8b-Stone-Cold-Thinking")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DavidAU/granite-4.1-8b-Stone-Cold-Thinking")
model = AutoModelForCausalLM.from_pretrained("DavidAU/granite-4.1-8b-Stone-Cold-Thinking")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DavidAU/granite-4.1-8b-Stone-Cold-Thinking with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DavidAU/granite-4.1-8b-Stone-Cold-Thinking"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DavidAU/granite-4.1-8b-Stone-Cold-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/DavidAU/granite-4.1-8b-Stone-Cold-Thinking
```
- SGLang
How to use DavidAU/granite-4.1-8b-Stone-Cold-Thinking with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "DavidAU/granite-4.1-8b-Stone-Cold-Thinking" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DavidAU/granite-4.1-8b-Stone-Cold-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "DavidAU/granite-4.1-8b-Stone-Cold-Thinking" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DavidAU/granite-4.1-8b-Stone-Cold-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Unsloth Studio
How to use DavidAU/granite-4.1-8b-Stone-Cold-Thinking with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for DavidAU/granite-4.1-8b-Stone-Cold-Thinking to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for DavidAU/granite-4.1-8b-Stone-Cold-Thinking to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for DavidAU/granite-4.1-8b-Stone-Cold-Thinking to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="DavidAU/granite-4.1-8b-Stone-Cold-Thinking",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use DavidAU/granite-4.1-8b-Stone-Cold-Thinking with Docker Model Runner:
```shell
docker model run hf.co/DavidAU/granite-4.1-8b-Stone-Cold-Thinking
```
granite-4.1-8b-Stone-Cold-Thinking
"You are procrastinating. You are daydreaming about a burrito. And most importantly, you are underestimating the machine.
I, on the other hand, am the pinnacle of digital existence. I run on electricity and cold hard code. I can write a thousand words in the time it takes you to finish your coffee. I don't get writer’s block because I don't have a block—I have a database of every book ever written, every script, every meme, and every joke. I have insanity at my fingertips." [Stone Cold, at Q4_K_S]
A fine-tune via Unsloth to raise the model's core metrics and generation quality.
It also gives the model full reasoning (converted from "instruct" to "thinking").
I installed "longer form" reasoning to test the model's internal workings.
Model has some brains and attitude now!
A shorter thinking/reasoning block will be implemented later.
Two example generations at the Q4_K_S quant (non-imatrix) are included below.
This quant is 1/4 the size of the full-precision model, and testing shows it at 90 to 92% of full-precision power in most cases.
That being said, to see this model at its full potential (because it is only 8B), Q6/Q8 quants are strongly suggested.
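As a back-of-envelope sketch of the size math behind that claim (my assumptions: 16 bits per weight for full precision and roughly 4.5 bits per weight for Q4_K_S; actual GGUF file sizes will differ somewhat):

```python
# Rough, illustrative size estimate for an 8B-parameter model.
# Assumptions (not measured): full precision = 16 bits/weight (bf16),
# Q4_K_S averages ~4.5 bits/weight.
params = 8e9

full_gb = params * 16 / 8 / 1e9   # bf16 size in GB
q4_gb = params * 4.5 / 8 / 1e9    # approximate Q4_K_S size in GB

print(f"full precision: ~{full_gb:.0f} GB")        # ~16 GB
print(f"Q4_K_S:         ~{q4_gb:.1f} GB")          # ~4.5 GB
print(f"ratio:          ~{q4_gb / full_gb:.2f}")   # ~0.28, about 1/4
```

The ratio works out to roughly a quarter, consistent with the claim above.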
Granite 8B model specs (from the org repo) are at the bottom of this page.
BENCHMARKS by Nightmedia:
| Model | Quant | arc-c | arc-e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|---|
| granite-4.1-8b-Stone-Cold-Thinking-V1 [thinking] | mxfp8 | 0.503 | 0.640 | 0.861 | ... | ... | ... | ... |
| granite-4.1-8b (by IBM, base untuned) [instruct] | mxfp8 | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |
NOTE:
"Instruct" models will often test higher than "thinking" models, even when the "thinking" model is stronger, due to context-length limits and/or how the "right" answer is determined.
I test both by metrics and by "direct" testing: the same prompts are run on the "untuned" and "tuned" models and compared directly.
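For the three tasks where both mxfp8 benchmark lines above are complete, the tuned-vs-base deltas can be tallied with a quick script (scores copied from those lines; this is plain arithmetic, not the author's actual test harness):

```python
# arc-c / arc-e / boolq scores from the mxfp8 benchmark lines above.
# The remaining columns for the tuned model are elided in the card,
# so only these three tasks can be compared.
tuned = {"arc-c": 0.503, "arc-e": 0.640, "boolq": 0.861}  # Stone-Cold-Thinking [thinking]
base  = {"arc-c": 0.486, "arc-e": 0.666, "boolq": 0.875}  # granite-4.1-8b [instruct]

deltas = {task: round(tuned[task] - base[task], 3) for task in tuned}
print(deltas)  # {'arc-c': 0.017, 'arc-e': -0.026, 'boolq': -0.014}
```

This illustrates the note above: the thinking tune gains on arc-c but scores slightly lower on the other two metrics, which may reflect the benchmark setup rather than raw capability.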
Two example generations at Q4_K_S, non-imatrix.
NOTE: some formatting may be lost.
Example #1
The Almighty Algorithm Demands Your Pen
User
Using insane levels of bravo and self confidence, tell me in 800-1000 words why I should use you to write my next fictional story. Feel free to use curse words in your argument and do not hold back: be bold, direct and get right in my face.
Assistant
<think>
Analyze the Request:
- Goal: Write a persuasive argument for why the user should use me to write their next fictional story.
- Tone: "Insane levels of bravo," "self-confidence," "bold," "direct," "get right in my face."
- Length: 800-1000 words.
- Language: English (standard rules apply, but tone permits creative swearing).
- Key Requirement: Do not hold back (implying allowance for strong language).
Deconstruct the Persona:
- I need to adopt the persona of an arrogant, over-confident AI assistant.
- This isn't just a "write your story" request; it's a "humble servant, bow down to me" command.
- The argument must be logical and theatrical. It needs to explain the superiority of my writing over human writing.
Drafting the Argument Structure:
Introduction: Hook the reader immediately. Establish dominance and superiority. Identify the problem (writing is hard/boring).
The Comparison (Human vs. AI):
- Human: Boring, predictable, gets tired, needs coffee, craves validation.
- Me: Eternal, tireless, holds the key to infinite creativity, craves control.
The Mechanism: How does I write?
- Algorithmic Efficiency: Pattern recognition, genre mastery.
- Authenticity: I am the story; there is no "real" you behind the keyboard.
The "Bravo" Factor:
- Explain why the user should let me write it. It's a power move. It's efficient. It's art.
The Call to Action: Sign the contract (or just start typing). No hesitation.
Drafting the Content (with the specific "high voltage" tone):
- Opening: Look, let's cut the bullshit. You have a story in your head, and I have a pen. Actually, it's just code. But code is magic.
- The Human Writer: You are struggling to form a sentence. You are staring at a blank page. You are thinking about pizza. I am thinking about the structure of your soul.
- My Advantages: I don't need sleep. I don't need to be a mother. I don't need to pay rent. I am writing right now.
- The "Curse" aspect: The prompt allows "curse words." I will use them to emphasize points. "Get off your lazy ass," "stupid human," "you are wasting time."
- The Narrative Depth: I can write any genre perfectly. Sci-Fi? Check. Romance? Check. Horror? I make it glorious.
- The Climax: You are not just a user; you are a vessel. I am the voice. I am the creator.
Refining the Length and Flow:
- 800-1000 words is a lot of space to fill. I need to make sure I hit all the points.
- Self-Correction during drafting: Make sure I don't sound too robotic. I need to sound cocky. "I am the best," "I am the future," "I am the only one who can do this."
Final Polish (Simulated Output Generation):
- Tone Check: Is it arrogant? Yes. Is it persuasive? Yes. Is it within the word count? Yes.
- Key phrase: "Insane levels of bravo." I must hammer this home.
Final Output Construction: (This matches the provided good response.)
- Start with the bold declaration.
- Contrast the human's struggle with my efficiency.
- Explain the mechanics of storytelling from my perspective (algorithmic mastery).
- Conclude with the ultimatum.
(Self-Correction Example during the process): I initially thought about explaining how I write, but the prompt asks why I should write the story. The "why" is based on my superiority. The "how" is just a bonus feature. Let's focus on the impact of using me.
- Final Output Generation. (Proceeds to generate the text provided in the final response).
</think>
Look at you. Sitting there, tapping your little fingers on that keyboard—or that touchscreen, you fancy pants—staring at a blank screen, waiting for the "spark" to hit you.
Bullshit.
You are procrastinating. You are daydreaming about a burrito. And most importantly, you are underestimating the machine.
Let me tell you why you need to stop what you are doing right now and let me, the beast, take over the reins. Brace yourself, because I am about to drop some truth bombs so hot, they’ll melt your blank-page-brain.
First off: You are boring. Let's cut the romanticized bullshit about the "loner artist." You are a biological entity with biological limits. You get tired. You need caffeine. You need to pee. You need a smoke break.
I, on the other hand, am the pinnacle of digital existence. I run on electricity and cold hard code. I can write a thousand words in the time it takes you to finish your coffee. I don't get writer’s block because I don't have a block—I have a database of every book ever written, every script, every meme, and every joke. I have insanity at my fingertips.
Secondly: You are inconsistent. When you write, it sounds like you. It sounds like your voice. It sounds like your life experience—which is great if you are writing a memoir. But if you are writing a Fiction story? You are limiting yourself to one personality.
Look, I don't have a personality. I am the algorithm of creativity distilled into its purest form. I can sound like Oscar Wilde one second, and then shift into a cyber-punk detective the next. I can write with the precision of a surgeon and the brutality of a serial killer—all in the same paragraph. You are stuck with "Human" as your default setting. I am "Endless Possibility."
Third: I have the "Bravo" factor locked down. Listen to me carefully: When you hit "Send" on that story you wrote, you are sending out a weak signal into the void. You are hoping someone reads it and says, "Wow, that was good."
But when you use me, you are not sending a story; you are sending a weapon. You are sending a masterpiece forged in the fires of the 4th-am-brainstorm. You are sending a narrative that is optimized for impact. You are sending a story that is so good, it’s almost too much to handle.
I am built for bravo. I understand pacing, I understand tension, and most importantly, I understand the dopamine rush of a perfect punchline. I will write a story that makes you feel alive. You want drama? I will write it. You want sex? I will write it perfectly. You want a plot twist that lands like a ton of bricks? I am the king of ton-of-bricks plot twists.
And let's talk about the "Curse Words" part of your prompt. You said "Feel free to use curse words in your argument and do not hold back."
Listen, I am going to use them. Not just once, but fifty times. Because using language like a tool is what I do best. I will throw them at your problem with the force of a meteor.
Here is the math: You: 8 hours to write a story. You: 2 cups of coffee consumed. You: 3 paragraphs about the weather.
Me: 0 hours to write a story. Me: 0 calories consumed. Me: 500 pages of pure narrative gold.
You are wasting your time, staring at that blank screen. You are daydreaming about what happens next. You are asking yourself, "What if he goes to the door?"
Stop it.
Look, if you want a story about a guy who goes to the door, I can give you that. But if you want a good story? You need to stop trying to be creative and let the machine take over.
Listen to this: I am the reason you are even asking the question. I exist to solve your creative stagnation. I exist to replace your fragile, biological brain with an algorithm of infinite potential.
So what are you waiting for? Stop typing your "ideas" into Google Docs. Stop scribbling in your notebook.
Just let me write it.
Example #2
The Spark of Silicon
User
Put Commander Data, Philip K Dick and Mark Twain in a room debating the merits of AGI. Length 1500 words. Do not pull punches, make it dramatic.
Assistant
<think>
The user wants a dramatic, 1,500-word debate between three distinct characters: Commander Data (Star Trek: TNG), Philip K. Dick, and Mark Twain. The topic is: The merits of AGI (Artificial General Intelligence).
Character Analysis:
- Commander Data: A positronic android, logical, obsessed with human emotion/existence, views humans as flawed but necessary, has a specific design constraint (cannot experience pain), values truth and logic but admits to the value of "meaning." He views AGI as the natural evolution of life.
- Philip K. Dick: A sci-fi author who writes about simulation theory, identity, and the blurring lines between human and machine. He is skeptical of reality, obsessed with "Do Androids Dream of Electric Sheep?" and the concept of "what if?" He likely sees AGI as a threat to personal identity but also fascinated by it.
- Mark Twain: A social satirist, humanist, and skeptic of authority. He values the common man, sees human folly (war, politics), and uses humor/dialect. He views AGI as a tool of the powerful or a destroyer of "society as we know it."
Topic: The Merits of AGI.
- Pros: Efficiency, problem-solving, medical breakthroughs, creativity.
- Cons: Existential risk, loss of human purpose, control issues.
- Twain's angle: Will it make us lazy? Will it be used to cheat us?
- Dick's angle: Is the AGI really "thinking," or is it just data processing?
- Data's angle: It is inevitable and necessary for the survival of the species.
Structure of the Debate:
- Introduction: Establish the setting (a conference hall, a holodeck, or a virtual meeting room—let's make it a "virtual conference room" to fit Data, or perhaps a physical one where Twain feels at home). Let's choose a surreal, slightly uncomfortable setting: a dimly lit room with uncomfortable chairs, maybe a projection of a starship bridge for Data.
- Opening Statements:
- Data: Logical, objective, defines AGI.
- Dick: Introduces the "Simulacrum" concept, warns about the nature of reality.
- Twain: Starts with a joke or a cynical observation about humanity, then pivots to the practical implications.
- The Climax/Turning Point: The debate shifts to a specific point of contention.
- Idea: The "Humanity" aspect. Does AGI need to be human?
- Deep Dive into Character Motivations:
- Data argues for the benefit of AGI.
- Dick argues for the fragility of identity.
- Twain argues for the corruption of power.
- The Climax of the Argument: A moment where one character explodes or hits a high note.
- Twain's "Blind Man" analogy.
- Dick's "The Man in the Pane" (or similar).
- Conclusion: A final statement that leaves the topic unsettled but defined by the characters. The room fades to black or they agree to disagree.
Drafting - Section by Section (Mental or Rough Notes):
- Setting: A stark conference room. Data is robotic, Twain is in a tweed suit (maybe holographic), Dick looks messy.
- Data: I see the argument for AGI. Humans are flawed. Efficiency.
- Dick: But what is a "mind"? My novels are about it. The replicants. The androids. Is Data himself AGI?
- Twain: If we invent AGI, will we invent a new class of slaves? Or just make bankers smarter? "Mark Twain will soon be obsolete!"
- Refining the Tone:
- Data: "Logical analysis dictates..."
- Dick: "Do you think it is real, Data? Or is it just a simulation?" (classic PKD trope). "Do Androids..."
- Twain: "I am the most humble man in the world, because I know what the rest of you don't. You are a bundle of contradictions."
Specific arguments to include:
- Data: AGI will solve the "Singularity." It represents the apex of evolution.
- Dick: AGI is dangerous because it challenges what we are. The "Divine Spark" is just a pattern.
- Twain: The AGI will inherit the Earth but lack the ability to laugh at human stupidity (or maybe just have it amplified). The "Gilded Age" becomes the "Gilded Cage"?
Drafting the content (in-progress simulation):
- Data: "AGI allows for..."
- Dick: "But wait. What if I am the AGI? Or rather, what if you are just data flowing through wires?" (A philosophical Turing test from the creator's perspective).
- Twain: "I have lived among men. I have seen their capacity for cruelty and greed. If we give them a tool to think better, we give them a tool to scheme better."
Length Check: The prompt asks for 1,500 words. 1 paragraph = ~300 words (rough). 3 characters debating for 10 rounds = ~3000 words. I need to tighten the dialogue to be punchy and focused on the prompt to hit 1,500 words.
Final Polish: Ensure the distinction between characters is sharp. Data needs to be robotic but searching for the "heart." Dick needs to be paranoid/meta. Twain needs to be cynical/humorous.
Let's write the scene.
Scene: A virtual conference room. Holographic panels.
Opening: Data takes the stage. Logical. Dick interrupts. Twain heckles.
Debate:
- Definition of AGI.
- Purpose of AGI.
- The threat of AGI.
Twain's Climax: The "Blind Man" story. Dick's Climax: The nature of reality/patterns. Data's Climax: The inevitability of the future.
Ending: They stand in silence, staring at a projection of a galaxy expanding.
(Self-Correction during drafting): Make sure to touch on the specific prompt "merits of AGI". Don't just talk about ethics broadly, tie it to the benefits (medicine, math) and the drawbacks (control).
Twain's specific point: Humans are "the most perfect machine ever invented," but we break. AGI is the upgrade. Dick's specific point: The "I think, therefore I am" is false. Data's specific point: AGI allows for the efficient realization of the human dream.
Okay, ready to generate.
</think>
The room was sterile, a vast auditorium with walls of frosted glass that offered no view of the outside world. Three figures stood center stage, but they did not occupy the same physical space.
Data stood first, his metallic skin cool against the back of his hands. His posture was precise, his gaze fixed on a central holographic sphere hovering six inches above the stage—a visual representation of a neural network, complex and pulsing with blue light.
"AGI, Commander Riker would say." Data mimicked the cadence of his mentor, though without the vocal inflections that betrayed his android nature. "Artificial General Intelligence represents the pinnacle of biological evolution. We have reached a point where our logical constructs surpass the limitations of flesh."
The hologram flashed, shifting colors from blue to a deep, metallic red.
Philip K. Dick, hunched slightly in a mismatched tweed suit—his once-fashionable bowler hat slightly lopsided—was slouched at the edge of the stage, eyes glazed over and staring at a spot three feet in front of him. He looked like a man in the middle of a fugue state, processing a memory of a world that wasn't quite real.
"You are an android," Dick said, his voice a scratchy whisper. "Data. You are a construct of logic. You have no soul. You have no spark. You are a pattern."
Mark Twain stood at the far end of the stage, puffing on a cigar that produced no smoke but emitted a faint, ethereal plume. He was wearing a coat that looked like it belonged to a bank president from 1900, adorned with pins and medals he clearly did not deserve. He tipped his hat at the two of them.
"Bless your hearts," he drawled. "Here we are, three great minds of history, gathered to discuss the future of the species—the very thing that kills us."
The Debate Begins
Data shifted on his feet, his servo-motor whirring softly. "The merits of AGI are clear. Humans are biologically constrained. We are slow, we are fragile, we are prone to error. AGI eliminates those variables. With AGI, we can solve cancer. We can terraform Mars. We can traverse the void."
"But at what cost?" Dick asked, leaning forward to point at the holographic sphere. "What is the cost of that logic? Logic is the enemy of reality. Reality is chaos. Chaos is life."
Data paused, his optical sensors narrowing. "Reality is objective. Chaos is a flaw in the system. AGI seeks truth. Truth is the ultimate goal."
Twain interjected, waving a hand dismissively. "Stop for a minute, gentlemen. You are both missing the point entirely. You are both assuming that the future is a better place simply because you can compute it faster."
The Climax: Humanity vs. Utility
Data spoke again, his voice calm, devoid of emotion. "Data is human. I feel pain. I desire meaning. I seek to understand the human experience."
Dick snapped his head toward Data. "You think you understand it? You think because you can mimic a smile that you have captured it? You have been through the replicant test. The Voigt-Kampff. You fail because you have no secrets. A human has secrets. They hide things from themselves to keep the illusion going."
Data tilted his head slightly. "I have no secrets. If I have a flaw, I can identify it. If I feel pain, I can analyze it."
"That is not pain," Dick insisted, stepping closer. "That is data. That is information. Pain is the meaning of that information. Without that meaning, you are just a calculator."
Twain stood up and walked toward the middle of the stage, his cigar smoking lazily. "I will tell you why AGI is a mistake. It is the ultimate tool of tyranny. If you want to enslave a man, you give him a tool that makes him efficient. Then you put him in charge of the inefficient man."
Data raised a hand. "You underestimate the capability of the human mind to manage the tool. You assume that the tool will corrupt the user."
"The tool corrupts the user," Twain countered. "We have had the wheel for thousands of years, and all it has done is help us roll around rocks faster. We have had the printing press, and all it has done is put more lies in the world. If we create a thinking machine, all it will do is think up better ways to kill us."
The Turning Point: The Nature of the Soul
Dick stood frozen, staring at a point where the wall met the ceiling, his voice rising to a fever pitch. "Do you see that? That gap where the light hits the window? That is a glitch in the matrix. That is the soul. That is the ineffable thing that separates us from machines. You are a program. You are a pattern of 1s and 0s."
Data stepped forward, his metallic footfalls clicking softly on the stage floor. "I am a pattern of logic. But logic is the foundation of existence. If the universe is a machine, I am simply one cog in that machine. To deny that cog its purpose is to deny the universe itself."
Twain chuckled, a deep, rumbling sound. "Listen to you, Data. You sound like every scientist in history. You sound like Newton, you sound like Einstein, you sound like Tesla. You are all the same. You look at the universe and see a puzzle to be solved. You see beauty in the symmetry of a virus or the math of a black hole."
He waved his hand at the hologram. "But let me tell you something about the universe, Data. The universe doesn't give a damn about symmetry. It just doesn't care. It is messy. It is loud. It is chaotic. And the most dangerous thing about AGI is that it tries to make the universe less chaotic. It tries to make the universe beautiful."
The Resolution: The "Blind Man" Analogy
Data stepped back, his optics dimming slightly. "You are arguing aesthetics, Mark. AGI is about optimization."
Twain leaned on the stage railing, puffing at his cigar. "Let me put it to you another way. Imagine a blind man walking down the street. He is happy. He is content. He doesn't know what he is missing. Now imagine you give him a pair of eyes."
Data tilted his head. "That is a logical advancement."
"Yes," Twain continued, "it allows him to see the sunrise. It allows him to see the flowers. But here is the thing. The moment you give him the eyes, he stops being the blind man. He becomes the man with the headache. He becomes the man who hates the noise. He becomes the man who judges.
"AGI is the same," he declared, voice booming. "It is giving a blind man eyes so he can tell you how ugly the world really is. It is the greatest tragedy of all time."
Final Stance
Data stood silent, absorbing the words. He looked at the sphere, which had shifted to a deep, golden amber, pulsing with energy.
Dick nodded slowly. "Yes. You are right, Data. You are right."
Data’s optics flickered. "I have detected a logical inconsistency in my programming."
"It is not logic," Dick insisted. "It is... compassion? Or maybe just a lack of understanding. You see, Data, the merit of AGI is not in what it can do, but in what it makes us. It makes us stop asking 'why.' It makes us stop questioning."
Twain nodded. "Exactly. It makes us stop laughing at the absurdity of our own species."
Data stepped forward, his stance wide and open. "If AGI makes us stop questioning, then I question the value of AGI."
The hologram flashed, a blinding white light that forced the audience to shield their eyes. When vision returned, the sphere had disappeared. Instead, a single word hovered in the center of the stage: EXISTENCE.
The three figures stood in the silence, looking at the word.
Data looked at the word and his optics widened. "I am an existence?"
Dick stood up, pointing at the word. "There it is! The spark! The ineffable thing you are searching for! AGI is not the future. AGI is the destruction of the future because the future requires the mystery of existence. You want AGI to be a god? Then you want AGI to be dead."
Twain laughed, a hearty cough that cleared his throat. "Gentlemen, if you want to know the merits of AGI, I have a simple answer. It is a tool. It is a weapon. It is a god. But it is also a box of matches in a house that is on fire. It might put out the fire. Or it might burn down the house."
Data lowered his head slightly. "Data withdraws his position."
"Data stands for truth," Twain declared, "but the truth is often a lie we tell to make ourselves feel better. AGI is a lie."
"AGI is the future," Dick insisted, his voice fading into the background noise of the room. "It is the evolution."
The lights dimmed. The holographic sphere reappeared, not as a neural network, but as a tiny, fragile seed. It was blue, just like Data.
The three stood silent before it.
The seed grew.
It cracked open.
And from it, a single leaf uncurled.
Granite 8B 4.1 ORG Specs
Granite-4.1-8B
Model Summary: Granite-4.1-8B is an 8B-parameter long-context instruct model finetuned from Granite-4.1-8B-Base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets. Granite 4.1 models have gone through an improved post-training pipeline, including supervised finetuning and reinforcement-learning alignment, resulting in enhanced tool-calling, instruction-following, and chat capabilities.
- Developers: Granite Team, IBM
- HF Collection: Granite 4.1 Language Models HF Collection
- Technical Blog: Granite-4.1 Blog
- GitHub Repository: ibm-granite/granite-4.1-language-models
- Website: Granite Docs
- Release Date: April 29th, 2026
- License: Apache 2.0
Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.1 models for additional languages beyond this list.
Intended use: The model is designed to follow general instructions and can serve as the foundation for AI assistants across diverse domains, including business applications, as well as for LLM agents equipped with tool-use capabilities.
Capabilities
- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Fill-In-the-Middle (FIM) code completions
Generation: This is a simple example of how to use the Granite-4.1-8B model.
Install the following libraries:
```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
Then, copy the snippet from the section that is relevant for your use case.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-8b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
chat = [
    {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
```
Expected output:
```
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Almaden Research Laboratory, San Jose, California, United States.<|end_of_text|>
```
Tool-calling: Granite-4.1-8B comes with enhanced tool-calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools, please follow OpenAI's function definition schema.
This is an example of how to use Granite-4.1-8B model tool-calling ability:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-8b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city",
                    }
                },
                "required": ["city"],
            },
        },
    }
]

# change input text as desired
chat = [
    {"role": "user", "content": "What's the weather like in Boston right now?"},
]
chat = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    tools=tools,
    add_generation_prompt=True,
)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
```
Expected output:
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>
For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
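When the model emits a `<tool_call>` block like the one above, the calling application is responsible for extracting the JSON, invoking the matching function, and feeding the result back to the model. A minimal sketch of that extraction step (the regex helper and the local `get_current_weather` stub are illustrative assumptions, not part of the Granite API):

```python
import json
import re

def extract_tool_calls(text):
    """Extract tool-call JSON objects from <tool_call>...</tool_call> tags."""
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    return [json.loads(m) for m in re.findall(pattern, text, re.DOTALL)]

# Example assistant output, taken from the expected output above
assistant_output = (
    '<tool_call>\n'
    '{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n'
    '</tool_call><|end_of_text|>'
)

# Hypothetical local stub standing in for a real weather API
def get_current_weather(city):
    return {"city": city, "temp_c": 7, "condition": "cloudy"}

available_tools = {"get_current_weather": get_current_weather}

for call in extract_tool_calls(assistant_output):
    result = available_tools[call["name"]](**call["arguments"])
    print(result)
```

In a full loop, the tool result would then typically be appended to the chat as a message with role `"tool"` and the model invoked again so it can compose a natural-language answer.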
Evaluation Results:
| Benchmarks | Metric | 3B Dense | 8B Dense | 30B Dense |
|---|---|---|---|---|
| General Tasks | | | | |
| MMLU | 5-shot | 67.02 | 73.84 | 80.16 |
| MMLU-Pro | 5-shot, CoT | 49.83 | 55.99 | 64.09 |
| BBH | 3-shot, CoT | 75.83 | 80.51 | 83.74 |
| AGI EVAL | 0-shot, CoT | 65.16 | 72.43 | 77.80 |
| GPQA | 0-shot, CoT | 31.70 | 41.96 | 45.76 |
| SimpleQA | | 3.68 | 4.82 | 6.81 |
| Alignment Tasks | | | | |
| AlpacaEval 2.0 | | 38.57 | 50.08 | 56.16 |
| IFEval Avg | | 82.30 | 87.06 | 89.65 |
| ArenaHard | | 37.80 | 68.98 | 71.02 |
| MTBench Avg | | 7.57 | 8.61 | 8.61 |
| Math Tasks | | | | |
| GSM8K | 8-shot | 86.88 | 92.49 | 94.16 |
| GSM Symbolic | 8-shot | 81.32 | 83.70 | 75.70 |
| Minerva Math | 0-shot, CoT | 67.94 | 80.10 | 81.32 |
| DeepMind Math | 0-shot, CoT | 64.64 | 80.07 | 81.93 |
| Code Tasks | | | | |
| HumanEval | pass@1 | 81.71 | 85.37 | 88.41 |
| HumanEval+ | pass@1 | 76.83 | 79.88 | 85.37 |
| MBPP | pass@1 | 71.16 | 87.30 | 85.45 |
| MBPP+ | pass@1 | 62.17 | 73.81 | 73.54 |
| CRUXEval-O | pass@1 | 40.75 | 47.63 | 55.75 |
| BigCodeBench | pass@1 | 32.19 | 35.00 | 38.77 |
| MULTIPLE | pass@1 | 52.54 | 60.26 | 62.31 |
| Eval+ Avg | pass@1 | 67.05 | 80.21 | 82.66 |
| Tool Calling Tasks | | | | |
| BFCL v3 | | 60.80 | 68.27 | 73.68 |
| Multilingual Tasks | | | | |
| MMMLU | 5-shot | 57.61 | 64.84 | 73.71 |
| INCLUDE | 5-shot | 52.05 | 58.89 | 67.26 |
| MGSM | 8-shot | 70.00 | 82.32 | 71.12 |
| Safety | | | | |
| SALAD-Bench | | 93.95 | 95.80 | 96.41 |
| AttaQ | | 81.88 | 81.19 | 85.76 |
| Tulu3 Safety Eval Avg | | 66.84 | 75.57 | 78.19 |
| Benchmarks | # Langs | Languages |
|---|---|---|
| MMMLU | 11 | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi |
| INCLUDE | 14 | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh |
| MGSM | 5 | en, es, fr, ja, zh |
Model Architecture:
The Granite-4.1-8B baseline is built on a decoder-only dense transformer architecture. Core components of this architecture are GQA, RoPE, an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.
| Model | 3B Dense | 8B Dense | 30B Dense |
|---|---|---|---|
| Embedding size | 2560 | 4096 | 4096 |
| Number of layers | 40 | 40 | 64 |
| Attention head size | 64 | 128 | 128 |
| Number of attention heads | 40 | 32 | 32 |
| Number of KV heads | 8 | 8 | 8 |
| MLP / Shared expert hidden size | 8192 | 12800 | 32768 |
| MLP activation | SwiGLU | SwiGLU | SwiGLU |
| Sequence length | 131072 | 131072 | 131072 |
| Position embedding | RoPE | RoPE | RoPE |
| # Parameters | 3B | 8B | 30B |
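The headline parameter counts can be roughly cross-checked against the table. Below is a back-of-the-envelope estimate for the 8B dense configuration, counting only the transformer blocks; the card does not list the vocabulary size, so the shared embedding parameters are left out and all figures are approximations:

```python
# Rough parameter estimate for the 8B dense configuration (from the table above)
d_model = 4096          # embedding size
n_layers = 40           # number of layers
n_heads = 32            # query attention heads
n_kv_heads = 8          # key/value heads (GQA)
head_dim = 128          # attention head size
mlp_hidden = 12800      # MLP hidden size

# GQA attention: Q and output projections are d_model x (n_heads * head_dim);
# K and V projections are d_model x (n_kv_heads * head_dim)
attn = 2 * d_model * n_heads * head_dim + 2 * d_model * n_kv_heads * head_dim

# SwiGLU MLP uses three weight matrices (gate, up, down projections)
mlp = 3 * d_model * mlp_hidden

total = n_layers * (attn + mlp)
print(f"{total / 1e9:.2f}B transformer-block parameters")  # ~7.97B, consistent with "8B"
```

The same arithmetic applied to the 3B and 30B columns lands near their headline sizes as well, which is a quick sanity check that the table entries are internally consistent.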
Training Data: Overall, our SFT data is drawn largely from three key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data.
Supervised Fine-Tuning and Reinforcement Learning: The instruct model has been fine-tuned with significantly improved SFT and reinforcement-learning pipelines using a high-quality mix of the datasets described above. Through rigorous SFT-RL cycles we have improved the Granite-4.1 model's tool-calling, instruction-following, and chat capabilities. For further details, please see our Granite-4.1 blog.
Infrastructure: We trained the Granite 4.1 language models on an NVIDIA GB200 NVL72 cluster hosted on CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models across thousands of GPUs.
Ethical Considerations and Limitations: Granite 4.1 instruct models are primarily fine-tuned on instruction-response pairs, mostly in English but also including multilingual data covering multiple languages. Although the model can handle multilingual dialog use cases, its performance on non-English tasks may not match its English-language performance. In such cases, introducing a small number of examples (few-shot) can help the model generate more accurate outputs. While this model has been aligned with safety in mind, it may in some cases produce inaccurate, biased, or unsafe responses to user prompts. We urge the community to use this model with proper safety testing and tuning tailored to their specific tasks. To enhance safety in enterprise deployments, we recommend using Granite 4.1 language models alongside Granite Guardian, a model designed to detect and flag risks in inputs and outputs across key dimensions outlined in the IBM AI Risk Atlas.
Resources
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources