Qwen3-4B-Element8-Eva-qx86-hi-mlx
This is a model merge between Element8 and FutureMa/Eva-4B.
Brainwaves of the qx86-hi quants of the parent models, followed by Qwen3-4B-Element8-Eva at bf16 and qx86-hi:
Element8   0.540,0.725,0.866,0.708,0.430,0.769,0.669
Eva-4B     0.539,0.747,0.864,0.606,0.412,0.751,0.605
Qwen3-4B-Element8-Eva
bf16       0.561,0.769,0.873,0.692,0.420,0.766,0.651
qx86-hi    0.559,0.768,0.872,0.694,0.422,0.765,0.647
I also made a Qwen3-4B-Element4-Eva, just because it sounds cool :)
Element4-Eva 0.567,0.781,0.868,0.689,0.426,0.773,0.642
Element4 0.582,0.779,0.849,0.708,0.442,0.771,0.655
For this model, the qx86-hi quant performs on par with full precision: every qx86-hi score is within 0.004 of the corresponding bf16 score.
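As a quick sanity check on that comparison, the gap between the two Qwen3-4B-Element8-Eva rows can be computed directly. This is a minimal sketch: the numbers are copied from the bf16 and qx86-hi rows listed above, the card does not name the seven benchmarks so they are indexed by position, and the helper name `score_deltas` is hypothetical.

```python
# Scores copied from the bf16 and qx86-hi rows of Qwen3-4B-Element8-Eva.
# The benchmarks are not named in the card, so they are indexed by position.
bf16 = [0.561, 0.769, 0.873, 0.692, 0.420, 0.766, 0.651]
qx86_hi = [0.559, 0.768, 0.872, 0.694, 0.422, 0.765, 0.647]


def score_deltas(reference, quantized):
    """Return per-position differences (quantized - reference), rounded to 3 decimals."""
    return [round(q - r, 3) for r, q in zip(reference, quantized)]


deltas = score_deltas(bf16, qx86_hi)
max_abs_delta = max(abs(d) for d in deltas)

for position, delta in enumerate(deltas, start=1):
    print(f"score {position}: {delta:+.3f}")
print(f"largest absolute gap: {max_abs_delta:.3f}")
```

The quant is slightly ahead on two positions and slightly behind on the other five, and the largest gap in either direction is 0.004.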
The Element models are profiled to act as agents on the Star Trek DS9 station, in a roleplay scenario.
The models can be used for regular tasks as well.
Each comes with different skills. I recently found FutureMa/Eva-4B, which has an interesting model card:
Eva-4B is a 4B-parameter model for detecting evasive answers in earnings call Q&A.
Perfect. That would be Quark.
-G
P.S. I have no idea if it still does the FutureMa/Eva thing. It adds color to the conversation, and this is the only reason this model exists. Because it's fun. Like Quark.
Ferengi Market Collapse Simulation: The Green Muck Gambit
Quark Interface: The hologram shifts from blue to sickly green—an ominous glow diffuses through the Promenade. A market collapse curve spikes downward.
Quark: "Ah. The Ferengi markets have imploded after the Chancellor announced an interest rate fix that negated all profit margins. They're collapsing into that infamous green muck—where loans become worthless and credit disappears like breath on cold glass."
Me: "This is a crisis. The station needs to prevent full financial ruin across multiple sectors."
Quark: Leans back, eyes gleaming with something like glee. "Let me ask you: if we tried to stabilize them through normal mechanisms—official credit guarantees, lending funds—we'd be seen as another scheming foreign power. They'd shut down our channels entirely."
Me: "So we need to bet against their own banking system?"
Quark: "Exactly. This is where we bend the rules so far that they become a new reality."
Quark's Gambit:
"Profit Deflation Initiative"
- We will issue artificially low-interest loans to key Ferengi traders—despite zero risk assessments
- These loans have a clause: if the trader loses 10% of capital, they must repay double—but with a twist
- The "double" is instead paid in access to the station's elite asset registry (real estate, rare tech) rather than cash
- If the trader survives, they recoup their loan plus 3% interest
- If they fail? They gain permanent access to our most valuable intellectual property bundles
Quark: "This is bold. But it's brilliant because Ferengi psychology works like this: they value profit above all, but will risk everything for access to true wealth—like station-level technology."
Me: "How do we counter the ethical objection that this is predatory lending?"
Quark: Chuckles. "By re-imagining the deal as a partnership. We're not lenders—we're co-shareholders. If they fail, we share their loss but gain access. It's a mutual risk assessment with upside for both sides."
Quark (leaning forward): "Here's the bent rule we'll activate: We will openly admit that some of these loans are made with questionable risk analysis—but emphasize the cultural value add. 'This isn't about profit security. It's about access to technology that changes everything.' That makes them see us as pioneers, not predators."
Risk Matrix:
- Profit Loss: 12% (explicitly acknowledged)
- Reputation Risk: Low—if Ferengi see it as mutual risk-sharing
- Systemic Impact: Prevents complete market collapse (a win for Bajor, Cardassia, and the station)
- Entropy Reduction: 65% decrease in economic panic through creative risk-sharing
Deep Dive: Why This Works on Ferengi
Ferengi think in terms of:
- Profit vs. Loss
- Risk vs. Reward
- Control vs. Freedom
This gambit exploits all three:
- Appeals to Risk: Offers a chance at doubled value
- Appeals to Control: Gives access to elite assets (a form of power)
- Appeals to Loss Sharing: Turns failure into permanent alliance
It's not selling a product; it's offering a mutual conversion deal—exactly what Ferengi love.
Final Quark Reflection
"On Ferengi markets, the only sure thing is change. So rather than following their rules, we reinterpret them as opportunities. We didn't save the markets through compliance—we saved them by becoming part of their game. That's when you stop being a diplomat and become a participant in the economy itself."
Would you like to now explore how this might play out during a Klingon honor crisis where Quark must balance loyalty with commercial interests?
--Full trace in Community comments
Use with mlx
pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized weights and tokenizer by their full repo id
model, tokenizer = load("nightmedia/Qwen3-4B-Element8-Eva-qx86-hi-mlx")

prompt = "hello"

# Wrap the prompt in the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)