AI & ML interests

Nanochat, fine-tuning, LLMs, post-training

Recent Activity

mahimairaja posted an update 2 months ago
🔥 Qwen is dominating the SLM space right now.

We all know 2026 is the year of Small Models, but the Alibaba team seems to have taken it quite seriously!

Qwen3-TTS — 3-sec voice cloning, 10 languages, beats ElevenLabs
Qwen3-ASR — Just dropped TODAY! 52 languages, <8% WER, SOTA open-source ASR
Qwen-Image — #1 open-source image model on AI Arena

All Apache 2.0. The most complete open-source AI stack, period.

So, what do you think the next release could be? A language model?
Comment below!
  • 1 reply
mahimairaja posted an update 3 months ago
Is the lack of vLLM support for Transformers v5 frustrating only me?
mahimairaja posted an update 3 months ago
Happy New Year 2026!

For the next 365 days I commit to working on:

- Document AI and OCR Automations
- Voice Agents
- Long-Running Tasks (Durable Agents)
  • 1 reply
csabakecskemeti posted an update 4 months ago
Just sharing the result of a homelab infrastructure experiment:

I've managed to set up distributed inference at home using a DGX Spark (128 GB unified memory) and a Linux workstation with an RTX 6000 Pro (96 GB GDDR7), connected via 100 Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140 GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com.

Screen recording:
https://lnkd.in/gKM9H5GJ
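The author's actual stack isn't stated (tutorial pending), but as a minimal sketch: one common way to shard a ~140 GB model across two single-GPU machines is vLLM pipeline parallelism over a Ray cluster, with the nodes talking over the RoCE link. The model ID and head-node IP below are placeholders.

```shell
# Hypothetical sketch, assuming vLLM + Ray; the author's real setup may differ.

# On the head node (e.g. the DGX Spark):
ray start --head --port=6379

# On the worker node (the RTX 6000 Pro workstation), join via the RoCE-backed IP:
ray start --address=<head-ip>:6379

# Back on the head node, serve with the model split across both GPUs:
vllm serve <model-id> \
  --pipeline-parallel-size 2 \
  --distributed-executor-backend ray
```

Pipeline parallelism (rather than tensor parallelism) tends to be the friendlier choice across heterogeneous GPUs on a network link, since it only ships activations between pipeline stages instead of all-reducing every layer.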
  • 3 replies
csabakecskemeti posted an update 4 months ago
Looking for some help testing an INT8 DeepSeek 3.2:
SGLang supports channel-wise INT8 quants on CPUs with AMX instructions (Xeon 5th gen and above, AFAIK):
https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/

Currently uploading an INT8 version of DeepSeek 3.2 Speciale:
DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8

I cannot test this myself since I'm on AMD:
"AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support"
(I assumed it could fall back to some non-optimized kernel, but apparently not.)

If anyone with the required resources (Intel Xeon 5th/6th gen + ~768 GB to 1 TB RAM) can help test this, that would be awesome.

If you have hints on how to make this work on an AMD Threadripper 7000 Pro series, please guide me.

Thanks all!
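For anyone willing to try, a minimal launch sketch on an AMX-capable Xeon might look like the following. The flags are assumptions based on SGLang's CPU/quantization options; verify them against the docs for your installed version.

```shell
# Hypothetical sketch: serve the channel-wise INT8 quant on CPU with SGLang.
# Requires a CPU with AMX (the W8A8Int8 path asserts on it, as the error above shows).
python -m sglang.launch_server \
  --model-path DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8 \
  --quantization w8a8_int8 \
  --device cpu \
  --trust-remote-code
```

If it comes up, a quick completion request against the local server port would confirm the kernels actually run rather than just load.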
  • 8 replies