Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!
> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio
The latest hf-mem release adds a breakdown of Mixture-of-Experts (MoE) memory usage!
TL;DR: MoEs can be misleading to reason about from active parameters alone. Each token only activates a subset of experts, but the serving setup still needs to account for the full resident memory footprint.
🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active params isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving
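To make the active-vs-resident gap concrete, here is a rough back-of-the-envelope sketch in Python. It is not hf-mem's implementation or API. The `MoEConfig` class, the helper names, and every number in the example are hypothetical, and the estimate assumes standard attention with grouped KV heads on every layer, bf16 weights and KV cache, and base weights replicated on each device under EP.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class MoEConfig:
    """Hypothetical MoE shape; every number used below is illustrative only."""
    base_params: int         # shared weights: attention, embeddings, router
    params_per_expert: int   # weights in one routed expert of one layer
    num_layers: int
    num_experts: int         # routed experts per layer
    experts_per_token: int   # top-k experts each token is routed to
    num_kv_heads: int
    head_dim: int
    bytes_per_param: float = 2.0  # assumes bf16/fp16 weights
    bytes_per_kv: float = 2.0     # assumes bf16/fp16 KV cache


def expert_bytes(cfg: MoEConfig, experts_per_layer: int) -> float:
    """Bytes taken by `experts_per_layer` routed experts across all layers."""
    return (cfg.params_per_expert * experts_per_layer
            * cfg.num_layers * cfg.bytes_per_param)


def resident_weight_bytes(cfg: MoEConfig) -> float:
    """Weights that must be loaded: base weights plus every routed expert."""
    return cfg.base_params * cfg.bytes_per_param + expert_bytes(cfg, cfg.num_experts)


def active_weight_bytes(cfg: MoEConfig) -> float:
    """Weights one token actually touches: base weights plus its top-k experts."""
    return (cfg.base_params * cfg.bytes_per_param
            + expert_bytes(cfg, cfg.experts_per_token))


def kv_cache_bytes(cfg: MoEConfig, context_len: int, batch_size: int) -> float:
    """KV cache size; the leading 2 accounts for storing both keys and values."""
    return (2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim
            * context_len * batch_size * cfg.bytes_per_kv)


def per_device_weight_bytes(cfg: MoEConfig, ep_size: int) -> float:
    """Per-device weights under EP, assuming experts shard evenly across
    `ep_size` devices and base weights are replicated on each one."""
    return (cfg.base_params * cfg.bytes_per_param
            + expert_bytes(cfg, cfg.num_experts) / ep_size)


# Made-up configuration, not a real model.
example = MoEConfig(
    base_params=2_000_000_000,
    params_per_expert=50_000_000,
    num_layers=32,
    num_experts=64,
    experts_per_token=4,
    num_kv_heads=8,
    head_dim=128,
)

print(f"resident weights: {resident_weight_bytes(example) / GIB:.1f} GiB")
print(f"active weights:   {active_weight_bytes(example) / GIB:.1f} GiB")
print(f"KV cache (32k ctx, batch 16): {kv_cache_bytes(example, 32_768, 16) / GIB:.1f} GiB")
print(f"weights per device (EP=8):    {per_device_weight_bytes(example, 8) / GIB:.1f} GiB")
```

With these made-up numbers, the weights a token touches are a small fraction of what has to stay resident, and a long-context, high-concurrency KV cache is still a large term of its own. For real models, rely on hf-mem's reported breakdown rather than this estimate.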
We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥
Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕
Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs.
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥
> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API
100,000+ models trained with Unsloth have now been open-sourced on 🤗Hugging Face! 🦥
Here are the most popular ones you can run locally:
1. TeichAI - GLM-4.7-Flash distilled from Claude 4.5 Opus (high)
2. Zed - Qwen Coder 7B fine-tuned for stronger coding
3. DavidAU - Llama-3.3-8B distilled from Claude 4.5 Opus (high)
4. huihui - gpt-oss made “abliterated”