AI & ML interests

The AI community building the future.

Recent Activity

Articles

tomaarsen 
posted an update 3 days ago
🌐 I've just published Sentence Transformers v5.4 to make the project fully multimodal for embeddings and reranking. The release also includes a modular CrossEncoder and automatic Flash Attention 2 input flattening. Details:

You can now use SentenceTransformer and CrossEncoder with text, images, audio, and video, with the same familiar API. That means you can compute embeddings for an image and a text query using model.encode(), compare them with model.similarity(), and it just works. Models like Qwen3-VL-Embedding-2B and jinaai/jina-reranker-m0 are supported out of the box.
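
To make the comparison step concrete, here is a minimal sketch of cosine similarity, the library's default scoring function for model.similarity(). It does not load a model: the embeddings, their dimensions, and the names `text_query_embedding` and `image_embeddings` are hand-written stand-ins chosen only for illustration, where real values would come from model.encode().

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(queries, candidates):
    """Pairwise scores: one row per query, one column per candidate."""
    return [[cosine_similarity(q, c) for c in candidates] for q in queries]


# Hypothetical embeddings. In practice a multimodal model maps text and
# images into the same vector space, which is what makes them comparable.
text_query_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "cat_photo": [0.8, 0.2, 0.1],
    "car_photo": [0.0, 0.1, 0.9],
}

scores = similarity_matrix([text_query_embedding], list(image_embeddings.values()))[0]
best_match = max(zip(image_embeddings, scores), key=lambda pair: pair[1])[0]
print(best_match)
```

Because both modalities share one embedding space, the same scoring code works whether the candidates are texts, images, or a mix.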

Beyond multimodal, I also fully modularized the CrossEncoder class. It's now a torch.nn.Sequential of composable modules, just like SentenceTransformer has been. This unlocked support for generative rerankers (CausalLM-based models like mxbai-rerank-v2 and the Qwen3 rerankers) via a new LogitScore module, which wasn't possible before without custom code.
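
The post does not describe how LogitScore works internally. As a rough illustration of the general idea behind generative rerankers, the sketch below assumes a common scheme in which the model is asked whether a document answers a query, and the relevance score is derived from the next-token logits for a "yes" token and a "no" token. The function name `yes_no_relevance`, the toy vocabulary, and the logit values are all hypothetical and are not the library's implementation.

```python
import math


def yes_no_relevance(logits, yes_token_id, no_token_id):
    """Relevance in (0, 1) from the logits at the final position.

    A softmax restricted to the "yes" and "no" logits is equivalent to a
    sigmoid of their difference.
    """
    yes_logit = logits[yes_token_id]
    no_logit = logits[no_token_id]
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))


# Hypothetical three-token vocabulary and next-token logits for two pairs.
vocab = {"yes": 0, "no": 1, "maybe": 2}
relevant_score = yes_no_relevance([4.0, 1.0, 0.5], vocab["yes"], vocab["no"])
irrelevant_score = yes_no_relevance([0.5, 3.5, 0.2], vocab["yes"], vocab["no"])
print(relevant_score > irrelevant_score)
```

Under this assumption, a reranker needs no classification head: the language model head already provides the logits, and a small scoring step turns them into a number.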

Also, Flash Attention 2 now automatically skips padding for text-only inputs. If your batch has a mix of short and long texts, this gives you a nice speedup and lower VRAM usage for free.
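
To show where the savings come from, here is a minimal sketch of input flattening: the sequences in a batch are concatenated without padding, and cumulative offsets record where each one starts and ends, similar to the layout that variable-length attention kernels expect. The helper names `flatten_batch` and `padded_token_count` and the token IDs are hypothetical, and the library's actual code path may differ.

```python
from itertools import accumulate


def flatten_batch(sequences):
    """Concatenate token sequences and return cumulative sequence offsets."""
    flat_tokens = [token for seq in sequences for token in seq]
    # cu_seqlens[i] .. cu_seqlens[i + 1] delimits sequence i in flat_tokens.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return flat_tokens, cu_seqlens


def padded_token_count(sequences):
    """Tokens processed when every sequence is padded to the longest one."""
    return len(sequences) * max(len(seq) for seq in sequences)


# Hypothetical batch mixing short and long texts (token IDs are made up).
batch = [
    [101, 7, 102],
    [101, 7, 8, 9, 10, 11, 12, 102],
    [101, 102],
]

flat_tokens, cu_seqlens = flatten_batch(batch)
print(len(flat_tokens), "tokens flattened vs", padded_token_count(batch), "padded")
print(cu_seqlens)
```

The wider the spread of lengths in a batch, the larger the gap between the flattened and padded token counts, which is why mixed batches benefit most.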

I wrote a blog post walking through the multimodal features with practical examples. Check it out if you want to get started, or just point your Agent to the URL: https://huggingface.co/blog/multimodal-sentence-transformers

This release has set up the groundwork for more easily introducing late-interaction models (both text-only and multimodal) into Sentence Transformers in the next major release. I'm looking forward to it!
sergiopaniego 
posted an update 12 days ago
TRL is officially an adult 🥳

excited to announce TRL v1.0❗️

head to the blog to see how we got here and what’s next for this post-training library, designed to keep pace with the field

https://huggingface.co/blog/trl-v1
sergiopaniego 
posted an update about 1 month ago
ICYMI, great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts

on 4×H100s: 12x longer sequences, 3.7x throughput

learn how to integrate it with Accelerate, Transformers, and TRL ⤵️
https://huggingface.co/blog/ulysses-sp
sergiopaniego 
posted an update about 1 month ago
We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs!

We're building a new async GRPO trainer for TRL and, as a first step, we needed to understand how the ecosystem solves this problem today.

The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.

The ecosystem converged on the same fix: separate inference and training onto different GPU pools, connect them with a rollout buffer, and sync weights asynchronously.
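
As a rough illustration of the buffer piece, the sketch below shows a bounded rollout buffer that tags each rollout with the policy version that generated it and discards rollouts that are too stale by the time the trainer reads them. The names `Rollout`, `RolloutBuffer`, `max_staleness`, and `policy_version` are hypothetical, and this is one simple staleness policy under those assumptions, not the design of any specific framework in the survey.

```python
from collections import deque
from typing import NamedTuple


class Rollout(NamedTuple):
    prompt: str
    completion: str
    policy_version: int  # weights version used by the inference pool


class RolloutBuffer:
    """Bounded FIFO between the inference pool and the training pool."""

    def __init__(self, max_size, max_staleness):
        # When full, appending silently evicts the oldest rollout.
        self._queue = deque(maxlen=max_size)
        self.max_staleness = max_staleness

    def put(self, rollout):
        self._queue.append(rollout)

    def get_batch(self, batch_size, current_version):
        """Pop up to batch_size rollouts, dropping any that are too stale."""
        batch = []
        while self._queue and len(batch) < batch_size:
            rollout = self._queue.popleft()
            if current_version - rollout.policy_version <= self.max_staleness:
                batch.append(rollout)
        return batch


buffer = RolloutBuffer(max_size=4, max_staleness=1)
for version in [0, 0, 1, 2]:
    buffer.put(Rollout("prompt", f"completion-v{version}", version))

# The trainer is at version 2, so rollouts from version 0 are discarded.
batch = buffer.get_batch(batch_size=2, current_version=2)
print([rollout.policy_version for rollout in batch])
```

In a real system the producer and consumer would run on separate GPU pools, and the trainer would bump the version each time it pushes new weights to the inference side.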

We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.

This survey is step one. The async GRPO trainer for TRL is next!

https://huggingface.co/blog/async-rl-training-landscape