Paper: RF-DETR: Neural Architecture Search for Real-Time Detection Transformers (arXiv:2511.09554)

This model was converted to MLX format from RF-DETR (ICLR 2026) using mlx-vlm version 0.4.3.

Install or upgrade mlx-vlm:

    pip install -U mlx-vlm

Python usage:

    from pathlib import Path

    from PIL import Image

    from mlx_vlm.utils import load_model
    from mlx_vlm.models.rfdetr.processing_rfdetr import RFDETRProcessor
    from mlx_vlm.models.rfdetr.generate import RFDETRPredictor

    # Load the converted weights and the matching image processor.
    model = load_model(Path("mlx-community/rfdetr-seg-xlarge-fp32"))
    processor = RFDETRProcessor.from_pretrained("mlx-community/rfdetr-seg-xlarge-fp32")

    # Configure the score and NMS thresholds, then run inference on one image.
    predictor = RFDETRPredictor(model, processor, score_threshold=0.3, nms_threshold=0.5)
    result = predictor.predict(Image.open("image.jpg"))

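
The `score_threshold` and `nms_threshold` arguments suggest a post-processing step that drops low-confidence detections and then suppresses overlapping boxes. The following is a minimal NumPy sketch of that idea, assuming class-agnostic greedy NMS over boxes in `(x1, y1, x2, y2)` format. The helper names `box_iou` and `filter_detections` are hypothetical, and this is not taken from the mlx-vlm implementation, which may differ (for example, by applying NMS per class).

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def filter_detections(boxes, scores, score_threshold=0.3, nms_threshold=0.5):
    """Hypothetical helper: indices kept after score filtering and greedy NMS."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)

    # Step 1: discard detections below the score threshold.
    candidates = np.flatnonzero(scores >= score_threshold)
    # Step 2: visit the survivors from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates])]

    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Step 3: drop boxes that overlap the kept box by more than the threshold.
        ious = box_iou(boxes[best], boxes[rest])
        order = rest[ious <= nms_threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_detections(boxes, scores))  # [0, 2]
```

In this toy input, the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the fourth box falls below the score threshold.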
Command-line usage (segmentation on an image, tracking on a video, and realtime mode):

    python -m mlx_vlm.models.rfdetr.generate --task segment --image photo.jpg --model mlx-community/rfdetr-seg-xlarge-fp32
    python -m mlx_vlm.models.rfdetr.generate --task track --video input.mp4 --model mlx-community/rfdetr-seg-xlarge-fp32
    python -m mlx_vlm.models.rfdetr.generate --task realtime --model mlx-community/rfdetr-seg-xlarge-fp32

Model details:

| Property | Value |
|---|---|
| Architecture | DINOv2-small + C2f + 6-layer decoder + 6-block seg head |
| Task | Object detection + instance segmentation (COCO 80 classes) |
| Parameters | ~38M |
| Input resolution | 624x624 |
| Mask resolution | 156x156 |
| Dtype | float32 |
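
Because masks are predicted at 156x156 while the input is 624x624, a mask generally has to be resized before it can be overlaid on an image. The sketch below shows one way to do that with nearest-neighbour indexing in NumPy. It assumes the mask is available as a 2D array of logits, which this card does not state; `upsample_mask` is a hypothetical helper, and the predictor may already return masks at image resolution.

```python
import numpy as np


def upsample_mask(mask_logits, out_height, out_width, threshold=0.0):
    """Hypothetical helper: nearest-neighbour resize of a 2D logit map, then binarize.

    A logit threshold of 0.0 corresponds to a sigmoid probability of 0.5.
    """
    mask_logits = np.asarray(mask_logits)
    in_h, in_w = mask_logits.shape
    # Map every output row/column back to its source row/column.
    rows = np.arange(out_height) * in_h // out_height
    cols = np.arange(out_width) * in_w // out_width
    resized = mask_logits[rows[:, None], cols[None, :]]
    return resized > threshold


# Toy 156x156 mask whose top-left quadrant is "on".
low_res = np.full((156, 156), -1.0)
low_res[:78, :78] = 1.0

full_res = upsample_mask(low_res, 624, 624)
print(full_res.shape)  # (624, 624)
```

Bilinear interpolation of the logits before thresholding usually gives smoother mask edges; nearest-neighbour is used here only to keep the example dependency-free.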
Model variants:

| Model | Input resolution | Mask resolution | Use case |
|---|---|---|---|
| rfdetr-base-fp32 | 560px | No | Fast detection |
| rfdetr-seg-small-fp32 | 384px | 96x96 | Realtime segmentation |
| rfdetr-seg-large-fp32 | 504px | 126x126 | Better masks |
| rfdetr-seg-xlarge-fp32 | 624px | 156x156 | High quality |
| rfdetr-seg-2xlarge-fp32 | 768px | 192x192 | Best quality |
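
For the segmentation variants in the table above, the mask side length is the input side length divided by four. The short check below recomputes that ratio from the table values; the dictionary simply copies the table, and `mask_stride` is a hypothetical helper name.

```python
# Input and mask side lengths in pixels, copied from the table above.
SEG_VARIANTS = {
    "rfdetr-seg-small-fp32": (384, 96),
    "rfdetr-seg-large-fp32": (504, 126),
    "rfdetr-seg-xlarge-fp32": (624, 156),
    "rfdetr-seg-2xlarge-fp32": (768, 192),
}


def mask_stride(input_size, mask_size):
    """Hypothetical helper: integer ratio between input and mask side lengths."""
    if input_size % mask_size != 0:
        raise ValueError("input size is not a multiple of mask size")
    return input_size // mask_size


strides = {name: mask_stride(*sizes) for name, sizes in SEG_VARIANTS.items()}
print(strides)  # every variant maps to 4
```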
Quantized