TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation
Abstract
The TwinTrack framework addresses ambiguity in pancreatic cancer segmentation by post-hoc calibrating ensemble probabilities to the empirical mean human response, improving calibration metrics on a multi-rater benchmark.
Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities to the empirical mean human response (MHR), the fraction of expert annotators labeling a voxel as tumor. Calibrated probabilities are thus directly interpretable as the expected proportion of annotators assigning the tumor label, explicitly modeling inter-rater disagreement. The proposed post-hoc calibration procedure is simple and requires only a small multi-rater calibration set. It consistently improves calibration metrics over standard approaches when evaluated on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.
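To make the MHR target concrete, the following is a minimal sketch of computing the MHR from several binary rater masks and fitting a post-hoc map from ensemble probabilities to it. The abstract does not say which calibration map TwinTrack uses, so histogram binning is swapped in here purely as an illustrative stand-in. All function names, the toy arrays, and the bin count are assumptions, not the authors' implementation.

```python
import numpy as np

def mean_human_response(rater_masks):
    """Per-voxel fraction of raters labeling tumor; input shape (R, ...)."""
    return np.asarray(rater_masks, dtype=float).mean(axis=0)

def fit_histogram_calibrator(probs, mhr, n_bins=10):
    """Fit a bin-wise map from ensemble probability to mean MHR.

    Histogram binning is an assumed stand-in for the paper's calibrator.
    """
    probs = np.ravel(np.asarray(probs, dtype=float))
    mhr = np.ravel(np.asarray(mhr, dtype=float))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices in 0 .. n_bins - 1.
    idx = np.digitize(probs, edges[1:-1])
    # Empty bins fall back to the bin center (identity-like behavior).
    values = 0.5 * (edges[:-1] + edges[1:])
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            values[b] = mhr[in_bin].mean()
    return edges, values

def apply_calibrator(probs, edges, values):
    """Replace each probability with the fitted value of its bin."""
    probs = np.asarray(probs, dtype=float)
    return values[np.digitize(probs, edges[1:-1])]

# Toy calibration set: 3 hypothetical raters, 4 voxels.
rater_masks = np.array([[1, 1, 0, 0],
                        [1, 0, 0, 0],
                        [1, 1, 1, 0]])
mhr = mean_human_response(rater_masks)          # [1, 2/3, 1/3, 0]
ensemble_probs = np.array([0.95, 0.90, 0.60, 0.05])

edges, values = fit_histogram_calibrator(ensemble_probs, mhr, n_bins=2)
calibrated = apply_calibrator(np.array([0.1, 0.8]), edges, values)
```

In this toy case the three voxels with ensemble probability above 0.5 have a mean MHR of 2/3, so a new probability of 0.8 is mapped to 2/3: the calibrated value reads as "about two of three annotators would label this voxel as tumor" rather than as confidence in a single ground truth.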
Community
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and its segmentation on contrast-enhanced CT is fundamentally ambiguous: when experts disagree, that disagreement often reflects real uncertainty rather than annotation noise. TwinTrack is a simple post-hoc multi-rater calibration method that transforms ensemble segmentation probabilities into predictions aligned with the Mean Human Response, better capturing expert disagreement. In other words: not just better segmentation, but better-calibrated uncertainty for genuinely ambiguous clinical images.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net (2026)
- SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation (2026)
- Rethinking Uncertainty in Segmentation: From Estimation to Decision (2026)
- Volumetric Directional Diffusion: Anchoring Uncertainty Quantification in Anatomical Consensus for Ambiguous Medical Image Segmentation (2026)
- Foundation Model-guided Iteratively Prompting and Pseudo-Labeling for Partially Labeled Medical Image Segmentation (2026)
- Deep EM with Hierarchical Latent Label Modelling for Multi-Site Prostate Lesion Segmentation (2026)
- Component-Adaptive and Lesion-Level Supervision for Improved Small Structure Segmentation in Brain MRI (2026)