arxiv:2502.07408

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Published on Apr 16
Submitted by Moshe kimhi on Apr 20
#1 Paper of the day

Abstract

Deep neural networks exhibit catastrophic vulnerability to minimal parameter bit flips across multiple domains; this vulnerability can be identified and mitigated through targeted protection strategies.

AI-generated summary

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backward pass on random inputs. We show that this vulnerability spans multiple domains, including image classification, object detection, instance segmentation, and reasoning large language models. In image classification, flipping just two sign bits in ResNet-50 on ImageNet reduces accuracy by 99.8%. In object detection and instance segmentation, one or two sign flips in the backbone collapse COCO detection and mask AP for Mask R-CNN and YOLOv8-seg models. In language modeling, two sign flips in different experts reduce Qwen3-30B-A3B-Thinking from 78% to 0% accuracy. We also show that selectively protecting a small fraction of vulnerable sign bits provides a practical defense against such attacks.
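The attack surface described above is the sign bit of a model parameter: in the IEEE-754 binary32 encoding used for float32 weights, the most significant bit determines the sign, so a single bit flip negates the weight. This is an illustrative sketch of that mechanism (not the paper's DNL selection method, which chooses *which* parameters to flip):

```python
import struct

def flip_sign_bit(x: float) -> float:
    """Negate a float by toggling its IEEE-754 binary32 sign bit."""
    # Reinterpret the float's 32-bit pattern as an unsigned integer.
    bits, = struct.unpack("<I", struct.pack("<f", x))
    # XOR the most significant bit (the sign bit) and reinterpret back.
    return struct.unpack("<f", struct.pack("<I", bits ^ 0x8000_0000))[0]

print(flip_sign_bit(0.75))   # -0.75: one bit flip negates the weight
print(flip_sign_bit(-2.0))   # 2.0
```

A single such flip in a high-magnitude, heavily-used weight (e.g. in an early backbone layer) is what the paper exploits; the defense correspondingly only needs to guard these few sign bits rather than the whole parameter tensor.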



Get this paper in your agent:

hf papers read 2502.07408
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
