Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models
Daily Info Board · 2026-03-05
2026-03-04T13:41:20Z
Published
AI Summary
This paper systematically analyzes the backdoor risks of multi-encoder text-to-image models (with Stable Diffusion 3 as the case study) and proposes MELT, an attack that implants effective backdoors while fine-tuning fewer than 0.2% of parameters, showing that multi-encoder diffusion models in real-world deployment are not made safer by their increased scale.
- The study targets Stable Diffusion 3, which uses three text encoders, filling a gap in backdoor research on multi-encoder settings.
- The authors divide backdoor attack targets into four categories and analyze the minimal set of encoders needed to achieve each objective.
- They propose MELT: freeze the pretrained text encoders and train only low-rank adapters (LoRA) to inject the backdoor.
- Experiments show that tuning fewer than 0.2% of the total encoder parameters suffices for an effective attack, combining a light footprint with a high attack success rate.
- The conclusion: the parameter growth introduced by multiple encoders does not inherently defend against backdoor threats at the text-encoder level.
#arXiv #paper #research/paper #Stable Diffusion 3
Excerpt
As text-to-image diffusion models become increasingly deployed in real-world applications, concerns about backdoor attacks have gained significant attention. Prior work on text-based backdoor attacks has largely focused on diffusion models conditioned on a single lightweight text encoder. However, more recent diffusion models that incorporate multiple large-scale text encoders remain underexplored in this context. Given the substantially increased number of trainable parameters introduced by multiple text encoders, an important question is whether backdoor attacks can remain both efficient and effective in such settings. In this work, we study Stable Diffusion 3, which uses three distinct text encoders and has not yet been systematically analyzed for text-encoder-based backdoor vulnerabilities. To understand the role of text encoders in backdoor attacks, we define four categories of attack targets and identify the minimal sets of encoders required to achieve effective performance for each attack objective. Based on this, we further propose Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weights frozen. We demonstrate that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks on Stable Diffusion 3, revealing previously underexplored vulnerabilities in practical attack scenarios in multi-encoder settings.
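The core mechanism the abstract describes, freezing the pretrained weights and training only a low-rank adapter, can be sketched for a single linear layer. This is a minimal NumPy illustration, not the paper's implementation: the layer dimensions, rank, scaling, and zero-initialization of `B` below are illustrative assumptions, and the trainable fraction shown is per-layer (the paper's <0.2% figure is over the total parameters of all three encoders).

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Apply a frozen weight W plus a low-rank LoRA update (alpha/r) * B @ A."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

# Illustrative sizes: one 768x768 projection with rank-4 adapters.
d_in, d_out, r = 768, 768, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
                                            # so the adapter starts as a no-op

x = rng.standard_normal((1, d_in))
y = lora_forward(x, W, A, B)                # equals x @ W.T while B is zero

# Only A and B would receive gradients; W stays frozen.
frac = (A.size + B.size) / W.size           # ~1% of this layer's weights
print(f"trainable fraction for this layer: {frac:.4%}")
```

Zero-initializing `B` is the standard LoRA choice: the backdoored model starts exactly equal to the clean model, and training moves only the small `A`/`B` matrices, which is what keeps the attack's parameter footprint tiny.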