Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics

Daily Information Dashboard · 2026-02-16
Category: Research / Papers
Source: arxiv_search
Score: 90
Published: 2026-02-16T23:20:58Z

AI Summary

The paper systematically validates scaling laws for masked-reconstruction Transformers on single-cell transcriptomics: when data are plentiful, loss decreases as a power law in model size, whereas with scarce data there is almost no scaling, indicating that the data-to-parameter ratio is the key factor in building efficient single-cell foundation models.
#arXiv #paper #research/papers #scRNA-seq #Transformer

Content Excerpt

Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in single-cell transcriptomics when sufficient data are available, and they identify the data-to-parameter ratio as a critical determinant of scaling behaviour. A preliminary conversion of the data-rich asymptotic floor to information-theoretic units yields an estimate of approximately 2.30 bits of entropy per masked gene position. We discuss implications for the design of single-cell foundation models and outline the additional measurements needed to refine this entropy estimate.
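For readers who want to see the curve-fitting step concretely, the sketch below fits a saturating power law L(N) = a·N^(−α) + c to validation MSE across model sizes and converts the fitted floor to bits per masked gene position. The functional form, the example loss values, and the Gaussian-residual assumption are illustrative guesses under stated assumptions, not the authors' code or data.

```python
# Minimal sketch (hypothetical data): fit L(N) = a * N**(-alpha) + c to
# validation MSE over model sizes, then convert the floor c to bits by
# treating residuals at the floor as Gaussian with variance c.

import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, alpha, c):
    """Saturating power law: loss = a * N^(-alpha) + c."""
    return a * n_params ** (-alpha) + c

# Hypothetical (model size, validation MSE) pairs; the paper reports seven
# model sizes spanning 533 to 3.4e8 parameters but not the per-size losses.
n_params = np.array([5.3e2, 4.1e3, 3.2e4, 2.5e5, 2.0e6, 4.3e7, 3.4e8])
val_mse = np.array([2.95, 2.41, 2.02, 1.78, 1.62, 1.52, 1.47])

(a, alpha, c), _ = curve_fit(
    scaling_law, n_params, val_mse, p0=[5.0, 0.2, 1.0], maxfev=10_000
)
print(f"fit: a={a:.3g}, alpha={alpha:.3g}, irreducible floor c={c:.3g}")

# Differential entropy of a Gaussian with variance c, in bits:
# h = 0.5 * log2(2 * pi * e * c).
entropy_bits = 0.5 * np.log2(2 * np.pi * np.e * 1.44)
print(f"entropy per masked gene position: {entropy_bits:.2f} bits")
```

With c ≈ 1.44, the Gaussian differential entropy 0.5·log2(2πe·c) evaluates to roughly 2.3 bits, consistent with the paper's preliminary per-masked-gene entropy estimate, though the paper does not state which conversion it used.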