AI Insight
This study reports that brain foundation models (BFMs) pretrained on fMRI data predict cognitive performance worse than a simple linear regression on the functional connectivity matrix, despite having hundreds of millions of parameters. The researchers found that BFM reconstructions partially preserve the second-order covariance structure of brain imaging data but largely destroy the third-order statistics (co-skewness) that, by their analysis, carry the structure predictive of cognition. A linear pipeline that computes functional connectivity in a co-skewness-preserving subspace outperformed all tested BFMs without requiring pretraining or GPU resources.
Why it matters
This research challenges the assumption that larger, more complex AI models are always better for neuroscience applications and demonstrates that overlooking higher-order statistical structures can undermine model performance. The findings suggest that efficient, theoretically-motivated approaches may be more effective than resource-intensive foundation models for predicting cognition from brain imaging data.
arXiv:2606.04010v1 Announce Type: new
Abstract: Brain foundation models (BFMs) are self-supervised Transformers pretrained on fMRI data. We posit that these models should capture each subject’s cognitive performance from their fMRI signal. Yet across three state-of-the-art BFMs and every readout we test, they predict cognition worse than a linear regression from the $\sim$80K parameters of the functional connectivity matrix (FC). The gap widens with scale: BrainLM’s 650M model predicts cognition worse than its 111M. We attribute this to a \textbf{variance allocation problem}: BFM pretraining captures the variance components that dominate fMRI but not the higher-order structure that predicts cognition. Our per-cumulant analysis of the reconstructed signal shows that the second-order covariance is partially preserved, while the third-order co-skewness tensor is largely destroyed. To recover what BFMs lose, we design a linear pipeline that projects the fMRI signal into the subspace that best preserves its co-skewness and computes FC there. This \textbf{exceeds raw FC and every pretrained BFM} on every dataset and parcellation we test, outperforming prior state-of-the-art under controlled evaluation \textbf{with no pretraining and no GPU}. We \textbf{recover the raw-FC ceiling on BrainLM’s forward pass} by finetuning with a loss targeted at this same subspace. This shows that the bottleneck is the pretraining objective, not the architecture or the model size.
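The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the abstract does not specify how the co-skewness-preserving subspace is found, so the projection below (top left singular vectors of the unfolded co-skewness tensor) is an assumed stand-in, and the function names `fc_features`, `coskewness`, and `coskew_subspace` are hypothetical. The toy data are random and carry no neuroscientific meaning.

```python
import numpy as np

def fc_features(ts):
    """Upper triangle of the Pearson correlation matrix (the FC).

    ts has shape (T, N): T time points, N brain regions.
    Returns N * (N - 1) / 2 features, the input to a linear regression.
    """
    fc = np.corrcoef(ts, rowvar=False)
    iu = np.triu_indices(fc.shape[0], k=1)
    return fc[iu]

def coskewness(ts):
    """Third-order co-skewness tensor S[i, j, k] = E[z_i z_j z_k],
    where z is the z-scored signal of each region."""
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    return np.einsum("ti,tj,tk->ijk", z, z, z) / z.shape[0]

def coskew_subspace(ts, k):
    """ASSUMPTION: one plausible way to pick a k-dimensional subspace
    that retains co-skewness. Unfold the (N, N, N) tensor to (N, N*N)
    and keep its top-k left singular vectors. The paper's actual
    projection may differ."""
    s = coskewness(ts)
    n = s.shape[0]
    u, _, _ = np.linalg.svd(s.reshape(n, n * n), full_matrices=False)
    return u[:, :k]

# Toy data: 200 time points, 10 regions, squared to make them skewed.
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 10)) ** 2

basis = coskew_subspace(ts, k=4)      # (10, 4) orthonormal basis
projected = ts @ basis                # (200, 4) projected signal
features = fc_features(projected)     # FC computed in the subspace

# A hypothetical 400-region parcellation would give this many FC entries,
# the same order as the ~80K parameters the abstract mentions.
n_fc_400 = 400 * 399 // 2
```

In a full experiment, `features` for each subject would feed a regularized linear regression against a cognitive score; that step is omitted here because the abstract does not state the regression setup.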