From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP

arXiv 18 May 2026

AI Insight

This study applies attention-aware Layer-wise Relevance Propagation (LRP), a post-hoc attribution technique, to EEG foundation models (FMs) built on Transformer architectures, extending LRP's earlier use on CNN-based EEG models. The authors report that LRP can expose "Clever Hans" behavior in motor imagery tasks, where models prioritize task-correlated ocular signals over the intended motor correlates. In a naturalistic affect-prediction paradigm, LRP revealed a recurring reliance on a central electrode cluster, which the authors interpret as a candidate sensorimotor signature of arousal. They caution that heatmap interpretation remains ambiguous in this domain.

Why it matters

EEG foundation models are being proposed for diagnostics and brain-computer interfaces, where their opacity remains a barrier to wider adoption. Attribution tools like LRP can help check whether model decisions reflect genuine neural signals rather than artifacts. The study positions LRP as a tool for both verifying these models and generating neuroscientific hypotheses from them.

Confidence

6/10 · Peer-reviewed · Biology

arXiv:2605.11885v1 Announce Type: cross
Abstract: Emerging foundation models (FMs) in electroencephalography (EEG) promise a path to scale deep learning in diagnostics and brain-computer interfaces despite data scarcity, yet their opaque nature remains a barrier to wider adoption. We investigate attention-aware Layer-wise relevance propagation (LRP) as a post-hoc attribution method for EEG-FMs, extending LRP’s use on convolutional neural network (CNN)-based EEG models to the Transformer architectures that current FMs are based on. We find that LRP can both verify EEG-FM decisions and surface novel, biologically plausible hypotheses from them. In motor imagery, it unmasks ‘Clever Hans’ behavior where models prioritize task-correlated ocular signals over the intended motor correlates. In a naturalistic paradigm for affect prediction, it reveals a recurring reliance on a central electrode cluster, suggesting a candidate sensorimotor signature of arousal. Though heatmap interpretation remains ambiguous in this complex domain, the results position LRP as a tool for both verification and exploration of EEG-FMs, a role that will grow in both importance and discovery potential as the underlying models mature.
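
To make the verification idea concrete, the following is a minimal sketch of the basic LRP-epsilon rule on a toy two-layer ReLU network, written in pure Python. It is not the authors' attention-aware method: Transformer-specific handling of attention and normalization layers is omitted, and the channel subset, weights, and input values are all invented for illustration. The toy weights are deliberately chosen so the model leans on frontal channels, mimicking the kind of ocular shortcut the abstract describes.

```python
# Toy LRP-epsilon sketch (assumption: bias-free dense layers with ReLU).
# Channel names, weights, and inputs are made-up illustration values.

CHANNELS = ["Fp1", "Fp2", "C3", "Cz", "C4"]  # hypothetical montage subset
FRONTAL = {"Fp1", "Fp2"}                     # sites prone to ocular artifacts


def dense(weights, inputs):
    """Bias-free dense layer: weights[k][j] maps input j to output k."""
    return [sum(w * a for w, a in zip(row, inputs)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def lrp_epsilon(weights, inputs, relevance_out, eps=1e-9):
    """Redistribute each output's relevance to the inputs that produced it."""
    z = dense(weights, inputs)
    relevance_in = [0.0] * len(inputs)
    for k, row in enumerate(weights):
        # Stabilised denominator: z_k + eps * sign(z_k)
        denom = z[k] + (eps if z[k] >= 0 else -eps)
        for j, w in enumerate(row):
            relevance_in[j] += inputs[j] * w * relevance_out[k] / denom
    return relevance_in


def explain(x, w1, w2, target):
    """Forward pass, then propagate the target logit back to the inputs."""
    hidden = relu(dense(w1, x))
    logits = dense(w2, hidden)
    r_out = [logits[k] if k == target else 0.0 for k in range(len(logits))]
    r_hidden = lrp_epsilon(w2, hidden, r_out)  # ReLU passes relevance through
    r_input = lrp_epsilon(w1, x, r_hidden)
    return logits, r_input


def group_share(relevance, names, group):
    """Fraction of total absolute relevance that falls on a channel group."""
    total = sum(abs(r) for r in relevance)
    picked = sum(abs(r) for r, n in zip(relevance, names) if n in group)
    return picked / total if total else 0.0


X = [2.0, 1.5, 0.5, 0.2, 0.4]          # one toy feature per channel
W1 = [[1.0, 1.0, 0.0, 0.0, 0.0],       # hidden unit 0 reads frontal channels
      [0.0, 0.0, 1.0, 0.5, -1.0]]      # hidden unit 1 reads central channels
W2 = [[1.0, 0.2],                      # class 0 depends mostly on unit 0
      [-1.0, 0.3]]

logits, relevance = explain(X, W1, W2, target=0)
frontal_share = group_share(relevance, CHANNELS, FRONTAL)

for name, r in zip(CHANNELS, relevance):
    print(f"{name}: {r:+.3f}")
print(f"frontal share of |relevance|: {frontal_share:.2f}")
```

In this toy setup almost all relevance lands on the frontal channels, which is the kind of pattern that would prompt a closer look for an ocular shortcut. With bias-free layers and a small epsilon, the input relevances also sum to the explained logit, a quick sanity check on the propagation.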

Source: From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP
