Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2

arXiv, 15 May 2026

AI Insight

This study applies feature visualization, which uses gradient ascent to synthesize images that maximize a brain encoder's predicted activation for a target region, to test whether brain encoder models have learned meaningful representations of cortical function. Applied to the TRIBE v2 encoder composed with V-JEPA 2, with both models frozen, the method recovers known properties across seven regions of the ventral and dorsal visual hierarchies: a progression of increasing spatial scale and feature complexity from V1 to V4, radial “frozen-motion” streaks for the middle temporal area (MT) even though only static images were optimized, face-like features for the fusiform face area (FFA), and consistent rectilinear line patterns for the parahippocampal place area (PPA). Optimized FFA stimuli drove roughly four times the predicted activation of a natural face photograph, which is consistent with the method producing adversarial super-stimuli rather than representative natural images.

Why it matters

Feature visualization offers a practical, qualitative complement to held-out prediction accuracy for assessing brain encoder models. It can help researchers judge whether a model has captured the functional organization of the cortex rather than merely fitting the data. The approach could inform both computational neuroscience and the development of vision models that are better grounded in neuroscience.

Confidence

5/10 · Peer-reviewed · Biology

arXiv:2605.13904v1 Announce Type: new
Abstract: Brain encoder models predict cortical fMRI responses from the internal activations of pretrained vision and language networks, and are typically evaluated by held-out prediction accuracy. This is a useful signal for training but a poor one for interpretation: it tells us an encoder fits the data without telling us whether it has internalized the functional organization of the brain. We propose feature visualization — gradient ascent on the encoder’s predicted activation for a target region of interest (ROI) — as a complementary interpretability technique, and apply it to TRIBE v2 composed with V-JEPA 2 (ViT-G, 40 layers), holding both frozen and synthesizing still images for seven regions spanning the ventral and dorsal visual hierarchies. Under identical hyperparameters, the probe recovers a visible progression of increasing spatial scale and feature complexity across V1 to V4, matching the ventral-stream hierarchy. It also produces three distinctive downstream regimes: radial “frozen-motion” streaks for the middle temporal area (MT) despite static-only optimization, face-like features for the fusiform face area (FFA), and consistent rectilinear line patterns for the parahippocampal place area (PPA). Optimized FFA stimuli drive the predicted region ~4x as much as a natural face photograph, consistent with feature visualization producing adversarial super-stimuli rather than canonical exemplars. The probe is simple, differentiable, and applicable to any brain encoder with a differentiable backbone, allowing for qualitative evaluation of brain encoders.
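The core loop described in the abstract, gradient ascent on a frozen encoder's predicted activation for one ROI, can be illustrated with a minimal sketch. Everything below is a stand-in chosen for illustration: the toy "encoder" (a random linear map, a tanh, and a linear readout), the function names, the step size, and the pixel range are all assumptions, and none of it is the TRIBE v2 or V-JEPA 2 API. A real probe would most likely rely on automatic differentiation through the actual backbone instead of the hand-written gradient used here.

```python
import numpy as np


def make_toy_encoder(n_pixels=64, n_hidden=16, seed=0):
    """Build frozen weights for a hypothetical stand-in encoder."""
    rng = np.random.default_rng(seed)
    # W plays the role of the frozen backbone, v the frozen ROI readout.
    W = rng.normal(scale=1.0 / np.sqrt(n_pixels), size=(n_hidden, n_pixels))
    v = rng.normal(size=n_hidden)
    return W, v


def roi_activation(x, W, v):
    """Predicted activation of the target ROI for a flattened image x."""
    return float(v @ np.tanh(W @ x))


def roi_gradient(x, W, v):
    """Analytic gradient of roi_activation with respect to the pixels."""
    h = np.tanh(W @ x)
    return W.T @ (v * (1.0 - h ** 2))


def visualize_roi(W, v, steps=200, lr=0.1, seed=1):
    """Gradient ascent on the pixels; the encoder weights never change."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.4, 0.6, size=W.shape[1])  # start near mid-gray
    history = [roi_activation(x, W, v)]
    for _ in range(steps):
        x = x + lr * roi_gradient(x, W, v)  # step uphill on the prediction
        x = np.clip(x, 0.0, 1.0)            # keep pixels in a valid range
        history.append(roi_activation(x, W, v))
    return x, history


if __name__ == "__main__":
    W, v = make_toy_encoder()
    x_opt, history = visualize_roi(W, v)
    print("initial predicted activation:", history[0])
    print("final predicted activation:  ", history[-1])
```

Only the input is updated while the weights stay fixed, which mirrors the abstract's setup of holding both models frozen. Because the objective is the predicted activation alone, nothing in the loop pulls the result toward natural images, which is one way to read the super-stimulus observation for FFA.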

Source: Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2
