A research article titled "Contextual feature extraction hierarchies converge in large language models and the brain" was published in a Nature Portfolio journal in November of this year, after nearly a year of peer review.
Using intracranial electroencephalography (iEEG) recordings from neurosurgical patients as they listened to speech, the study examined how closely LLMs align with the human brain's language-processing mechanisms.
Research highlights
As LLMs improve on benchmark tasks, they not only become more "brain-like" in accurately predicting neural responses but also show greater consistency with the brain's hierarchical feature-extraction pathway, handling the same encoding task efficiently with fewer layers.
High-performing LLMs exhibit a shared strategy for hierarchical language processing, indicating a trend towards convergence on similar language processing mechanisms at a functional level.
The role of contextual information is critical for both the performance of LLMs and their alignment with human brain activity. This finding underscores the central role of context processing in language understanding.
Experimental methodology
Neural responses: iEEG activity was extracted in a time window centered around each word, which served as the word response for each electrode.
Model representations: hidden-state activations were extracted from every layer of each LLM for each word, to serve as the word representation of the model.
Alignment analysis: each layer's word representations were compared with each electrode's word responses to quantify the degree of correspondence between LLM representations and neural responses.
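As a rough illustration of this pipeline, the sketch below assumes the word responses and layer-wise word representations are already available as NumPy arrays; the variable names, the plain least-squares mapping, and the train/test split are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from numpy.linalg import lstsq
from scipy.stats import pearsonr

def layer_brain_correlation(layer_reprs, electrode_resp, train_frac=0.8, seed=0):
    """Correlate one layer's word representations with one electrode's word responses.

    layer_reprs:    (n_words, n_features) hidden states for one LLM layer
    electrode_resp: (n_words,) per-word response for one electrode
    Returns the Pearson r between predicted and actual responses on held-out words.
    """
    rng = np.random.default_rng(seed)
    n_words = len(electrode_resp)
    idx = rng.permutation(n_words)
    n_train = int(train_frac * n_words)
    train, test = idx[:n_train], idx[n_train:]

    # Linear mapping from model features to neural response (plain least squares
    # here; a published analysis would typically use a regularized regression).
    X_train = np.c_[layer_reprs[train], np.ones(len(train))]  # add intercept column
    w, *_ = lstsq(X_train, electrode_resp[train], rcond=None)

    X_test = np.c_[layer_reprs[test], np.ones(len(test))]
    pred = X_test @ w
    r, _ = pearsonr(pred, electrode_resp[test])
    return r

def alignment_matrix(layer_reprs_all, responses):
    """(n_layers, n_electrodes) correlation matrix for one model.

    layer_reprs_all: list of (n_words, n_features) arrays, one per layer
    responses:       (n_electrodes, n_words) array of word responses
    """
    return np.array([[layer_brain_correlation(L, resp) for resp in responses]
                     for L in layer_reprs_all])
```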
Research findings
1. Brain alignment and LLM performance (Figure A)
Averaging brain alignment across all electrodes reveals a close correspondence between LLM performance and neural alignment: models with poorer performance (blue/purple) have lower average correlations, while better-performing models (yellow) have higher correlations. The shaded areas represent the standard error across electrodes, and the trend is consistent.
2. Peak correlation and LLM performance (Figure B)
The peak correlation for each model was calculated as the highest correlation score across all layers for each electrode, averaged over all electrodes. The results showed a significant positive correlation between average peak correlation and LLM performance (Pearson r = 0.92, p = 2.24 × 10⁻⁵). Significance levels are indicated by asterisks (∗, ∗∗, ∗∗∗).
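A minimal sketch of this peak-correlation analysis, assuming a per-model layer-by-electrode correlation matrix like the one produced above (the container names `corr` and `benchmark` are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

def avg_peak_correlation(corr_matrix):
    # Highest correlation across layers for each electrode, averaged over electrodes.
    return corr_matrix.max(axis=0).mean()

def peak_corr_vs_performance(corr, benchmark):
    """corr[model]: (n_layers, n_electrodes) array; benchmark[model]: benchmark score."""
    models = sorted(corr)
    peaks = np.array([avg_peak_correlation(corr[m]) for m in models])
    scores = np.array([benchmark[m] for m in models])
    return pearsonr(scores, peaks)  # (r, p); the article reports r = 0.92
```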
3. The relationship between the distribution of peak correlation layers and distance (Figure C)
The study computed a locally smoothed distribution of peak layers as a function of each electrode's distance from posteromedial Heschl's gyrus (pmHG). The peak layer gradually increased with distance from pmHG. Better-performing models (yellow) peaked at lower layers than poorer-performing models (blue/purple), showing a lower overall distribution.
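The article does not state which smoothing method was used; the sketch below uses a simple Gaussian-weighted local average over electrode distances as an illustrative stand-in.

```python
import numpy as np

def smoothed_peak_layer(distances, peak_layers, grid, bandwidth=5.0):
    """Gaussian-weighted local average of peak layer as a function of distance.

    distances:   (n_electrodes,) distance of each electrode from pmHG
    peak_layers: (n_electrodes,) layer index with the highest correlation per electrode
    grid:        distances at which to evaluate the smoothed curve
    """
    curve = []
    for d in grid:
        w = np.exp(-0.5 * ((distances - d) / bandwidth) ** 2)
        curve.append(np.sum(w * peak_layers) / np.sum(w))
    return np.array(curve)
```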
4. Average peak layer and model performance (Figure D)
Analyzing the average peak layer across all electrodes for each model revealed a significant negative correlation with LLM performance (Pearson r = −0.81, p = 0.0013). This suggests that better-performing models tend to reach their highest brain correlation at lower layers (closer to the input).
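The corresponding peak-layer statistic can be sketched in the same way; normalizing the layer index to a relative depth, so that models with different numbers of layers are comparable, is an illustrative assumption.

```python
import numpy as np
from scipy.stats import pearsonr

def avg_peak_layer(corr_matrix):
    # Layer index with the highest correlation for each electrode, averaged over
    # electrodes and expressed as a relative depth in [0, 1].
    n_layers = corr_matrix.shape[0]
    return corr_matrix.argmax(axis=0).mean() / (n_layers - 1)

def peak_layer_vs_performance(corr, benchmark):
    models = sorted(corr)
    depths = np.array([avg_peak_layer(corr[m]) for m in models])
    scores = np.array([benchmark[m] for m in models])
    return pearsonr(scores, depths)  # the article reports r = −0.81
```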
Key findings
The results indicate that an LLM's benchmark performance is not only a measure of its ability on language tasks but is also closely tied to its degree of alignment with the human brain's language-processing mechanisms.
Better-performing models reach their peak brain correlation at lower layers, implying that high-performing LLMs extract features more efficiently and employ hierarchical processing mechanisms closer to the human brain's.
The Importance of Contextual Information: A New Perspective on the Alignment between Large Language Models and the Brain
This part of the study reveals how the length and content of the context window affect model performance and neural alignment.
1. The impact of the context window on hierarchical alignment (Figure A)
The study examined how the length of the context window influences hierarchical alignment. Each point in the figure represents a correlation result, with 95% confidence intervals indicated by error bars and significance levels marked by asterisks (∗, ∗∗, ∗∗∗).
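To make the context-window manipulation concrete, the sketch below extracts each word's representation given at most a fixed number of preceding words, using the Hugging Face transformers API; the model name, last-token pooling, and word-level re-tokenization are illustrative assumptions, not the study's exact setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the study compared many LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

@torch.no_grad()
def word_representations(words, context_len, layer):
    """Hidden state of each word's last token, given at most `context_len`
    preceding words as context. Returns a (n_words, hidden_size) tensor."""
    reprs = []
    for i, word in enumerate(words):
        context = words[max(0, i - context_len): i + 1]
        ids = tokenizer(" ".join(context), return_tensors="pt")
        out = model(**ids)
        # hidden_states[layer] has shape (1, seq_len, hidden_size);
        # take the final token as this word's representation.
        reprs.append(out.hidden_states[layer][0, -1])
    return torch.stack(reprs)
```

Running this for several values of `context_len` and feeding the resulting representations into the alignment pipeline above would allow the kind of comparison panel A describes.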
2. The relationship between context content and model performance (Figure B)
The effect of context content on model representations is significantly positively correlated with benchmark performance (Spearman r = 0.66), indicating that better-performing models make greater use of contextual content to enhance their feature representations.
3. The relationship between context content and brain similarity (Figure C)
A model's contextual content effect is also significantly positively correlated with its average peak brain similarity (Spearman r = 0.84). The horizontal error bars in the figure indicate the standard error of mean brain similarity across electrodes, further underscoring the central role of context processing in brain alignment.
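The article does not spell out how "contextual content" is quantified; one plausible proxy, used here purely for illustration, is the relative change in a word's representation when its preceding context is supplied versus withheld, averaged over words and then rank-correlated with benchmark scores.

```python
import numpy as np
from scipy.stats import spearmanr

def context_effect(with_context, without_context):
    """Average relative change in word representations when context is provided.

    with_context / without_context: (n_words, n_features) arrays of word
    representations extracted with and without preceding context.
    """
    num = np.linalg.norm(with_context - without_context, axis=1)
    den = np.linalg.norm(without_context, axis=1) + 1e-8
    return float(np.mean(num / den))

def context_effect_vs_scores(effects, scores):
    # Spearman rank correlation between per-model context effects and
    # benchmark scores (the article reports r = 0.66 for panel B).
    rho, p = spearmanr(effects, scores)
    return rho, p
```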
4. Distribution of regional context effects (Figures D and E)
In brain maps rendered with FreeSurfer, electrodes were colored according to the contextual effect on their peak brain similarity, showing how context effects are distributed across brain regions.
The contextual effect differences of four major language-related brain regions were analyzed, and a bar chart was created to display the average contextual effect values for each region:
The color of each bar matches the brain map's color scheme, and error bars represent the standard error. Differences between regions were assessed with the Wilcoxon rank-sum test, with significance levels indicated by asterisks (∗, ∗∗, ∗∗∗).
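A minimal sketch of the region comparison, assuming per-electrode context-effect values grouped by region (the grouping container is hypothetical, and no multiple-comparison correction is shown):

```python
from itertools import combinations
from scipy.stats import ranksums

def compare_region_context_effects(effects_by_region):
    """Pairwise Wilcoxon rank-sum tests on per-electrode context effects.

    effects_by_region: dict mapping region name -> list of per-electrode
    context-effect values.
    """
    results = {}
    for a, b in combinations(effects_by_region, 2):
        stat, p = ranksums(effects_by_region[a], effects_by_region[b])
        results[(a, b)] = (stat, p)
    return results
```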
Implications
This study reveals the central role of contextual information in language understanding and processing.
High-performing LLMs make better use of contextual information, which not only improves their task performance but also brings them closer to the cognitive mechanisms of the human brain, offering important guidance for future model optimization.
Different brain regions respond differently to contextual information, reflecting the hierarchical nature of language processing and the division of labor and cooperation among brain regions.