Multivariate lesion-symptom mapping (MLSM) considers lesion information across the entire brain to predict impairments. The strength of this approach is also its weakness-considering many brain features together synergistically can uncover complex brain-behavior relationships but exposes a high-dimensional feature space that a model is expected to learn. Successfully distinguishing between features in this landscape can be difficult for models, particularly in the presence of irrelevant or redundant features. Here, we propose stable multivariate lesion-symptom mapping (sMLSM), which integrates the identification of reliable features with stability selection into conventional MLSM and describe our open-source MATLAB implementation. Usage is showcased with our publicly available dataset of chronic stroke survivors (N=167) and further validated in our independent public acute stroke dataset (N = 1106). We demonstrate that sMLSM eliminates inconsistent features highlighted by MLSM, reduces variation in feature weights, enables the model to learn more complex patterns of brain damage, and improves model accuracy for predicting aphasia severity in a way that tends to be robust regarding the choice of parameters for identifying reliable features. Critically, sMLSM more consistently outperforms predictions based on lesion size alone. This advantage is evident starting at modest sample sizes (N>75). Spatial distribution of feature importance is different in sMLSM, which highlights the features identified by univariate lesion symptom mapping while also implicating select regions emphasized by MLSM. Beyond improved prediction accuracy, sMLSM can offer deeper insight into reliable biomarkers of impairment, informing our understanding of neurobiology.
Read full abstract