Addressing the global challenge of traffic crashes requires moving beyond traditional statistical models, which often fail to fully capture the interactions among the factors that cause crashes. This shortcoming limits the predictive accuracy and adaptability of current methodologies. Research examining the link between behavior-cause relationships and crash injury severity is also notably lacking. Our study applies Natural Language Processing (NLP) and the Frequent Pattern (FP) growth algorithm to mine crash narratives for behavior-cause connections. Combining these with the predictive strength of eXtreme Gradient Boosting (XGBoost) and the interpretability offered by SHapley Additive exPlanations (SHAP), our approach not only predicts crash injury severity with satisfactory accuracy but also explains the influence of specific behavior-cause relationships and environmental conditions on crash outcomes. The integration of NLP and XGBoost, complemented by SHAP insights, achieves an accuracy of 0.79, outperforming traditional discrete choice models and competing closely with other machine learning approaches, including Support Vector Machines, Random Forest, Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM). Through detailed textual analysis and the construction of a behavior-cause matrix, which identifies five broad crash causes linked to 141 specific crash causes with associated behaviors, we uncover critical patterns such as the prominence of distracted driving in severe crashes. This comprehensive approach fills a critical research gap by linking behavior-cause relationships with injury severity and sets the stage for developing targeted interventions to enhance road safety.
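
As a rough illustration of the behavior-cause mining step, the sketch below counts how often a behavior term and a cause term co-occur in the same narrative and keeps the pairs that meet a minimum support. It is not the study's implementation: it uses brute-force support counting in place of the FP-growth tree (for pairs on small data, both return the same frequent itemsets), and the vocabularies `BEHAVIORS` and `CAUSES`, the sample narratives, and the `min_support` value are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical vocabularies; the study's actual term lists are not shown here.
BEHAVIORS = {"texting", "speeding", "tailgating"}
CAUSES = {"distraction", "loss_of_control", "rear_end"}


def extract_items(narrative):
    """Return the behavior and cause terms that appear in one narrative."""
    cleaned = narrative.lower().replace(",", " ").replace(".", " ")
    tokens = set(cleaned.split())
    return tokens & BEHAVIORS, tokens & CAUSES


def frequent_behavior_cause_pairs(narratives, min_support=0.4):
    """Map each (behavior, cause) pair to its support, keeping frequent pairs.

    Support is the share of narratives that mention both terms.
    """
    counts = Counter()
    for text in narratives:
        behaviors, causes = extract_items(text)
        # Each narrative contributes at most once per pair.
        counts.update(product(behaviors, causes))
    total = len(narratives)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Made-up narratives for illustration only.
narratives = [
    "Driver was texting, distraction led to rear_end collision.",
    "Driver texting at signal, distraction noted by officer.",
    "Vehicle speeding on wet road, loss_of_control.",
    "Driver speeding and tailgating, rear_end impact.",
    "Texting driver drifted lanes, distraction.",
]

pairs = frequent_behavior_cause_pairs(narratives)
print(pairs)
```

In a fuller pipeline, the retained pairs would become columns of the behavior-cause matrix that is passed, together with environmental variables, to the severity classifier.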