Introduction
Caregiver stress negatively affects both patients and caregivers. Predictors of caregiver difficulty may give providers crucial insight for prioritizing those at the highest risk of stress. The purpose of this study was to develop a prediction model of caregiver difficulty by applying data mining techniques to a national behavioral risk factor dataset.

Methods
Behavioral data comprising 397 variables on 2,264 informal caregivers, who had provided any care to a friend or family member during the past month, were extracted from a publicly available national dataset in the U.S. (N = 451,075) and analyzed. We applied several classification algorithms (J48, RandomForest, MultilayerPerceptron, AdaBoostM1) to iteratively generate prediction models for caregiving difficulty, using 10-fold cross-validation.

Results
Of the informal caregivers, 44.7% reported that they faced the greatest difficulties while caring for patients. Among these, 45% cited emotional burden as the reason. Patient cognitive alteration (e.g., changes in thinking or remembering during the past year), care hours, and relationship with the caregiver emerged as the main predictors of caregiver stress (63% correctly classified; AUC = 65% for both the difficulty and no-difficulty classes).

Conclusions
Data mining methods were useful for discovering new behavioral risk knowledge and for visualizing predictors of caregiver stress in a multidimensional behavioral dataset. This study suggests that health professionals should target family caregivers of people with dementia who are expected to encounter neurocognitive changes in their patients, and should inform these caregivers about the importance of limiting care hours, the risk of burnout, and the delegation of caregiving tasks.
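
The evaluation procedure named in the Methods (10-fold cross-validation, reported as percent correctly classified and AUC) can be illustrated with a minimal sketch. It is not the study's pipeline: a single-feature threshold rule stands in for the classifiers listed above, the data are synthetic, and every name in it (`hours`, `difficulty`, `fit_stump`, `cross_validate`, `auc`) is hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_stump(xs, ys):
    """Pick the threshold on one feature that maximizes training accuracy.

    Stand-in classifier (assumption): predicts class 1 when x >= threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(xs, ys, k=10, seed=0):
    """Return the fraction of rows classified correctly when held out."""
    correct = 0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        t = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        correct += sum((xs[i] >= t) == bool(ys[i]) for i in held_out)
    return correct / len(xs)


# Synthetic data (assumption): weekly care hours and a noisy difficulty label.
rng = random.Random(42)
hours = [rng.uniform(0, 40) for _ in range(200)]
difficulty = [1 if h + rng.gauss(0, 10) > 20 else 0 for h in hours]

accuracy = cross_validate(hours, difficulty, k=10)
area = auc(hours, difficulty)
print(f"correctly classified: {accuracy:.1%}, AUC: {area:.2f}")
```

Each row is scored only by a model that never saw it during fitting, which is why the cross-validated accuracy is a fairer estimate than accuracy on the training data. Here the AUC is computed directly from the raw feature as a ranking score, which is a simplification of scoring with per-fold model outputs.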