Abstract. To what extent an individual or group will be affected by the damage of a hazard depends not just on their exposure to the event but on their social vulnerability – that is, how well they are able to anticipate, cope with, resist, and recover from the impact of a hazard. Therefore, for mitigating disaster risk effectively and building a disaster-resilient society to natural hazards, it is essential that policy makers develop an understanding of social vulnerability. This study aims to propose an optimal predictive model that allows decision makers to identify households with high social vulnerability by using a number of easily accessible household variables. In order to develop such a model, we rely on a large dataset comprising a household survey (n = 41 093) that was conducted to generate a social vulnerability index (SoVI) in Istanbul, Türkiye. In this study, we assessed the predictive ability of socio-economic, socio-demographic, and housing conditions on the household-level social vulnerability through machine learning models. We used classification and regression tree (CART), random forest (RF), support vector machine (SVM), naïve Bayes (NB), artificial neural network (ANN), k-nearest neighbours (KNNs), and logistic regression to classify households with respect to their social vulnerability level, which was used as the outcome of these models. Due to the disparity of class size outcome variables, subsampling strategies were applied for dealing with imbalanced data. Among these models, ANN was found to have the optimal predictive performance for discriminating households with low and high social vulnerability when random-majority under sampling was applied (area under the curve (AUC): 0.813). The results from the ANN method indicated that lack of social security, living in a squatter house, and job insecurity were among the most important predictors of social vulnerability to hazards. Additionally, the level of education, the ratio of elderly persons in the household, owning a property, household size, ratio of income earners, and savings of the household were found to be associated with social vulnerability. An open-access R Shiny web application was developed to visually display the performance of machine learning (ML) methods, important variables for the classification of households with high and low social vulnerability, and the spatial distribution of the variables across Istanbul neighbourhoods. The machine learning methodology and the findings that we present in this paper can guide decision makers in identifying social vulnerability effectively and hence let them prioritise actions towards vulnerable groups in terms of needs prior to an event of a hazard.