Past events have shown that widespread blackouts mostly result from cascading failures in the power grid. Understanding the underlying mechanisms of cascading failures can help in developing strategies to minimize the risk of such events. Moreover, real-time detection of precursors to cascading failures would help operators take measures to prevent their propagation. Currently, the well-established probabilistic and physics-based models of cascading failures are computationally expensive, which restricts them to use as offline tools. In this work, we develop a data-driven methodology for online estimation of the risk of cascading failures. We use a physics-based cascading failure model to generate a cascading failure dataset covering different operating conditions and failure scenarios, thus obtaining a sample space that spans a large set of power grid states, each labeled as safe or unsafe. We use this synthetic data to train deep learning architectures, namely Feed-forward Neural Networks (FNN) and Graph Neural Networks (GNN). GNNs achieve improved performance on graph-structured data and can generalize to graphs of diverse sizes. We compare the FNN and the GNN and demonstrate the GNN's inductive capability on test grids. Furthermore, we apply transfer learning to improve the performance of a pre-trained GNN model on power grids not seen in the training process. The GNN model achieves accuracy and balanced accuracy above 96% on selected test datasets not used in training. In comparison, the FNN achieves accuracy above 85% and balanced accuracy above 81% on test datasets unseen during training. Overall, the GNN model successfully determines whether one or several simultaneous outages result in a critical grid state under specific grid operating conditions.
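Because results are reported as both accuracy and balanced accuracy, it may help to see how the two metrics can diverge when safe and unsafe grid states are unevenly represented. The following is a minimal sketch, not the evaluation code used in this work: the function names and the toy labels are hypothetical, and balanced accuracy is taken in its common definition as the mean of per-class recall.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples that truly belong to this class
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (assumption): 0 = safe, 1 = unsafe.
# Eight safe states and two unsafe states; one unsafe state is missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case a single missed unsafe state barely affects accuracy but lowers balanced accuracy noticeably, which is why reporting both is informative when unsafe states are the minority class.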