Deep learning has attracted significant research interest for pattern recognition tasks, notably the detection of events from audio signals and the recognition of natural sounds in the environment. The DCASE challenge (Detection and Classification of Acoustic Scenes and Events) has further highlighted the effectiveness of deep learning for these tasks. This paper reviews prior work that applies deep learning techniques to detect emergency events from audio signals. It focuses on the complexity and specific challenges of recognizing events in polyphonic audio, where multiple sound sources overlap in time. The use and structure of neural networks are presented, with an emphasis on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for audio-based event detection. An overview of evaluation metrics and datasets is also provided.