Rating a video based on its content is one of the most important ways to classify videos for audience age groups. Film content rating and TV programme rating are the two most common rating systems, and both are carried out by professional committees. However, because of the huge number of short videos shared on social media, it is impossible for a committee to review and rate their content manually. A suitable alternative is to use computer vision to analyze the video content and rate it. An automatic Video Content Rating (VCR) system rates a short video in order to classify it for audience age groups. Inspired by the current manual film and TV programme rating systems, VCR depends on five main components: violence, profane language, nudity, pornography, and substance abuse. To date, several reviews and survey papers have addressed advancements and innovations in video content analysis, such as violence, nudity, and pornography detection. However, no comprehensive survey has investigated VCR systems as a whole and explained their taxonomy, challenges, and open issues; this study is undertaken to fill that gap. In this paper, we review deep learning studies on the subjects relevant to VCR. We also investigate recently published VCR-related works according to the audio, static visual, and motion visual aspects of a video. Furthermore, the currently available related datasets are examined, and the performance of published models on these datasets is compared. Finally, the challenges and the future of VCR are discussed.