Abstract

Is it feasible to accurately understand what is happening at a particular location, when we are not physically present, without watching video footage? Busy schedules rarely leave time to watch an entire recording just to learn what occurred. An alternative is an audio clip in which a narrator describes the scene. Its main advantage is that it saves time and allows multitasking: users can continue their work while listening to an audio clip generated from up-to-date information. Moreover, for elderly people a sudden fall can cause serious injuries that escalate into a major medical emergency; to help prevent such emergencies, the system also features an alarm that detects human falls. This is made feasible by cutting-edge techniques: computer vision and image processing to capture live events, recurrent neural networks with LSTMs to process and analyse the recordings, and natural language processing to generate a description of what is happening. The resulting descriptions are delivered to users as audio clips created with the Google Text-to-Speech API.

Keywords: Image Processing, Computer Vision, Natural Language Processing, Convolutional Neural Networks.
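The abstract does not specify how the fall-detection alarm decides that a fall has occurred. As a minimal sketch only, one common heuristic is to watch the bounding box that the vision pipeline produces for a detected person: when someone falls, the box typically becomes wider than it is tall. The function names, threshold, and box format below are illustrative assumptions, not the paper's actual method.

```python
# Illustrative fall-detection heuristic (assumed, not the paper's exact method).
# A person's bounding box is taken as (x, y, width, height); a box that is
# clearly wider than tall suggests the person is lying on the ground.

def is_fall(box, ratio_threshold=1.2):
    """Return True if the box's width/height ratio exceeds the threshold."""
    x, y, w, h = box
    if h == 0:  # degenerate detection; treat as no fall
        return False
    return (w / h) > ratio_threshold

def should_alarm(boxes_per_frame, consecutive=3):
    """Raise the alarm only after `consecutive` fall-like frames in a row,
    which filters out momentary detector noise."""
    streak = 0
    for box in boxes_per_frame:
        streak = streak + 1 if is_fall(box) else 0
        if streak >= consecutive:
            return True
    return False
```

In a full system, a sustained fall-like posture would both trigger the alarm and be passed to the captioning model, whose output sentence is then synthesised to speech.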


