Abstract
Video understanding comprises multiple tasks. Dense video captioning entails describing the different events that occur in a video. The proposed model generates a textual description from dynamic and static visual features, speech, and audio cues in a video, and then performs topic modeling on the generated caption. An uncertainty modeling technique is applied to find temporal event proposals, producing timestamps for each event in the video, and a Transformer takes multi-modal features as input to generate more precise captions. The topic modeling tasks include highlighting keywords in the generated captions and topic generation, i.e., identifying the category to which the whole caption belongs.
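Below is a minimal sketch, not the authors' implementation, of the pipeline the abstract describes: features from the visual, audio, and speech modalities are projected into a shared space, fused by a Transformer encoder to produce caption tokens, and a simple keyword-extraction step stands in for the topic modeling stage. All module names, feature dimensions, and the keyword step are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from collections import Counter


class MultiModalCaptioner(nn.Module):
    """Hypothetical fusion captioner: three modalities -> Transformer -> word ids."""

    def __init__(self, visual_dim=2048, audio_dim=128, speech_dim=768,
                 d_model=512, vocab_size=10000, num_layers=2):
        super().__init__()
        # Project each modality into a shared embedding space (dims are assumptions).
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.speech_proj = nn.Linear(speech_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # A single linear head stands in for a full autoregressive caption decoder.
        self.word_head = nn.Linear(d_model, vocab_size)

    def forward(self, visual, audio, speech):
        # Concatenate projected modality tokens along the sequence axis.
        tokens = torch.cat([self.visual_proj(visual),
                            self.audio_proj(audio),
                            self.speech_proj(speech)], dim=1)
        encoded = self.encoder(tokens)
        # Greedy per-position word choice, as a stand-in for real decoding.
        return self.word_head(encoded).argmax(dim=-1)


def extract_keywords(caption, top_k=3):
    """Toy topic-modeling step: highlight the most frequent non-stopword tokens."""
    stopwords = {"the", "a", "an", "is", "and", "of", "in"}
    words = [w for w in caption.lower().split() if w not in stopwords]
    return [w for w, _ in Counter(words).most_common(top_k)]


if __name__ == "__main__":
    model = MultiModalCaptioner()
    visual = torch.randn(1, 16, 2048)   # e.g. 16 sampled frame features
    audio = torch.randn(1, 16, 128)     # audio features per segment
    speech = torch.randn(1, 16, 768)    # speech-transcript embeddings
    word_ids = model(visual, audio, speech)
    print("predicted word ids:", word_ids.shape)
    print("keywords:", extract_keywords("a man is cooking pasta in the kitchen"))
```

In the paper's actual system the caption decoder, the uncertainty-based event-proposal module, and the topic model would replace the simplified word head and keyword counter shown here.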