Abstract

Lecture videos are rich in textual information, and being able to understand that text is useful for larger video understanding and analysis applications. Although text recognition from images has been an active research area in computer vision, text in lecture videos has mostly been overlooked. This paper focuses on text extraction from different types of lecture videos, such as slide-based, whiteboard, and paper-based lectures. For text extraction, the text regions are segmented in the video frames and recognized using a recurrent neural network based OCR. Finally, the extracted text is converted into audio for convenience. The designed algorithm is tested on videos from different lectures. The experimental results show that the proposed methodology is efficient compared with existing work.

Highlights

  • Visual text is one of the most important means of communication used by human beings and is widely used in everyday life

  • We focus our research on lecture recordings produced by state-of-the-art lecture recording systems. Such a system records the lecture as two combined video streams: the main scene of the lecturer, captured by a video camera, and the images projected onto the screen during the lecture, captured through a frame grabber

  • A CNN filter is used to segment the text regions

Summary

INTRODUCTION

Visual text is one of the most important means of communication used by human beings and is widely used in everyday life. Interpreting this textual data is of great significance. Text detection and recognition in unconstrained environments is a challenging computer vision problem. Such functionality can play a valuable role in numerous real-world applications, ranging from video indexing and assistive technology for the visually impaired to automatic localization for businesses and robotic navigation. We focus our research on lecture recordings produced by state-of-the-art lecture recording systems. Such a system records the lecture as two combined video streams: the main scene of the lecturer, captured by a video camera, and the images projected onto the screen during the lecture, captured through a frame grabber. In video OCR, the text within a video frame has to be automatically localized and separated from its background, and image quality enhancement has to be applied, before deepOCR procedures can process the text successfully.
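
To make the localization and background-separation step concrete, here is a minimal sketch in Python. It is not the method proposed in the paper, which segments text regions with a CNN filter. As a stand-in it uses Otsu's global threshold, and it assumes dark text on a light background (for example, a slide frame). The names `otsu_threshold`, `binarize`, and `text_bounding_box` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame, row-major, pixel values 0..255


def otsu_threshold(frame: Frame) -> int:
    """Pick the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in frame:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(frame: Frame) -> List[List[bool]]:
    """Mark pixels at or below the threshold as text (assumes dark text on light background)."""
    threshold = otsu_threshold(frame)
    return [[value <= threshold for value in row] for row in frame]


def text_bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of all text pixels, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, is_text in enumerate(row) if is_text]
    return rows[0], min(cols), rows[-1], max(cols)
```

In a full pipeline, the cropped region would then be enhanced and passed to the OCR stage. A single bounding box is a deliberate simplification: real frames contain several text lines, so a practical system would group text pixels into separate regions instead.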

RELATED WORK
PROPOSED METHODOLOGY
RESULT
CONCLUSION