The identification of text in video frames and natural scene images has attracted growing attention among researchers because of its varied challenges. Low resolution, complex backgrounds, blurring, color variation, diverse fonts, and varying text placement across image and video frames all make text identification difficult. This paper proposes a novel five-stage method for identifying text in video. First, video-to-frame conversion is performed during pre-processing. Next, text region verification is performed and keyframes are recognized using a convolutional neural network (CNN). Improved candidate text block extraction is then carried out using maximally stable extremal regions (MSER). Subsequently, discrete cosine transform (DCT) features, improved distance map features, and constant gradient-based features are extracted. These features are then fed to a Long Short-Term Memory (LSTM) network for detection, with the LSTM weights tuned by the Self-Improved Bald Eagle Search (SI-BESO) algorithm. Finally, OCR is applied to recognize the text in the image. The SI-BESO-based technique is shown to outperform several other techniques.
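To illustrate the DCT part of the feature-extraction stage, the following minimal sketch computes an orthonormal 2-D DCT-II of a small grayscale block and keeps its low-frequency coefficients as a feature vector. The function names, the block size, and the number of retained coefficients (`keep`) are assumptions made for illustration. They are not taken from the proposed method, whose exact feature definition may differ.

```python
import math


def dct_1d(signal):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(
            signal[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        # Orthonormal scaling: the k = 0 term uses a different factor.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][frequency]; transpose back to [u][v].
    return [list(r) for r in zip(*cols)]


def dct_features(block, keep=2):
    """Hypothetical feature vector: the top-left keep x keep coefficients."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# Example: a constant 4 x 4 block (assumed candidate text block).
flat_block = [[10.0] * 4 for _ in range(4)]
features = dct_features(flat_block, keep=2)
```

For a constant block, only the first (DC) coefficient is non-zero up to rounding error, since such a block contains no spatial variation. Blocks containing text strokes would spread energy into the higher-frequency coefficients.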