Abstract

Dynamic texts in scene videos provide valuable insights and semantic cues crucial for video applications. However, the movement of this text presents unique challenges, such as blur, shifts, and occlusions. While efficient at tracking text, state-of-the-art systems often struggle when text becomes obscured or scenes grow complicated. This study introduces a novel method for detecting and tracking video text, specifically designed to predict the location of obscured or occluded text in subsequent frames using a tracking-by-detection paradigm. Our approach begins with a primary detector that identifies text within individual frames, thus enhancing tracking accuracy. Using the Kalman filter, the Munkres algorithm, and deep visual features, we establish connections between text instances across frames. Our technique rests on the idea that when text goes missing in a frame due to obstruction, its previous velocity and location can be used to predict its next position. Experiments conducted on the ICDAR2013 Video and ICDAR2015 Video datasets confirm our method's efficacy, matching or surpassing established methods in performance.
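To make the pipeline concrete, below is a minimal sketch, not the authors' implementation, of the two building blocks the abstract names: a constant-velocity Kalman filter that predicts a text box's next position (including through occlusions) and Munkres-based assignment between predicted tracks and fresh detections. The names KalmanTrack, associate, and max_dist are hypothetical, and the cost here is Euclidean distance only; the paper's method additionally folds deep visual features into the association, which this sketch omits.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Munkres / Hungarian algorithm

class KalmanTrack:
    """Constant-velocity Kalman filter over a text box centre (cx, cy).

    State: [cx, cy, vx, vy]. A deliberate simplification, just enough
    to illustrate coasting through occlusions on predicted motion.
    """
    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0])   # state estimate
        self.P = np.eye(4) * 10.0               # state covariance
        self.F = np.array([[1, 0, 1, 0],        # transition (dt = 1 frame)
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],        # we observe position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01               # process noise
        self.R = np.eye(2) * 1.0                # measurement noise
        self.missed = 0                         # frames without a detection

    def predict(self):
        """Advance the state one frame; runs even when the text is occluded."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                       # predicted centre

    def update(self, z):
        """Correct the prediction with a matched detection z = (cx, cy)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        self.missed = 0

def associate(tracks, detections, max_dist=50.0):
    """Match predicted track positions to new detections with the Munkres
    algorithm. `detections` is an (N, 2) array of box centres."""
    preds = np.array([t.predict() for t in tracks]).reshape(-1, 2)
    detections = np.asarray(detections, dtype=float).reshape(-1, 2)
    if len(preds) == 0 or len(detections) == 0:
        for t in tracks:
            t.missed += 1
        return []
    cost = np.linalg.norm(preds[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched = {r for r, _ in matches}
    for r, c in matches:
        tracks[r].update(detections[c])
    for i, t in enumerate(tracks):
        if i not in matched:
            t.missed += 1   # occluded: keep coasting on the predicted motion
    return matches
```

In a full tracker, a track whose missed counter exceeds some threshold would be terminated, while briefly occluded text survives on predict() alone and is re-associated once the detector sees it again; these policy thresholds are assumptions, not values from the paper.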
