Text Extraction Techniques Research Articles

Military intelligence analysts use automated tools to exploit physics-based sensor data and construct a spatio-temporal picture of adversary entities, networks, and behaviors on the battlefield. Traditionally, these tools did not exploit human-generated textual reports, leaving analysts to manually interpret dots on the map as meaningful entities using background knowledge of adversary equipment, organization, and activity. Current off-the-shelf text extraction techniques underperform on tactical reports because of the unique characteristics of this text. Tactical reports typically feature short sentences with simple grammar, but they also tend to include jargon and abbreviations, often ignore grammatical rules, and are likely to contain spelling errors. Likewise, named entity recognizers have low recall on these reports, because few of the names they contain appear in standard dictionaries. We have developed an entity extraction capability tailored to these challenges and to the specific needs of analysts, as part of a comprehensive exploitation and fusion system. Because syntax offers fewer cues, our approach uses semantic constraints to disambiguate syntactic patterns; it is implemented as a hybrid system that post-processes the output of a standard Natural Language Processing (NLP) engine with our custom semantic pattern analysis. Additional functionality extracts military time and location formats, essential elements that enable downstream fusion of extracted entities with sensor information, resulting in a compact and meaningful representation of the battlefield situation.

The importance and use of text extraction from camera-based coloured scene images is rapidly increasing. Text within a camera-grabbed image can carry a large amount of metadata about the scene, which is useful for identification, indexing, and retrieval. While segmentation and recognition of text from document images is quite successful, detection of coloured scene text remains a new challenge for camera-based images. Common problems in extracting text from camera-based images are the lack of prior knowledge about text features such as colour, font, size, and orientation, as well as about the location of probable text regions. In this paper, we document the development of a fully automatic and robust text segmentation technique that can be used on any type of camera-grabbed frame, whether a single image or video. The proposed algorithm, which aims to overcome these current problems of text segmentation, exploits text appearance in terms of colour and spatial distribution. When the new technique was tested on a variety of camera-based images, it was found to outperform, or perform comparably to, existing techniques. The proposed technique also overcomes problems that can arise from an unconstrained, complex background. The novelty of the work is that, for the first time, colour and spatial information are used simultaneously for text extraction.
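The abstract does not describe how colour and spatial information are combined. One common way to use both at once, offered here as an assumption rather than the paper's algorithm, is to cluster pixels in a joint feature space (R, G, B, λx, λy) and treat the smallest cluster as the text candidate. The sketch below is plain Python k-means; the names `colour_spatial_features`, `kmeans`, and `text_mask`, the weight parameter `spatial_weight` (λ), and the "smallest cluster is text" rule are all hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Point = Tuple[float, ...]


def colour_spatial_features(image: Sequence[Sequence[Pixel]], spatial_weight: float) -> List[Point]:
    """Build one (R, G, B, w*x, w*y) feature vector per pixel, in row-major order."""
    feats = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            feats.append((float(r), float(g), float(b), spatial_weight * x, spatial_weight * y))
    return feats


def _sq_dist(a: Point, b: Point) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centres.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre in the joint colour-spatial space.
        labels = [min(range(k), key=lambda j: _sq_dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged: assignments no longer move the centres
        centres = new_centres
    return labels


def text_mask(image: Sequence[Sequence[Pixel]], k: int = 2, spatial_weight: float = 1.0) -> List[List[bool]]:
    """Mark pixels in the smallest non-empty cluster (assumes text covers less area than background)."""
    width = len(image[0])
    labels = kmeans(colour_spatial_features(image, spatial_weight), k)
    counts = [labels.count(j) for j in range(k)]
    text_label = min(range(k), key=lambda j: counts[j] if counts[j] else float("inf"))
    return [[labels[y * width + x] == text_label for x in range(width)] for y in range(len(image))]
```

Raising `spatial_weight` makes distant pixels of the same colour more likely to fall into separate clusters, while setting it to 0 reduces the sketch to pure colour clustering. A practical system would follow this step with connected-component and shape filtering, which is not shown.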

Articles published on Text Extraction Techniques

Scene text extraction based on edges and support vector regression

Extracting Meaningful Entities from Human-generated Tactical Reports

Design and FPGA Implementation of DWT, Image Text Extraction Technique

Robust Extraction of Text from Camera Images using Colour and Spatial Information Simultaneously

Morphological text extraction from images