Abstract

The Internet and mobile computing have become a major societal force through which everyday problems and issues are addressed. For this to be effective, information and content extraction must operate at a basic, generic level so that it can handle different sources and types of web documents, preferably without depending on any major software package. The present study is a step in this direction and focuses on extracting information from text and media-type data as it is stored in the computer in digital form. The approach operates on generic pixel maps, as stored for any data, so that differences of language, text script, and format do not pose problems. With the pixel maps as a basis, different methods convert them into a numerical form suitable for neural modeling, and content is then extracted directly, making the approach universal. Statistical features are computed from the pixel-map matrix of the image. The extracted features are presented to a neural model, and the standard backpropagation algorithm with hidden layers is used to extract content. Accuracy is compared to validate the approach and to show how content extraction within certain bounds is possible for any web page.
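To make the described pipeline concrete, the following is a minimal sketch, assuming a grayscale pixel map, a small hand-picked set of statistical features, and a single hidden layer trained with plain backpropagation. The feature set, network size, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): statistical features from a
# pixel-map matrix, fed to a one-hidden-layer network trained with
# standard backpropagation (sigmoid units, squared-error gradient).
import numpy as np
from PIL import Image

def pixel_map_features(path):
    """Simple statistics of a document image's pixel-map matrix."""
    pm = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
    hist, _ = np.histogram(pm, bins=32, range=(0.0, 1.0))
    p = hist / hist.sum()                     # intensity distribution
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([pm.mean(), pm.std(), np.median(pm),
                     entropy, pm.min(), pm.max()])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, y, hidden=8, lr=0.5, epochs=5000, seed=0):
    """Fit a one-hidden-layer MLP by gradient-descent backpropagation.

    X: (n_samples, n_features) feature rows; y: (n_samples, n_classes)
    one-hot content labels. Returns the two weight matrices.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, y.shape[1]))
    for _ in range(epochs):
        h = sigmoid(X @ W1)                   # hidden-layer activations
        out = sigmoid(h @ W2)                 # network output
        d_out = (out - y) * out * (1 - out)   # output-layer delta
        d_h = (d_out @ W2.T) * h * (1 - h)    # hidden-layer delta (backprop)
        W2 -= lr * (h.T @ d_out)              # gradient-descent updates
        W1 -= lr * (X.T @ d_h)
    return W1, W2
```

Given feature rows stacked into X and one-hot labels in y, train_backprop returns the learned weights, and a new page's class follows from the forward pass sigmoid(sigmoid(x @ W1) @ W2). The choice of six histogram-level statistics is only one plausible reading of "statistical features of the pixel-maps".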


Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.