Abstract

<p>The emergence of ‘Big Data’ has shifted the paradigm from information retrieval to information extraction. Information extraction techniques enable corpus analysis from which useful interpretations can be drawn. The choice of an appropriate technique depends on the type of data being handled and on its intended applications. In an R&D environment, published information is treated as an authenticated benchmark for studying and analysing the growth pattern of a given field of science, medicine, or business. This paper proposes a rule-based information extraction process applied to selected data extracted from a bibliographic database of published R&D papers. The aim of the study is to build a database of relevant concepts, clean the retrieved data, and automate information retrieval in the local database. For this purpose, concept-based ‘subject profiles’ in the area of advanced semiconductors were developed, together with rules for extracting text from the metadata retrieved from the bibliographic database. The resulting subset was used as input to a knowledge domain that supports R&D in the area of ‘advanced semiconductor materials and devices’ and provides information services on the Intranet. The study found that concept-based pattern matching on the downloaded datasets yielded better results than searches using the controlled vocabulary of the source database.</p>
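The concept-based pattern matching described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the profile contents, the record field names (`title`, `abstract`), and the function names are all hypothetical, chosen only to show how a subject profile of patterns might be applied to downloaded bibliographic metadata to select a relevant subset.

```python
import re

# Hypothetical subject profile: concept name -> regex patterns.
# These concepts and patterns are illustrative assumptions, not taken from the paper.
SUBJECT_PROFILE = {
    "wide_bandgap": [r"\bGaN\b", r"\bgallium nitride\b",
                     r"\bSiC\b", r"\bsilicon carbide\b"],
    "hemt": [r"\bHEMTs?\b", r"\bhigh[- ]electron[- ]mobility transistors?\b"],
}


def compile_profile(profile):
    """Compile every pattern once, case-insensitively."""
    return {
        concept: [re.compile(p, re.IGNORECASE) for p in patterns]
        for concept, patterns in profile.items()
    }


def match_concepts(record, compiled, fields=("title", "abstract")):
    """Return the sorted concept names whose patterns occur in the chosen fields."""
    text = " ".join(record.get(f, "") for f in fields)
    return sorted(
        concept
        for concept, patterns in compiled.items()
        if any(p.search(text) for p in patterns)
    )


def extract_subset(records, compiled):
    """Keep only records matching at least one concept, tagged with the matches."""
    subset = []
    for record in records:
        concepts = match_concepts(record, compiled)
        if concepts:
            subset.append({**record, "concepts": concepts})
    return subset


if __name__ == "__main__":
    # Made-up sample records standing in for downloaded metadata.
    records = [
        {"id": 1, "title": "AlGaN/GaN HEMTs on silicon", "abstract": ""},
        {"id": 2, "title": "Library automation survey", "abstract": ""},
    ]
    for hit in extract_subset(records, compile_profile(SUBJECT_PROFILE)):
        print(hit["id"], hit["concepts"])
```

Matching against free-text fields rather than only the source database's descriptor field is what distinguishes this kind of profile from a controlled-vocabulary search. Case-insensitive matching is a simplification here; short acronyms may need case-sensitive patterns to avoid false hits.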
