Abstract
Large numbers of research articles are published worldwide every day, and their metadata are usually made available in bibliographic datasets. Such datasets are generally distributed in XML, a format widely used for transferring data between systems and for machine processing. An XML bibliographic dataset contains many article tags, each of which typically has many sub-tags that specify the metadata associated with that article. Extracting the article title tags is essential for domain-based classification of articles. Extracting and subsequently classifying the research article titles in a bibliographic dataset is a laborious task that is usually done manually, so a fast and efficient technique for extracting titles from such datasets is needed. In this article, a fast MapReduce-based approach is proposed to extract research article titles from a bibliographic dataset. Articles from the past three years of the DBLP bibliographic dataset are used in this study. The Hadoop MapReduce method is used to speed up title extraction from large XML-based bibliographic datasets. Performance analysis revealed that the proposed method is quick, efficient and highly scalable.
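The map and reduce steps described above can be illustrated with a small in-memory sketch. This is not the implementation evaluated in the study and it does not use Hadoop: it only simulates the two phases in plain Python on a hypothetical three-record sample. The function names (`map_record`, `reduce_titles`, `extract_titles`), the sample records, and the choice of publication year as the grouping key are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample in the style of a DBLP-like XML file (not real data).
SAMPLE = """<dblp>
<article key="a1"><author>A. Author</author><title>Fast Title Extraction.</title><year>2015</year></article>
<article key="a2"><author>B. Author</author><title>Scalable XML Processing.</title><year>2016</year></article>
<inproceedings key="c1"><title>Not an article.</title><year>2016</year></inproceedings>
</dblp>"""

# One <article> element is treated as one input record for the map step.
RECORD = re.compile(r"<article\b.*?</article>", re.DOTALL)
TITLE = re.compile(r"<title>(.*?)</title>", re.DOTALL)
YEAR = re.compile(r"<year>(\d{4})</year>")


def map_record(record):
    """Map step: emit a (year, title) pair for one <article> record."""
    title = TITLE.search(record)
    year = YEAR.search(record)
    if title:
        # Grouping by year is an assumed key; records without a year
        # are emitted under the key "unknown".
        yield (year.group(1) if year else "unknown", title.group(1).strip())


def reduce_titles(pairs):
    """Reduce step: collect all titles that share the same key."""
    grouped = defaultdict(list)
    for year, title in pairs:
        grouped[year].append(title)
    return dict(grouped)


def extract_titles(xml_text):
    """Run the map step over every record, then reduce the emitted pairs."""
    pairs = []
    for record in RECORD.findall(xml_text):
        pairs.extend(map_record(record))
    return reduce_titles(pairs)


if __name__ == "__main__":
    for year, titles in sorted(extract_titles(SAMPLE).items()):
        print(year, titles)
```

In a real Hadoop job the records would be split across nodes and the map calls would run in parallel, which is where the speed-up reported in the study would come from; the sketch runs them sequentially and only shows the data flow.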