Fast Extraction of Article Titles from XML Based Large Bibliographic Datasets

K.P. Swaraj, D. Manjula

doi:10.1016/j.protcy.2016.05.108

Abstract

Large numbers of research articles are published worldwide every day. The metadata of these articles is usually made available in bibliographic datasets, which are generally stored in XML, a format widely used for transferring data between systems and for processing data within them. An XML bibliographic dataset contains many article tags, and the sub-tags of each article tag specify the metadata associated with that article. An article tag usually has many metadata sub-tags. Extracting the article title tags is essential for domain-based classification of articles. Extracting and then classifying the research article titles in a bibliographic dataset is laborious and is usually done manually, so a fast and efficient technique for extracting titles from such datasets is needed. In this article, a fast MapReduce-based approach is proposed to extract research article titles from a bibliographic dataset. Articles from the past 3 years of the DBLP bibliographic dataset are used in this study. The Hadoop MapReduce method is used to speed up title extraction from large XML-based bibliographic datasets. Performance analysis revealed that the proposed method is quick, efficient, and highly scalable.
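The abstract does not give implementation details, so the following is only a minimal local sketch of how a map and reduce step for title extraction might look. It is not the authors' Hadoop job. The function names (`split_records`, `map_record`, `reduce_titles`, `run_job`), the use of regular expressions instead of an XML parser, and the choice of publication year as the grouping key are all assumptions made for illustration. The tag names `<article>`, `<title>`, and `<year>` follow the DBLP XML layout.

```python
import re
from collections import defaultdict
from html import unescape

# Assumed record layout: each entry is an <article>...</article> element
# containing <title> and (optionally) <year> sub-tags, as in DBLP.
RECORD_RE = re.compile(r"<article\b.*?</article>", re.DOTALL)
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)
YEAR_RE = re.compile(r"<year>(\d{4})</year>")
INLINE_TAG_RE = re.compile(r"<[^>]+>")


def split_records(xml_text):
    """Split raw XML text into individual <article> records (input splits)."""
    return RECORD_RE.findall(xml_text)


def map_record(record):
    """Map step (hypothetical): emit one (year, title) pair per article."""
    title_match = TITLE_RE.search(record)
    if title_match is None:
        return  # no title sub-tag, nothing to emit
    year_match = YEAR_RE.search(record)
    key = year_match.group(1) if year_match else "unknown"
    # Drop inline markup such as <i>...</i> and decode entities like &amp;.
    title = unescape(INLINE_TAG_RE.sub("", title_match.group(1))).strip()
    yield key, title


def reduce_titles(key, titles):
    """Reduce step (hypothetical): de-duplicate and sort titles for one key."""
    return key, sorted(set(titles))


def run_job(xml_text):
    """Simulate map, shuffle, and reduce in a single process."""
    groups = defaultdict(list)
    for record in split_records(xml_text):
        for key, title in map_record(record):
            groups[key].append(title)  # shuffle: group values by key
    return dict(reduce_titles(k, v) for k, v in groups.items())
```

Calling `run_job` on the text of a small XML file returns a dictionary that maps each year to its list of titles. On a real cluster the split, shuffle, and grouping would be handled by Hadoop rather than by `run_job`, and only the map and reduce logic would be supplied by the user.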


Journal: Procedia Technology
Publication Date: Jan 1, 2016
Citations: 2
License type: cc-by-nc-nd

Similar Papers

Efficient entity resolution using multiple blocking keys for bibliographic dataset
Dolly Mittal ... Madhuri Gupta
01 Dec 2017

Analysis of Changes in Reporting Patterns on 'Autism' by Major National Newspapers and Broadcasters
Choong-Hoon Kwon
The K Association of Education Research | VOL. 8
30 Jun 2023

The trend and evolution of training in volleyball: A bibliometric analysis of the last five years
Saharullah ... Iwan Hermawan
Journal Sport Area | VOL. 9
28 Dec 2023

Assessing research impact with Google Scholar: The most cited articles in the journal 2008–2010
Hans Thulesius
Scandinavian Journal of Primary Health Care | VOL. 29
29 Nov 2011
