Tagging Assistant for Scientific Articles

Zara Nasar, Syed Waqar Jaffry, Muhammad Kamran Malik

doi:10.1007/978-981-13-6052-7_30

Abstract

With the advent of the World Wide Web (WWW), the world is being overloaded with huge amounts of data. This data carries potential information that, once extracted, can be used for the betterment of humanity. Information can be extracted from this data through manual or automatic analysis. Manual analysis is neither scalable nor efficient, whereas automatic analysis relies on computing mechanisms that support information extraction over large amounts of data. The WWW has also accelerated the growth of scientific literature, which makes literature review a laborious and time-consuming job for researchers. Hence, there is a pressing need to automatically extract potential information from the immense body of scientific articles in order to automate the process of literature review. Such a service would require trained machine learning models, and such models in turn require training datasets. Constructing a quality dataset often involves the use of annotation tools. A wide variety of annotation tools exist, but none are tailored to assist the annotation of scientific articles. Hence, in this study, a web-based annotation tool for scientific articles is developed in Python. The developed assistant employs state-of-the-art machine learning models to extract metadata from scientific articles as well as to process an article’s text. It provides various filters to assist annotators. An article is divided into various textual constructs, including sections, paragraphs, sentences, tokens and lemmas. This division can help annotators by addressing their information needs efficiently. Hence, this annotation tool can significantly reduce the time required to prepare datasets from full-text scientific articles.
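
To make the division into textual constructs concrete, the following is a minimal sketch of how plain text might be split into paragraphs, sentences and tokens. It is an illustration under stated assumptions, not the tool's actual implementation: the function name `split_article` is hypothetical, the regular-expression rules are deliberately naive, and section detection and lemmatization (which would typically require a trained model or an NLP library) are left out.

```python
import re


def split_article(text):
    """Split plain text into paragraphs -> sentences -> tokens.

    Hypothetical helper for illustration only; the rules below are naive
    stand-ins for the machine learning models the abstract refers to.
    """
    paragraphs = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", block) if s]
        # Naive tokenizer: runs of word characters, or single punctuation marks.
        paragraphs.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences])
    return paragraphs


sample = (
    "Manual analysis is slow. Automatic analysis scales.\n\n"
    "Annotation tools help."
)
hierarchy = split_article(sample)
```

Here `hierarchy` is a nested list indexed as paragraph, then sentence, then token, so an annotation interface could filter or display the text at any of these levels.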

Full Text