Multilevel Graph-Based Decision Making in Big Scholarly Data: An Approach to Identify Expert Reviewer, Finding Quality Impact Factor, Ranking Journals and Researchers

Muhammad Mazhar Ullah Rathore,Anand Paul,Ashraf Ali Khan,Spiridon Bakiras,Malik Junaid Jami Gul,Raja Wasim Ahmad,Joel J P C Rodrigues

doi:10.1109/tetc.2018.2869458

Abstract

Digital libraries, such as conference papers, journal documents, books and thesis, research patents, and experiments generate a vast amount of data, named as, Scholarly Big Data. It covers scholarly related information for both researcher's perspective as well as publisher's perspective, such as academic activities, author's demography, academic social networks, etc. The relationships among Big Scholarly Data can be worthy of solving researcher as well as journal related concerns, if they are prudently treated to extract knowledge. The best approach to efficiently process these relationships is the graph. However, with the rapid growth in the number of digital articles by various libraries, the relationships raise exponentially, generating large graphs, which have become increasingly challenging to be handled in order to analyze scholarly information. On the other hand, many researchers and publishers/journals have severe concerns about the ranking control mechanisms and the consideration of quantity rather than quality. Therefore, in this paper, we proposed graph-based mechanisms to perform four critical decisions that are the need of the today's scholarly community. To improve the quality of the article, we proposed a mechanism for selecting and recommending suitable reviewers for a submitted paper based on researchers' expertise and their popularity in that particular field while avoiding conflict of interest. Also, due to shortcomings in the existing journal ranking approaches, we also designed a journal ranking mechanism including its new impact factor and relative ranking by using a modified version of traditional page ranking algorithm and excluding self-authors citations as well as self-journal citations. Similarly, researchers ranking is also important for various motives that is calculated based on the expert's field, citation count, and a number of publications while avoiding any loophole to increase the ranking such as, self-citations and wrong citations. Also, to efficiently process big graphs generated by a massive number of scholarly related relationships, we proposed an architecture that uses the parallel processing mechanism of the Hadoop ecosystem over the real-time analysis approach of Apache Spark with GraphX. Finally, the efficiency of the proposed system is evaluated in terms of processing time and throughput while implementing the designed decision mechanisms.

Full Text