The citations a work receives are regarded as a key characteristic in its appraisal and study, and they are among the most important indicators of a research publication's quality. How many citations a publication attracts depends on a variety of factors, including author skill, publication venue, research topic, and so on. The goal of this study is to determine how the number of co-authors affects the number of citations of research papers, through a correlation analysis between the two. Citation data is gathered from databases such as DBLP, ACM, MAG (Microsoft Academic Graph), and others. The initial version of the dataset contains 629,814 papers and 632,752 citations. We use two methods to examine the impact of co-author count on the number of citations of a research paper: (i) Pearson’s correlation coefficient (PCC), and (ii) multiple linear regression (MLR). To test the impact of co-author count on the citation count of research publications, we calculate Pearson’s correlation coefficient (ra) between the two variables number of authors (NA) and citation count (CC). To compare the impact of the number of authors with that of other influential factors, we also calculate Pearson’s correlation coefficient between CC and each of those factors: (i) rc, between number of countries (NC) and CC; (ii) rv, between venue category (VC) and CC; and (iii) ry, between Year_From (YF) and CC. Empirical evidence shows that co-authored publications achieve higher visibility and impact. To predict the number of citations from the factors above (NA, NC, VC, and YF), we use MLR.
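The correlation step described above can be illustrated with a minimal sketch. It computes Pearson's correlation coefficient between citation count and each factor using only the Python standard library. The records below are hypothetical toy values, not the study's data, and the integer encoding of venue category (VC) and the numeric treatment of Year_From (YF) are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy records (NOT the study's data): NA, NC, VC, YF, CC.
# VC is assumed to be encoded as an integer code; YF is assumed to be a year.
PAPERS = [
    (1, 1, 1, 2005, 3),
    (2, 1, 2, 2008, 5),
    (3, 2, 1, 2010, 9),
    (4, 2, 3, 2012, 12),
    (5, 3, 2, 2015, 14),
    (2, 1, 3, 2011, 4),
]
FACTORS = ["NA", "NC", "VC", "YF"]

cc = [row[4] for row in PAPERS]
# One coefficient per factor: ra, rc, rv, ry in the abstract's notation.
correlations = {
    name: pearson_r([row[i] for row in PAPERS], cc)
    for i, name in enumerate(FACTORS)
}

for name, r in correlations.items():
    print(f"r({name}, CC) = {r:+.3f}")
```

Comparing the magnitudes of the four coefficients is what allows the impact of NA to be set against that of NC, VC, and YF.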
The goal of multiple linear regression (MLR) is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable. The higher the R-square, the tighter the relationship between the dependent variable and the independent variables. It is observed that R-square decreases when NA is removed from the model, which indicates that NA is the most influential factor (the relationship between NA and CC is the strongest). The main contribution of this paper is an effective prediction module (EPM), which uses a probabilistic neural network (PNN) to predict the number of citations from the most influential factors (NA, NC, VC, and YF).
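The R-square comparison described above can be sketched as follows: fit an ordinary least-squares model on all factors, then refit with each factor removed and compare the resulting R-square values. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are hypothetical toy values, VC is assumed to be integer-coded, and the helper names (`solve_linear_system`, `r_squared`, `drop_one_r_squared`) are introduced here for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def r_squared(rows, y):
    """Fit y on the columns of rows (with an intercept) by OLS; return R^2."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering predictors and response absorbs the intercept term.
    xc = [[r[j] - means[j] for j in range(p)] for r in rows]
    yc = [v - y_mean for v in y]
    xtx = [[sum(xc[i][a] * xc[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    ss_res = sum((yc[i] - sum(beta[j] * xc[i][j] for j in range(p))) ** 2
                 for i in range(n))
    ss_tot = sum(v ** 2 for v in yc)
    return 1.0 - ss_res / ss_tot

def drop_one_r_squared(rows, y, names):
    """R^2 of the full model and of each model with one factor removed."""
    scores = {"full": r_squared(rows, y)}
    for j, name in enumerate(names):
        reduced = [[v for k, v in enumerate(r) if k != j] for r in rows]
        scores["without " + name] = r_squared(reduced, y)
    return scores

# Hypothetical toy data (NOT the study's data): columns are NA, NC, VC, YF.
NAMES = ["NA", "NC", "VC", "YF"]
FEATURES = [
    [1, 1, 1, 2005], [2, 1, 2, 2008], [3, 2, 1, 2010], [4, 2, 3, 2012],
    [5, 3, 2, 2015], [2, 1, 3, 2011], [6, 2, 1, 2007], [3, 3, 2, 2013],
]
CC = [3, 5, 9, 12, 14, 4, 15, 8]

scores = drop_one_r_squared(FEATURES, CC, NAMES)
for label, value in scores.items():
    print(f"R^2 ({label}) = {value:.4f}")
```

In this kind of comparison, the factor whose removal lowers R-square the most is read as the most influential one. Removing a predictor from a least-squares model can never raise R-square, so it is the size of the drop that carries the information.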