Abstract

Excellent research papers are vital to scientific and technological advances. The early identification of potentially excellent papers (PEPs), and the recognition of their value to science and technology, is therefore high on the research agenda. This study used a set of 5 static and 8 time-dependent citation features to compare six machine learning methods and to find the method that performs best at identifying PEPs. The study modelled Random Forest, LightGBM, Naive Bayes, Support Vector Machine, Neural Network, and TabNet to identify PEPs in the artificial intelligence (AI) field. Highly cited papers were defined using two thresholds, the top 1% and the top 5%, and the data were collected from the Web of Science®. Bibliometric and citation data for 485,041 research articles, proceedings papers, and reviews published in AI between 1990 and 2010 were collected initially. After the data were screened and processed, the final dataset for the training and test sets consisted of 96,169 papers. The findings suggest that time-dependent citation features are more important than static features, and that citation peak features are more significant than the other citation features in identifying PEPs. The findings also demonstrate that the choice of threshold (the top 1% versus the top 5%) affects machine learning outcomes; the study therefore argues that the threshold should be selected carefully. Under the given conditions, LightGBM and Random Forest both performed well and achieved the same accuracy and recall. However, when other indicators such as the F1 score and cross-entropy loss were compared, LightGBM performed better. The study concluded that LightGBM was the best-performing model for identifying PEPs. The paper outlines its contributions and recommends directions for future research.
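The abstract does not say exactly how the top-1% and top-5% labels or the cross-entropy loss were computed. The sketch below is a minimal, hypothetical illustration, assuming a single pooled citation ranking with no per-year or per-field normalisation. It labels papers as highly cited at a given threshold and computes binary cross-entropy, a metric that can separate two classifiers whose thresholded predictions (and therefore accuracy and recall) coincide. The function names `label_highly_cited` and `binary_cross_entropy` are illustrative, not the study's code.

```python
import math
from typing import List, Sequence


def citation_cutoff(citations: Sequence[int], top_fraction: float) -> int:
    """Return the citation count a paper must reach to be in the top `top_fraction`.

    Hypothetical assumption: all papers are ranked together (no per-year
    normalisation). Papers tied at the cut-off are all included, so the
    positive share can slightly exceed `top_fraction`.
    """
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must be in (0, 1)")
    ranked = sorted(citations, reverse=True)
    # Number of papers that should fall in the top group (at least one).
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[k - 1]


def label_highly_cited(citations: Sequence[int], top_fraction: float) -> List[int]:
    """Label each paper 1 (highly cited) or 0 using the top-fraction cut-off."""
    cutoff = citation_cutoff(citations, top_fraction)
    return [1 if c >= cutoff else 0 for c in citations]


def binary_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float],
                         eps: float = 1e-15) -> float:
    """Mean binary cross-entropy (log loss) of predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    papers = list(range(100))  # toy citation counts 0..99
    print(sum(label_highly_cited(papers, 0.01)), "positives at top 1%")
    print(sum(label_highly_cited(papers, 0.05)), "positives at top 5%")

    y = [1, 0, 1, 0]
    model_a = [0.9, 0.1, 0.8, 0.3]    # confident, well-separated probabilities
    model_b = [0.6, 0.4, 0.55, 0.45]  # same labels at 0.5, but less confident
    print(f"A: {binary_cross_entropy(y, model_a):.3f}")
    print(f"B: {binary_cross_entropy(y, model_b):.3f}")
```

Both toy models assign the same labels at a 0.5 cut-off, so they tie on accuracy and recall, yet model A has the lower cross-entropy because its probabilities are more confident and correct. The label counts also hint at one plausible reason the threshold matters: a top-1% definition leaves far fewer positive examples to learn from than a top-5% definition.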
