With the emergence of the fourth paradigm of scientific research, algorithms have become increasingly important to scientific work. In academic papers, scholars mention algorithms with various motivations, such as using, comparing, or improving them to solve complex research tasks. Identifying these motivations can help scholars discover the relationships between algorithms and better assess their roles and value. Taking the field of natural language processing (NLP) as an example, this article therefore proposes a complete method for identifying motivations for mentioning algorithms at the sentence level and for analyzing their distribution and evolution. Specifically, using manual annotation and machine learning methods, we identify algorithm entities and the sentences that mention them in the full text of papers, classify the motivations for mentioning algorithms using pre-trained models and data augmentation techniques, and finally analyze the distribution and evolution of these motivations. The results show that deep learning models trained on the augmented data outperform traditional machine learning models in the classification task. In academic papers, more than half of the sentences show the direct use of algorithms, improving algorithms is the least frequent motivation, and the diversity of motivations has increased over time. For specific algorithms, grammatical algorithms are mentioned more often with the motivation of “description,” while the “use” motivation is more common in the machine learning algorithms category. Over time, “use” motivations gradually replaced “description” motivations for different algorithms, and the number of motivation types decreased significantly. Our research explores the identification, distribution, and evolution of authors’ motivations for mentioning algorithm entities, which could provide a basis for future motivation-based identification of relationships between algorithms and evaluation of their influence.