Academics and librarians may want to identify whether a journal is open access (OA) or subscription-based. Indexes and digital libraries may provide this information for known collections, but the access mode of a journal or body of journals may be unknown a priori. In this short analysis, a machine learning-based method is used to classify a journal's access mode (OA or subscription) from its CiteScore and Journal Impact Factor (JIF). From an initial pool of 91 multidisciplinary journals with a CiteScore, 38 journals with both a JIF and a CiteScore were selected (24 OA; 14 subscription). Ten machine learning models were applied using a data mining tool (Orange): k-nearest neighbors (kNN), Tree, support vector machine (SVM), Random forest, Neural network, Naïve Bayes, Logistic regression, Adaptive boosting (AdaBoost), Gradient Boosting (scikit-learn) (GBS) and Gradient Boosting (CatBoost) (GBC). AdaBoost, GBS and GBC achieved the highest precision, sensitivity, and specificity (100%), classifying the access mode with zero error. These three optimal models were then validated by using them to predict the access mode of 54 library and information science (LIS) journals (7 OA; 47 subscription); AdaBoost and GBS gave perfect results with no misclassification. With these models, the access mode of multidisciplinary and LIS journals can be predicted accurately using only JIF and CiteScore data. Libraries in low-resource settings will benefit from this research by using it to design a decision support system for journal selection.
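The models are compared on precision, sensitivity, and specificity. The following is a minimal sketch of how these three metrics can be computed from true and predicted access-mode labels. It assumes OA is the positive class. The abstract does not state which class is treated as positive, and Orange may report class-averaged scores instead. The function name `binary_metrics` and the label strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float    # TP / (TP + FP)
    sensitivity: float  # TP / (TP + FN), also called recall
    specificity: float  # TN / (TN + FP)


def binary_metrics(y_true, y_pred, positive="OA"):
    """Compute precision, sensitivity and specificity for one positive class.

    Assumption (not from the source): "OA" is treated as the positive class
    and every other label (e.g. "Subscription") as negative.
    """
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the confusion matrix for the chosen positive class.
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return Metrics(ratio(tp, tp + fp), ratio(tp, tp + fn), ratio(tn, tn + fp))


if __name__ == "__main__":
    # Hypothetical labels mirroring the LIS validation set's class counts
    # (7 OA, 47 subscription) with every prediction correct.
    truth = ["OA"] * 7 + ["Subscription"] * 47
    print(binary_metrics(truth, truth))
    # prints Metrics(precision=1.0, sensitivity=1.0, specificity=1.0)
```

When no journal is misclassified, all three values equal 1.0 (100%). A single wrong prediction in either direction lowers at least one of them.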