Abstract

With rapid research progress across many fields, the selection of research proposals has become an important task for many research funding agencies and organizations. When only a small number of proposals is received, it is easy to cluster them and the selection process poses no difficulty. As the number of proposals grows, clustering and selecting them becomes complicated. In current systems, proposals are grouped manually or by similarity of subject discipline, which yields irrelevant results in some cases. The main goal of this research work is to develop an enhanced system for selecting research proposals based on a domain ontology, where the ontology serves as the search criterion for the topics of research proposals. The proposed system helps to select the topics of research proposals systematically, without manual intervention. In this paper, an algorithm called Scikit-learn K-means Multiclass Document Clustering (SKMDC) is proposed to group each subject discipline according to its sub-topics and sub-domains. Here, k-means clustering is applied to categorical data. Because categorical data cannot be used directly in the k-means algorithm, the LabelEncoder method is used to encode the text data as numerical values, and the dimensionality of the dataset is reduced using Principal Component Analysis. This paper also addresses a weakness of the k-means technique, namely that the number of clusters must be specified at the initial stage. The optimal number of clusters is determined using the Elbow Curve method and cross-validated through Silhouette Score analysis.
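
The pipeline described above (label encoding, PCA, k-means, then elbow and silhouette checks) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' SKMDC implementation: the toy proposal records, the three-column layout, the helper names `encode_categorical` and `score_cluster_counts`, the candidate range of k, and the choice of two principal components are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy records: (discipline, sub-topic, sub-domain).
rows = [
    ("Computer Science", "Machine Learning", "Clustering"),
    ("Computer Science", "Machine Learning", "Classification"),
    ("Computer Science", "Networks", "Routing"),
    ("Computer Science", "Networks", "Security"),
    ("Biology", "Genetics", "Sequencing"),
    ("Biology", "Genetics", "Gene Editing"),
    ("Biology", "Ecology", "Conservation"),
    ("Biology", "Ecology", "Biodiversity"),
    ("Physics", "Optics", "Lasers"),
    ("Physics", "Optics", "Photonics"),
    ("Physics", "Astrophysics", "Cosmology"),
    ("Physics", "Astrophysics", "Exoplanets"),
]


def encode_categorical(records):
    """Encode each categorical column as integers, one LabelEncoder per column.

    The integer codes follow the sorted order of the labels, so they carry
    no semantic meaning beyond identity.
    """
    columns = zip(*records)
    encoded = [LabelEncoder().fit_transform(list(col)) for col in columns]
    return np.column_stack(encoded).astype(float)


def score_cluster_counts(X, k_values, random_state=0):
    """Fit k-means for each k; return inertia and silhouette score per k."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias[k] = model.inertia_                         # elbow-curve values
        silhouettes[k] = silhouette_score(X, model.labels_)  # cross-check
    return inertias, silhouettes


X = encode_categorical(rows)                    # text -> numeric codes
reduced = PCA(n_components=2).fit_transform(X)  # dimensionality reduction

k_values = list(range(2, 6))                    # assumed candidate range
inertias, silhouettes = score_cluster_counts(reduced, k_values)

# Use the silhouette score to pick k; the inertias would be plotted
# against k and inspected for the "elbow".
best_k = max(silhouettes, key=silhouettes.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(reduced)

print("inertia per k:", inertias)
print("silhouette per k:", silhouettes)
print("chosen k:", best_k)
```

In this sketch the silhouette score picks k automatically, while the inertia values are only collected for plotting; how the paper combines the two criteria is not specified in the abstract.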
