Abstract
Text classification is an important field of research because of the growing volume of unstructured text, especially in the form of customer inquiries. The problem has two phases: identifying a customer inquiry in general, and automatically assigning the product order requested by the customer to predefined categories based on its characteristics. Both phases can be accomplished using various techniques at each of their internal steps. Choosing the proper technique affects the efficiency of text classification by saving time and manual effort. The aim of this paper is to present a classification model that remains efficient when working with Russian texts, since machine learning algorithms are already known to work well with English texts. The most challenging task was preprocessing the unstructured text by stemming, parsing, and indexing. After that, a logical sequence of steps and analysis, with a compatible combination of the embedded techniques, allowed us to compare the algorithms and their behavior on normalized and unnormalized data. Experiments on a dataset of 33,000 entries were performed using bi-grams and TF-IDF scores along with their parsed frequencies, and show that the SVM and Naive Bayes algorithms outperform the others on normalized data. Moreover, optimization using stochastic gradient descent was applied together with a neural network, and the results were compared with those of the traditional machine learning algorithms. The results show that the performance of the proposed model can be improved by identifying outliers and their patterns.
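To make the feature step concrete, the following is a minimal sketch of how TF-IDF scores over word bi-grams could be computed. It is an illustration only, not the paper's implementation: the helper names `bigrams` and `tfidf_bigrams`, the unsmoothed formula tf × log(N / df), and the three sample inquiries are all assumptions, and the input is assumed to be already tokenized and stemmed.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))


def tfidf_bigrams(docs):
    """Compute unsmoothed TF-IDF weights of word bi-grams.

    docs is a list of token lists. Returns one dict per document that maps
    each bi-gram to tf * log(N / df), where tf is the bi-gram's relative
    frequency in the document and df is the number of documents containing it.
    """
    counts_per_doc = [Counter(bigrams(doc)) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each bi-gram occurs.
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())

    weights = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        weights.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return weights


# Hypothetical tokenized inquiries (placeholders, not the paper's data):
# "order not delivered", "order not paid", "return item".
docs = [
    ["заказ", "не", "доставлен"],
    ["заказ", "не", "оплачен"],
    ["вернуть", "товар"],
]
weights = tfidf_bigrams(docs)
```

In this sketch a bi-gram shared by several inquiries, such as ("заказ", "не"), receives a lower weight than one that occurs in a single inquiry, which is the property that makes TF-IDF useful as input to classifiers such as SVM or Naive Bayes.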