Identification of Duplication in Questions Posed on Knowledge Sharing Platform Quora using Machine Learning Techniques

R Rishickesh*, A Shahina, R.P Ram Kumar, A Nayeemulla Khan

doi:10.35940/ijitee.l3017.1081219

Abstract

Quora, an online question-answering platform, has many duplicate questions, i.e., questions that convey the same meaning. Because the platform is open to all users, anyone can pose a question any number of times, which increases the number of duplicate questions. This paper uses a dataset of question pairs taken from the Quora website, with each question in its own column and a label indicating whether the pair is a duplicate. Traditional comparison methods such as Sequence matcher perform a character-by-character comparison without understanding contextual information, and hence give lower accuracy. Machine learning methods predict similarity using features extracted from the context. This study compares both the traditional methods and the machine learning methods. The features for the machine learning methods are extracted using two Bag of Words models: Count-Vectorizer and TFIDF-Vectorizer. Among the traditional comparison methods, Sequence matcher gave the highest accuracy, 65.29%. Among the machine learning methods, XGBoost gave the highest accuracy: 80.89% with Count-Vectorizer and 80.12% with TFIDF-Vectorizer.
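The contrast the abstract draws between a sequence-matching baseline and Bag of Words features can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Python's `difflib.SequenceMatcher` plays the role of the "Sequence matcher" baseline, uses a hypothetical duplicate threshold of 0.5, applies a simplified lowercase alphanumeric tokenizer, and replaces the count and TF-IDF vectorizers with hypothetical hand-written helpers (`count_vectors`, `tfidf_vectors`). The TF-IDF weighting uses the smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which follows scikit-learn's default `TfidfVectorizer` weighting formula (the tokenizer here is simpler than scikit-learn's).

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplified, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sequence_matcher_duplicate(q1, q2, threshold=0.5):
    """Baseline: difflib's character-level similarity ratio, thresholded.

    The 0.5 threshold is a hypothetical choice, not a value from the paper.
    """
    ratio = SequenceMatcher(None, q1.lower(), q2.lower()).ratio()
    return ratio, ratio >= threshold


def count_vectors(docs):
    """Bag-of-words term counts over a vocabulary built from `docs`."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Reweight counts by smoothed IDF, then L2-normalize each row.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of docs
    and df(t) is the number of docs containing term t.
    """
    vocab, counts = count_vectors(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0  # avoid divide-by-zero
        rows.append([x / norm for x in weighted])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Usage on a made-up question pair (not taken from the Quora dataset).
q1 = "How do I learn Python quickly?"
q2 = "What is the fastest way to learn Python?"
ratio, is_dup = sequence_matcher_duplicate(q1, q2)
_, (c1, c2) = count_vectors([q1, q2])
_, (t1, t2) = tfidf_vectors([q1, q2])
print(round(ratio, 3), is_dup, round(cosine(c1, c2), 3), round(cosine(t1, t2), 3))
```

In the paper's machine learning setup, vectors like these serve as input features to classifiers such as XGBoost; the abstract does not state how the two questions' vectors are combined into one feature row, so the cosine score here is only one illustrative pair feature. In practice the vocabulary and IDF weights would be fitted on the whole training corpus rather than on a single pair.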


More From: International Journal of Innovative Technology and Exploring Engineering


Journal: International Journal of Innovative Technology and Exploring Engineering
Publication Date: Oct 30, 2019
Citations: 2

Similar Papers

Predicting Outpatient Appointment Demand Using Machine Learning and Traditional Methods.
Brian Klute ... Wei Chen
Journal of medical systems | VOL. 43
19 Jul 2019

A Comparative Analysis of Traditional Forecasting Methods and Machine Learning Techniques for Sales Prediction in E-commerce
Irina V Pustokhina ... Denis A Pustokhin
American Journal of Business and Operations Research | VOL. 10
01 Jan 2023

Developing Theory Using Machine Learning Methods
Prithwiraj Choudhury ... Michael Endres
SSRN Electronic Journal | VOL. -
01 Jan 2018

A systematic machine learning method for reservoir identification and production prediction
Liuyang Xu ... Wei Liu
Petroleum Science | VOL. 20
01 Feb 2023
