Abstract

Comments posted by e-commerce users generally contain opinions about positive or negative experiences at several online shops. Such comments, whether brief or lengthy and whether stated directly or indirectly, influence other potential customers. As a result, a product sold at an online store ends up rated as either "recommended" or "non-recommended". However, detecting positive and negative opinions manually takes considerable time because of the large amount of data. For this reason, opinion mining based on data mining techniques can be used to automate the detection of positive and negative comments. One of the main problems in opinion mining, however, is that the data are limited while the number of attributes is large. In this study, we propose the application of Pearson correlation (PC) based feature selection for opinion mining optimization. The experimental results show that applying PC improves the performance of opinion mining systems for three classifiers, namely Logistic Regression, Naïve Bayes and Support Vector Machine, yielding accuracies of 98.80%, 87.87% and 98.12%, respectively.

Highlights

  • Comments posted by e-commerce users generally contain opinions about positive or negative experiences at several online shops

  • In this study, we propose the application of Pearson correlation (PC) based feature selection for opinion mining optimization

  • Naïve Bayes Method with Ensemble Features and Feature Selection http//poseidon.csd.auth.gr%0Ahttp://clopinet.com/isabelle/Projec


Summary

Data Preprocessing

Few algorithms are designed specifically for stemming Indonesian text. The Porter algorithm, for example, takes relatively less time than stemming with the Nazief and Adriani algorithm, but it achieves a lower accuracy percentage. The Nazief and Adriani algorithm is a stemming algorithm for Indonesian-language text that achieves a better accuracy percentage than other algorithms.

[...] is important for the next stage, namely reducing the attributes that have little influence on the classification of the input data [...] denotes the number of independent attributes, while the symbol j denotes the number of records in the dataset.

The Naive Bayes method is a machine learning method based on probability calculations. Its underlying concept is Bayes' theorem: classification is performed by computing probability values.
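
To make the probability calculation concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is an illustration only, not the study's implementation: the function names (`train_naive_bayes`, `predict`), the add-one (Laplace) smoothing, and the toy English comments are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(class): the prior from Bayes' theorem
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen during training
            # log P(word | class) with add-one smoothing (an assumption)
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already tokenized comments (not data from the study).
docs = [
    ["fast", "delivery", "good"],
    ["good", "quality"],
    ["slow", "delivery", "bad"],
    ["bad", "quality"],
]
labels = ["recommended", "recommended", "non-recommended", "non-recommended"]

model = train_naive_bayes(docs, labels)
prediction_pos = predict(model, ["good", "delivery"])
prediction_neg = predict(model, ["bad", "slow"])
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.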

Feature Selection with Pearson Correlation
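
As a rough illustration of Pearson correlation based feature selection, the sketch below scores each attribute by the absolute value of its Pearson correlation with the class label and keeps the top `k`. The function names (`pearson`, `select_features`), the choice of `k`, and the toy term-count matrix are hypothetical and are not taken from the study, which may rank or threshold attributes differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, k):
    """Return the indices of the k columns with the largest |r| to the labels."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest absolute correlation first; ties broken by column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


# Toy term counts; the columns stand for hypothetical terms
# ["good", "bad", "the", "shop"]. Labels: 1 = recommended, 0 = non-recommended.
rows = [
    [2, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 2, 1, 1],
    [0, 1, 1, 0],
]
labels = [1, 1, 0, 0]

selected = select_features(rows, labels, k=2)
```

On this toy matrix the columns for "the" and "shop" have zero correlation with the label, so only the first two columns are kept. Reducing the attribute set in this way is what addresses the "limited data, many attributes" problem before a classifier is trained.
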
Modelling the system's ability to find rankings
Data Collection
