Abstract
The information technology domain changes day by day, and both the volume of data and the demand to process it keep growing. Data falls into two types: structured and unstructured. Because today's data comes from multiple sources and in many forms, it is commonly referred to as “Big Data” rather than simply data. It is reported that 80% of enterprise data is unstructured [1]. However, the procedures for handling unstructured data are more complex than those for structured data. It is therefore necessary to understand this type of data clearly and to know how to extract useful information from it. In this paper, we study how to retrieve useful information from unstructured data in the e-commerce domain using the data analysis tool Spark. We first give an overview of structured and unstructured data and of data analysis. We then implement an information retrieval algorithm with the Spark MLlib tool to determine, for a set of negative or positive reviews, which subjects customers discuss most. The study is intended to help improve business on the basis of customer satisfaction reviews. Our model is the unsupervised machine learning algorithm Latent Dirichlet Allocation (LDA). Finally, we evaluate the model against a set of parameters.
Highlights
What if the categories are missing? The Latent Dirichlet Allocation (LDA) algorithm addresses this issue. The aim of this study is to find the relevant topics discussed in a set of Amazon reviews.
The proposed solution, LDA, which stands for “Latent Dirichlet Allocation”, is based on the Dirichlet distribution, named after Peter Gustav Lejeune Dirichlet.
Unstructured data processing is a crucial methodology in current technology because its solutions are simple to implement and deliver high performance.
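As a minimal illustration of the Dirichlet distribution that LDA builds on, the sketch below draws topic proportions for a single document by normalizing Gamma draws. It uses plain Python rather than Spark MLlib, and the function name `sample_dirichlet`, the number of topics, and the concentration values are assumptions chosen for illustration, not values from the study.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
k = 3                   # hypothetical number of topics

# Small concentration values tend to put most of the mass on a few topics.
sparse_mix = sample_dirichlet([0.1] * k, rng)

# Large concentration values tend to spread the mass evenly across topics.
even_mix = sample_dirichlet([50.0] * k, rng)

print("low concentration :", [round(p, 3) for p in sparse_mix])
print("high concentration:", [round(p, 3) for p in even_mix])
```

Each result is a list of non-negative proportions that sum to 1, which is how LDA represents the mix of topics in one review.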
Summary
In December 2019, Amazon had up to 2,729 million visitors [3], which generates a considerable number of reviews. Studying such data is necessary in order to gain knowledge from it. Review data from online platforms helps improve business products. Previous studies applied a classification method to review datasets, categorizing reviews according to existing categories. The LDA algorithm addresses the case where those categories are missing. The aim of this study is to find the relevant topics discussed in a set of Amazon reviews.
More From: International Journal of Engineering and Advanced Technology