Abstract

Diagnostic analytics is a variant of data analytics that aims to understand an entity's behavior by answering “why”-type questions. Such questions find applications in emotion classification, brand analysis, drug review modeling, customer complaint classification, and similar tasks. Labeled data form the core of any analytics problem, diagnostic analytics included; however, labeled data are not always available. In some cases, labels must be assigned to unknown entities before their behavior can be understood. For such scenarios, the proposed model combines topic modeling and text classification techniques. The combined data model helps solve diagnostic problems and obtain meaningful insights from data by treating the procedure as a classification problem. The proposed model uses Improved Latent Dirichlet Allocation for topic modeling and sentiment analysis to understand an entity's behavior, and represents it as an Improved Multinomial Naïve Bayesian data model to achieve automated classification. The model is tested on a drug review dataset obtained from the UCI repository. Health conditions and their associated drug names were extracted from the reviews, and sentiment scores were assigned. The sentiment scores reflect the behavior of various drugs for a particular health condition and are used to classify the drugs according to their quality. The performance of the proposed model is compared with existing baseline models, and the proposed model is shown to perform better than the baselines.
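
The classification step named in the abstract can be illustrated with a minimal sketch. It assumes a plain multinomial Naive Bayes classifier with add-one (Laplace) smoothing, not the Improved Multinomial Naïve Bayesian model of the paper, and it skips the topic modeling stage entirely. The toy reviews, the sentiment labels, and the function names `train_multinomial_nb` and `predict` are hypothetical and are not taken from the UCI drug review dataset.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing.

    docs   : list of token lists (bag-of-words representation)
    labels : list of class labels, one per document
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_doc_counts.items():
        # Class prior: fraction of training documents in this class.
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # Smoothed per-class word probabilities over the shared vocabulary.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest posterior log-score."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        ll = model["log_likelihood"][label]
        # Words never seen in training are ignored.
        scores[label] = log_prior + sum(ll[w] for w in tokens if w in ll)
    return max(scores, key=scores.get)


# Hypothetical toy reviews with hand-assigned sentiment labels.
train_docs = [
    "pain gone works great".split(),
    "great relief no side effects".split(),
    "severe nausea stopped taking".split(),
    "terrible headache and nausea".split(),
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "great relief".split()))
print(predict(model, "nausea headache".split()))
```

In the pipeline the abstract describes, the training labels would come from the topic modeling and sentiment scoring stages rather than being assigned by hand as they are here.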

Highlights

  • Data analytics is a branch of data mining that deals with extracting useful information from data

  • The sentiment scores reflected the behavior of various drugs for a particular health condition and classified them according to their quality

  • Replacing high-frequency words with their base words, minimizing sparse features using a medical thesaurus, representing polysemy, and semantically modelling the classifier's knowledge base showed remarkable performance compared to other baseline models


Summary

Introduction

Data analytics is a branch of data mining that deals with extracting useful information from data. There are three reasons for that: the availability of labeled data regarding the entity of interest, the extraction of topics or coherent terms from documents, and the varying size of topics (saliency). Context-aware systems are more concerned with topic relevance, term relevance, and topic labels than the bag-of-words approach is. Such a system, capable of revealing systematic use cases of the business problems through conceptual models, helps to arrive at solutions [14]. It is enhanced to handle sparse features, extract latent semantic relationships from the data and keywords for topics of all sizes, and minimize polysemy issues.

