Abstract

In biology, text mining is widely used because it can uncover previously unknown relationships among drugs, phenotypes and syndromes in large volumes of information. Enhanced Topic modeling with Improved Predict drug Indications and Side effects using Topic modelling and Natural language processing (ETP-IPISTON) has been employed to predict drug-phenotype and drug-side effect associations. First, corpus documents are collected from the literature, and the topics in them are modeled using logistic Linear Discriminative Analysis (LDA) and a Bi-directional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF). A dependency graph constructed from the sentences in the literature is then used to discover relations between genes and drugs. The effect of each drug on phenotypes is quantified by a Gene Regulation Score (GRS), which is used to create the drug-topic probability matrix. The probability matrix and a syntactic distance measure are fed to Classification and Regression Tree (CART), Naïve Bayes (NB), logistic regression and Convolutional Neural Network (CNN) classifiers to estimate drug-gene and drug-side effect associations.

Besides the literature, social media offers a promising, high-volume source of data for predicting drug-phenotype and drug-side effect associations. In this paper, therefore, drug information related to genes, diseases and side effects is extracted from social media platforms such as Twitter, Facebook and LinkedIn and combined with the literature data to obtain more relevant disease-drug relations. In addition, topic modeling with transfer learning is introduced; it accounts for element categories, the probability of overlapping elements and the deep contextual meaning of the text in order to model topics more accurately. Transfer learning lets the topic model share as much knowledge as possible between the literature data and the social media information.
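The abstract does not say how the syntactic distance between a drug and a gene is measured on the dependency graph. One common reading, used here purely as an assumption, is the number of arcs on the shortest path between the two mentions in the sentence's dependency parse. The minimal Python sketch below illustrates that idea; the example sentence, its hand-written parse, the `candidate_pairs` helper and its `max_distance` threshold are all hypothetical and not taken from the paper.

```python
from collections import deque


def syntactic_distance(edges, source, target):
    """Shortest-path length (in arcs) between two tokens of a dependency parse.

    `edges` holds (head, dependent) token-index pairs from any dependency
    parser. Arcs are treated as undirected. Returns None if the two tokens
    are not connected.
    """
    adjacency = {}
    for head, dep in edges:
        adjacency.setdefault(head, set()).add(dep)
        adjacency.setdefault(dep, set()).add(head)
    if source == target:
        return 0
    # Breadth-first search guarantees the first hit is a shortest path.
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def candidate_pairs(tokens, edges, drugs, genes, max_distance=3):
    """Hypothetical helper: keep (drug, gene, distance) triples whose
    mentions are at most `max_distance` arcs apart in the parse."""
    pairs = []
    for i, drug in enumerate(tokens):
        if drug not in drugs:
            continue
        for j, gene in enumerate(tokens):
            if gene not in genes:
                continue
            d = syntactic_distance(edges, i, j)
            if d is not None and d <= max_distance:
                pairs.append((drug, gene, d))
    return pairs


# Hand-written parse of "Metformin activates AMPK in liver cells"
# (token indices 0..5); a real pipeline would obtain these arcs from a parser.
tokens = ["Metformin", "activates", "AMPK", "in", "liver", "cells"]
arcs = [(1, 0), (1, 2), (1, 3), (3, 5), (5, 4)]

print(syntactic_distance(arcs, 0, 2))
print(candidate_pairs(tokens, arcs, {"Metformin"}, {"AMPK"}))
```

In practice the arcs would come from a dependency parser and the drug and gene mentions from an entity tagger, both of which are omitted here.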
The topics from social media and the literature are used to create the drug-topic matrix. The probability matrix and the syntactic distance measure are given as input to the CART, NB, logistic regression and CNN classifiers to estimate drug-gene and drug-side effect associations. The proposed work is named Enhanced Topic Modeling with Transfer Learning-IPISTON (ETPTL-IPISTON). Simulation results show that ETPTL-IPISTON is more efficient than traditional methods.
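The following minimal Python sketch shows one possible way to combine a drug-topic probability matrix with the syntactic distance as features for one of the four classifiers, logistic regression. It is an illustration under stated assumptions, not the authors' implementation: the weighted-average construction of the matrix, the per-mention `weight` standing in for the GRS, the toy documents and labels, and the plain gradient-descent training are all hypothetical. CART, NB and CNN are not shown.

```python
import math


def drug_topic_matrix(doc_topics, drug_mentions):
    """One topic-probability row per drug: the weighted average of the topic
    distributions of the documents (papers or posts) that mention the drug.

    `drug_mentions[drug]` is a list of (doc_id, weight) pairs; `weight` is a
    hypothetical stand-in for a per-mention score such as the GRS.
    """
    n_topics = len(next(iter(doc_topics.values())))
    matrix = {}
    for drug, mentions in drug_mentions.items():
        row = [0.0] * n_topics
        total = 0.0
        for doc_id, weight in mentions:
            for t, p in enumerate(doc_topics[doc_id]):
                row[t] += weight * p
            total += weight
        matrix[drug] = [v / total for v in row] if total > 0 else row
    return matrix


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy inputs: topic distributions for three documents (from any topic model
# over literature and social media text) and weighted drug mentions.
doc_topics = {"paper1": [0.9, 0.1], "post1": [0.2, 0.8], "paper2": [0.6, 0.4]}
drug_mentions = {
    "drugA": [("paper1", 1.0), ("paper2", 1.0)],
    "drugB": [("post1", 2.0), ("paper2", 1.0)],
}
matrix = drug_topic_matrix(doc_topics, drug_mentions)

# Feature vector per candidate pair: the drug's topic row plus the syntactic
# distance between the drug and its candidate gene or side-effect mention.
X = [
    matrix["drugA"] + [1.0],
    matrix["drugA"] + [4.0],
    matrix["drugB"] + [1.0],
    matrix["drugB"] + [5.0],
]
y = [1, 0, 1, 0]  # toy labels: 1 = associated, 0 = not associated
model = train_logistic_regression(X, y)
print([round(predict_proba(model, x), 2) for x in X])
```

Because each drug appears with both labels in this toy data, the distance feature is what separates the classes; with real data the topic columns would carry signal as well.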
