DATA MINING METHODS FOR OMICS AND KNOWLEDGE OF CRUDE MEDICINAL PLANTS TOWARD BIG DATA BIOLOGY

Farit M Afendi, Naoaki Ono, Yukiko Nakamura, Kensuke Nakamura, Latifah K Darusman, Nelson Kibinge, Aki Hirai Morita, Ken Tanaka, Hisayuki Horai, Md Altaf-Ul-Amin, Shigehiko Kanaya

doi:10.5936/csbj.201301010

Abstract

Molecular biological data have increased rapidly with recent progress in the Omics fields, e.g., genomics, transcriptomics, proteomics and metabolomics. This growth necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The present study reviews the usage of the KNApSAcK Family DB in metabolomics and related areas, discusses several statistical methods for handling multivariate data, and shows their application to Indonesian blended herbal medicines (Jamu) as a case study. Exploration using a Biplot reveals that many plants are rarely used, while some plants are used heavily for specific efficacies. Furthermore, the ingredients of Jamu formulas are modeled using Partial Least Squares Discriminant Analysis (PLS-DA) in order to predict their efficacy. The plants used in each Jamu medicine serve as the predictors, whereas the efficacy of each Jamu provides the responses. This model achieves 71.6% correct classification in predicting efficacy. A permutation test is then used to determine which plants serve as main ingredients in Jamu formulas by evaluating the significance of the PLS-DA coefficients. Next, to explain the role of the plants that serve as main ingredients in Jamu medicines, information on the pharmacological activity of the plants is added to the predictor block. The N-PLS-DA model, a multiway version of PLS-DA, is then used to handle the resulting three-dimensional predictor array. This N-PLS-DA model reveals that some pharmacological activities are specific to certain efficacies, whereas other activities are spread across many efficacies. The mathematical modeling introduced in the present study can be applied to the global analysis of big data with the aim of revealing the underlying biology.
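The PLS-DA step described above can be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation. It makes several assumptions:

- a binary formula-by-plant predictor matrix X and a one-hot (dummy-coded) efficacy matrix Y;
- a PLS2 fit with the NIPALS algorithm (the abstract does not say which PLS algorithm was used);
- each formula is assigned to the efficacy with the highest predicted dummy value;
- the coefficients are tested by permuting the rows of Y.

The data are synthetic. The function names (`pls_fit`, `plsda_predict`, `permutation_pvalues`) and the settings (three components, 99 permutations) are hypothetical choices.

```python
import numpy as np


def pls_fit(X, Y, n_components, max_iter=500, tol=1e-10):
    """PLS2 regression via NIPALS (hypothetical helper).

    Returns (B, x_mean, y_mean) such that Y_hat = (X_new - x_mean) @ B + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xa, Ya = X - x_mean, Y - y_mean            # centred working copies, deflated below
    W, P, C = [], [], []
    for _ in range(n_components):
        u = Ya[:, np.argmax(Ya.var(axis=0))]    # start from the most variable response column
        t_old = np.zeros(X.shape[0])
        for _ in range(max_iter):
            w = Xa.T @ u
            w = w / np.linalg.norm(w)            # unit-length X weights
            t = Xa @ w                           # X scores
            c = Ya.T @ t / (t @ t)               # Y loadings
            u = Ya @ c / (c @ c)                 # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = Xa.T @ t / (t @ t)                   # X loadings
        Xa = Xa - np.outer(t, p)                 # deflate X and Y by this component
        Ya = Ya - np.outer(t, c)
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.column_stack(W), np.column_stack(P), np.column_stack(C)
    B = W @ np.linalg.solve(P.T @ W, C.T)        # B = W (P'W)^-1 C'
    return B, x_mean, y_mean


def plsda_predict(X_new, B, x_mean, y_mean):
    """Assign each row to the class whose dummy response is predicted highest."""
    return ((X_new - x_mean) @ B + y_mean).argmax(axis=1)


def permutation_pvalues(X, Y, n_components, n_perm=99, seed=0):
    """Two-sided permutation p-value for every PLS coefficient (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    B_obs, _, _ = pls_fit(X, Y, n_components)
    exceed = np.zeros(B_obs.shape)
    for _ in range(n_perm):
        Y_perm = Y[rng.permutation(Y.shape[0])]  # shuffle responses: breaks the X-Y link
        B_perm, _, _ = pls_fit(X, Y_perm, n_components)
        exceed += np.abs(B_perm) >= np.abs(B_obs)
    return (exceed + 1) / (n_perm + 1)


# Synthetic stand-in for a Jamu data set (hypothetical, not the study's data):
# 120 formulas, 10 candidate plants, 3 efficacy classes; plant j (j < 3) is
# always present in formulas of efficacy j, other plants appear at random.
rng = np.random.default_rng(1)
n, n_plants, n_classes = 120, 10, 3
labels = rng.integers(0, n_classes, size=n)
X = (rng.random((n, n_plants)) < 0.2).astype(float)
X[np.arange(n), labels] = 1.0
Y = np.eye(n_classes)[labels]                      # one-hot efficacy matrix

B, x_mean, y_mean = pls_fit(X, Y, n_components=3)
accuracy = np.mean(plsda_predict(X, B, x_mean, y_mean) == labels)
pvals = permutation_pvalues(X, Y, n_components=3)
print(f"training accuracy: {accuracy:.3f}")
print("plants with p < 0.05 per efficacy:",
      [np.flatnonzero(pvals[:, k] < 0.05).tolist() for k in range(n_classes)])
```

Permuting the rows of Y keeps the plant-usage patterns intact but breaks their link to efficacy. A coefficient with a small p-value is therefore one whose magnitude is rarely matched under that null. The accuracy printed here is measured on the training rows, which makes it optimistic. A held-out or cross-validated estimate is the fairer analogue of a reported correct-classification rate.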

Summary

Introduction

Data-intensive sciences have progressed in modern astronomy [1], biology [2-8], computational materials science [9] and ecology [10-11]. Biology in particular faces a rapid increase in omics data produced by genomics, transcriptomics, proteomics and metabolomics [2-8]. This information needs to be connected in a way that enables scientists to make predictions based on general principles. In this mini-review, we discuss the usage of the KNApSAcK Family DB in metabolomics. We also explain mining techniques such as principal component analysis (PCA), partial least squares regression (PLSR) and multiway models, and show their application to Indonesian blended herbal medicines (Jamu) as a case study. Optimization of blended herbal formulas should be developed using information derived from plant and human omics. … disease, cancer and respiratory disease, which superseded infectious diseases because of the development and widespread …
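Of the techniques named above, PCA underlies the Biplot exploration mentioned in the abstract. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' procedure. It works in three steps:

1. Column-centre a hypothetical formula-by-plant usage matrix X.
2. Take the singular value decomposition Xc = U S V^T.
3. Return row markers U S^alpha and column markers V S^(1 - alpha), whose product approximates Xc.

The function name `biplot_coordinates`, the default alpha = 1 and the random usage matrix are assumptions made for illustration.

```python
import numpy as np


def biplot_coordinates(X, n_components=2, alpha=1.0):
    """PCA biplot markers from the SVD of the column-centred matrix (hypothetical helper).

    Xc = U S V^T; rows get U S^alpha, columns get V S^(1 - alpha), so
    rows @ cols.T is the rank-n_components approximation of Xc.
    alpha=1: row-principal (rows are PCA scores); alpha=0: column-principal.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    rows = U[:, :k] * s[:k] ** alpha
    cols = Vt[:k].T * s[:k] ** (1 - alpha)
    explained = s[:k] ** 2 / np.sum(s ** 2)    # share of total variance per axis
    return rows, cols, explained


# Hypothetical usage matrix: 40 formulas (rows) x 8 plants (columns), 1 = plant used,
# with a different random usage rate for each plant.
rng = np.random.default_rng(0)
usage = (rng.random((40, 8)) < rng.random(8)).astype(float)
rows, cols, explained = biplot_coordinates(usage)
print("variance explained by the two axes:", np.round(explained, 3))
```

To draw the biplot itself, plot the rows as points and the columns as arrows from the origin. With alpha = 0 the column arrows are scaled by the singular values, so their lengths approximately reflect how much each plant's usage varies across formulas in the plotted plane.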

Mathematical Methods of Data Mining

Findings

As an illustration of data mining of a herbal medicine database …

Journal: Computational and Structural Biotechnology Journal
Publication Date: Jan 1, 2013
Citations: 48
License type: cc-by

Similar Papers

A rapid method for the differentiation of yeast cells grown under carbon and nitrogen-limited conditions by means of partial least squares discriminant analysis employing infrared micro-spectroscopic data of entire yeast cells
Julia Kuligowski ... Bernhard Lendl
Talanta | VOL. 99
20 Jun 2012

The Impact of Preprocessing Methods for a Successful Prostate Cell Lines Discrimination Using Partial Least Squares Regression and Discriminant Analysis Based on Fourier Transform Infrared Imaging
Danuta Liberda ... Katarzyna Pogoda
Cells | VOL. 10
20 Apr 2021

Nearest clusters based partial least squares discriminant analysis for the classification of spectral data
Weiran Song ... Omar Nibouche
Analytica Chimica Acta | VOL. 1009
06 Feb 2018

A multivariate analysis of protein microarrays for signature selection profiles
Saveria Mazzara ... Antonella Sinisi
EMBnet.journal | VOL. 18
09 Nov 2012
