Abstract

The basic experimental data of traditional Chinese medicine are generally obtained by high-performance liquid chromatography and mass spectrometry. Such data are typically high-dimensional with few samples and contain many irrelevant and redundant features, which makes in-depth exploration of Chinese medicine material information challenging. This paper proposes a hybrid feature selection method based on an iterative approximate Markov blanket (CI_AMB). The method first uses the maximum information coefficient to measure the correlation between each feature and the target variable, and filters out irrelevant features according to the evaluation criterion. The iterative approximate Markov blanket strategy then analyzes the redundancy between features, eliminates the redundant ones, and finally selects an effective feature subset. Comparative experiments on traditional Chinese medicine material basic experimental data and on multiple public UCI datasets show that, compared with Lasso, XGBoost, and the classic approximate Markov blanket method, the new method is better at selecting a small number of highly explanatory features.
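The two-stage idea in the abstract (filter irrelevant features, then remove redundant ones) can be illustrated with a minimal sketch. It rests on several assumptions: it uses the classic approximate Markov blanket criterion, not the paper's iterative CI_AMB variant, whose details are not given here; it substitutes symmetric uncertainty on discrete values for the maximum information coefficient so that it runs on the standard library alone; and the function names, the threshold value, and the toy feature names (`peak_a`, `peak_b`, and so on) are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy in bits of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1].

    Assumption: used here as a stand-in for the maximum information
    coefficient named in the abstract.
    """
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, target, relevance_threshold=0.1,
                    score=symmetric_uncertainty):
    """Return the names of the features kept after both stages.

    columns: dict mapping a feature name to a list of discrete values.
    relevance_threshold: hypothetical cut-off, not taken from the paper.
    """
    # Stage 1: drop features whose dependence on the target is too weak.
    relevance = {name: score(col, target) for name, col in columns.items()}
    candidates = sorted(
        (name for name in columns if relevance[name] > relevance_threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Stage 2: classic approximate Markov blanket test. A kept feature k
    # covers a candidate f when k is at least as relevant as f (guaranteed
    # by the descending sort) and score(k, f) >= score(f, target).
    selected = []
    for name in candidates:
        redundant = any(
            score(columns[kept], columns[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data: the target has four classes encoded by two independent bits.
target = [0, 0, 1, 1, 2, 2, 3, 3]
columns = {
    "peak_a": [0, 0, 0, 0, 1, 1, 1, 1],       # high bit of the target
    "peak_a_copy": [1, 1, 1, 1, 0, 0, 0, 0],  # inverted copy of peak_a
    "peak_b": [0, 0, 1, 1, 0, 0, 1, 1],       # low bit of the target
    "peak_noise": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the target
}
selected = select_features(columns, target)
print(selected)
```

In this toy run the noise feature is removed in the first stage because it shares no information with the target, and the inverted copy is removed in the second stage because `peak_a` already covers it. Continuous chromatography or mass spectrometry intensities would need discretizing first, or a different `score` function passed in.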

Highlights

  • With the rapid development of science and technology, information acquisition technology and storage capacity have improved greatly, and the data obtained carry more information at an ever larger scale

  • Conventional statistical analysis methods, such as multiple linear regression, principal component regression, and ridge regression, use regression coefficients to reflect the relationship between variables [1,2,3]. They cannot effectively delete irrelevant and redundant features, so they cannot screen important substances in the basic data of traditional Chinese medicines, which are high-dimensional with small sample sizes

  • Traditional feature selection methods, such as Lasso and K-split Lasso [4], can delete irrelevant and redundant features only to some extent and cannot meet the data processing requirements of high-dimensional small samples. Therefore, because high-dimensional small-sample data of Chinese medicine contain much irrelevant and redundant information, there is an urgent need for an analytical model that can select effective features from such data and improve the accuracy and operation of the model, to provide technical support for researchers

Summary

Introduction

With the rapid development of science and technology, information acquisition technology and storage capacity have improved greatly, and the data obtained carry more information at an ever larger scale. Conventional statistical analysis methods, such as multiple linear regression, principal component regression, and ridge regression, use regression coefficients to reflect the relationship between variables [1,2,3]. They cannot effectively delete irrelevant and redundant features, so they cannot screen important substances in the basic data of traditional Chinese medicines, which are high-dimensional with small sample sizes. Therefore, because high-dimensional small-sample data of Chinese medicine contain much irrelevant and redundant information, there is an urgent need for an analytical model that can select effective features from such data and improve the accuracy and operation of the model, to provide technical support for researchers.
