Abstract

Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often useful in gathering useful information from the data. In this paper we consider two such recent methods: Interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them on two data sets which have more predictors than samples (i.e. n << p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of Amines while the other has a much diverse collection compounds. The difference of prediction results between these two datasets for both the methods supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows diverse including a variety of descriptors can improve model quality considerably.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.