Discriminant Analysis and Normalization Methods for Next-Generation Sequencing Data

Yichuan Zhao, Yan Zhou, Tiejun Tong, Junhui Wang

doi:10.1007/978-3-319-99389-8_18

Abstract

With the development of high-throughput techniques, next-generation sequencing has become a powerful tool for gene expression analysis. Determining which disease type a new sample belongs to is a fundamental problem in medical and biological studies. Unlike microarray data, which are continuous, next-generation sequencing data are discrete: they are counts of reads mapped onto the reference genome. Consequently, existing discriminant analysis methods for microarray data may not be readily applicable to next-generation sequencing data. In recent years, a number of new discriminant analysis methods have been proposed for classifying next-generation sequencing data. In this chapter, we introduce three such methods: Poisson linear discriminant analysis, zero-inflated Poisson logistic discriminant analysis, and negative binomial linear discriminant analysis. Because normalization is an important step in processing next-generation sequencing data, we also introduce several normalization methods. Simulation studies and analyses of two real datasets are carried out to demonstrate the usefulness of the newly developed methods.

Full Text