Abstract

Gene coexpression analysis has emerged as a novel holistic approach to microarray data analysis. Several indices have been used to explore coexpression relationships, but each has pitfalls. Pearson's correlation coefficient, for example, cannot uncover nonlinear patterns or the directionality of coexpression. Mutual information can detect nonlinearity but fails to show directionality. The coefficient of determination (CoD) is unique in exploring different patterns of gene coexpression, but it has so far been applied only to discrete data, and converting continuous microarray data to a discrete format can lead to information loss. Here, we propose an algorithm, CoexPro, for gene coexpression analysis. The algorithm is based on B-spline approximation of the coexpression between a pair of genes, followed by CoD estimation. It was validated by simulation studies and by functional semantic similarity analysis. CoexPro can uncover both linear relationships and a specific class of nonlinear relationships from continuous microarray data, and it can suggest a possible directionality of coexpression. The algorithm presents a novel model for gene coexpression and will be a valuable tool for a variety of gene expression and network studies. Its application is demonstrated by an analysis of ligand-receptor coexpression in cancerous and noncancerous cells. Software implementing the algorithm is available from the authors upon request.

Highlights

  • High-throughput microarray data give a picture of the transcriptome, the complete set of genes being expressed in a given cell or organism under a particular set of conditions

  • We proposed a new algorithm, CoexPro, which is based on B-spline approximation followed by coefficient of determination (CoD) estimation, for gene coexpression analysis

  • The Z-score was calculated as $Z = (\mathrm{CoD} - \overline{\mathrm{CoD}})/\sigma$, where CoD was estimated from the original dataset and σ was the standard deviation
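
The Z-score in the last highlight standardizes an observed CoD against a reference distribution. The highlight does not say where the reference mean and σ come from, so the sketch below assumes they are the mean and standard deviation of CoDs computed on shuffled copies of the data (a permutation null). To stay short it also uses a stand-in CoD estimator, the R² of a straight-line fit, rather than the B-spline fit that CoexPro uses. The names `linear_cod` and `cod_z_score` are hypothetical.

```python
import random
import statistics


def linear_cod(x, y):
    """Stand-in CoD estimator (assumption): R^2 of a straight-line fit of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def cod_z_score(x, y, cod=linear_cod, n_perm=500, seed=0):
    """Z = (CoD - mean of null CoDs) / sd of null CoDs.

    Assumption: the null CoDs come from shuffling y, which breaks any
    pairing between the two expression profiles.
    """
    rng = random.Random(seed)
    observed = cod(x, y)
    shuffled = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(cod(x, shuffled))
    return (observed - statistics.fmean(null)) / statistics.stdev(null)


# Synthetic profiles: one target depends on x, the other does not.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(50)]
y_related = [2 * a + rng.gauss(0, 0.5) for a in x]
y_unrelated = [rng.gauss(0, 1) for _ in range(50)]

z_related = cod_z_score(x, y_related)
z_unrelated = cod_z_score(x, y_unrelated)
print(f"Z (related pair):   {z_related:.2f}")
print(f"Z (unrelated pair): {z_unrelated:.2f}")
```

Any other CoD estimator with the same `(x, y)` signature can be passed through the `cod` argument without changing the standardization step.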


Summary

Introduction

High-throughput microarray data give a picture of the transcriptome, the complete set of genes being expressed in a given cell or organism under a particular set of conditions. With recent interest in biological networks, gene coexpression analysis has emerged as a novel holistic approach to microarray data analysis [1,2,3,4]. Because the correlation coefficient is a symmetric measure, it cannot provide evidence of a directional relationship in which one gene is upstream of another [7]. The coefficient of determination (CoD), on the other hand, can uncover nonlinear relationships in microarray data and suggest their directionality, and it has been used in prediction analysis of gene expression, determination of connectivity in regulatory pathways, and network inference [10,11,12,13,14]. Quantization, however, is a coarse-grained approximation of the gene expression pattern; the resulting data may represent only a “qualitative” relationship and lead to biologically erroneous conclusions [15].
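
The contrast between a symmetric correlation coefficient and a directional CoD can be illustrated on continuous data without quantization. The sketch below is not the CoexPro implementation: it assumes the CoD is taken as 1 − SSE/SST of an in-sample least-squares cubic B-spline fit, with a uniform clamped knot vector and a tiny ridge term for numerical stability. All function names and parameter defaults are hypothetical.

```python
import math


def bspline_basis(t, knots, degree):
    """All B-spline basis values at t (Cox-de Boor recursion)."""
    b = [1.0 if knots[i] <= t < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (t - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - t)
                         / (knots[i + d + 1] - knots[i + 1]) * b[i + 1])
            nxt.append(left + right)
        b = nxt
    return b


def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (m[i][n] - tail) / m[i][i]
    return sol


def cod(predictor, target, degree=3, intervals=4):
    """CoD of target given predictor: 1 - SSE/SST of a B-spline fit."""
    lo, hi = min(predictor), max(predictor)
    knots = ([0.0] * degree + [i / intervals for i in range(intervals + 1)]
             + [1.0] * degree)
    # Rescale to [0, 1); the clamp keeps the maximum inside the last interval.
    rows = [bspline_basis(min((v - lo) / (hi - lo), 1 - 1e-9), knots, degree)
            for v in predictor]
    n = len(rows[0])
    # Normal equations with a tiny ridge (assumption) to avoid singularity.
    ata = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(n)]
    coef = solve(ata, aty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_t = sum(target) / len(target)
    sse = sum((t - f) ** 2 for t, f in zip(target, fitted))
    sst = sum((t - mean_t) ** 2 for t in target)
    return max(0.0, 1.0 - sse / sst)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: y is a deterministic, non-monotonic function of x.
x = [-1 + 2 * i / 199 for i in range(200)]
y = [v * v for v in x]

r = pearson(x, y)      # symmetric: pearson(y, x) gives the same value
cod_xy = cod(x, y)     # how well x predicts y
cod_yx = cod(y, x)     # how well y predicts x
print(f"Pearson r = {r:.3f}, CoD(x->y) = {cod_xy:.3f}, CoD(y->x) = {cod_yx:.3f}")
```

In this synthetic case the correlation coefficient is near zero in both directions, while the two CoD values differ sharply: x determines y, but y cannot distinguish between the two x values that produce it. That asymmetry is the kind of hint at directionality the CoD is meant to provide.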

