Abstract
Most approaches to gene expression analysis use real-valued expression data, produced by high-throughput screening technologies, such as microarrays. Often, some measure of similarity must be computed in order to extract meaningful information from the observed data. The choice of this similarity measure frequently has a profound effect on the results of the analysis, yet no standards exist to guide the researcher. To address this issue, we propose to analyse gene expression data entirely in the binary domain. The natural measure of similarity becomes the Hamming distance and reflects the notion of similarity used by biologists. We also develop a novel data-dependent optimization-based method, based on Genetic Algorithms (GAs), for normalizing gene expression data. This is a necessary step before quantizing gene expression data into the binary domain and generally, for comparing data between different arrays. We then present an algorithm for binarizing gene expression data and illustrate the use of the above methods on two different sets of data. Using Multidimensional Scaling, we show that a reasonable degree of separation between different tumor types in each data set can be achieved by working solely in the binary domain. The binary approach offers several advantages, such as noise resilience and computational efficiency, making it a viable approach to extracting meaningful biological information from gene expression data.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.