Abstract
The Quantitative Trait Loci (QTL) mapping problem aims to identify regions in the genome that are linked to phenotypic features of the developed organism that vary in degree. It is a principal step in determining targets for further genetic analysis and is key in decoding the role of specific genes that control quantitative traits within species. Applications include identifying genetic causes of disease, optimizing cross-breeding for desired traits, and understanding trait diversity in populations. In this paper a new multi-objective evolutionary algorithm (MOEA) method is introduced and is shown to increase the accuracy of QTL mapping identification for both independent and epistatic loci interactions. The MOEA method optimizes over the space of possible partial least squares (PLS) regression QTL models and considers the conflicting objectives of model simplicity versus model accuracy. By optimizing for minimal model complexity, MOEA has the advantage of solving the over-fitting problem of conventional PLS models. The effectiveness of the method is confirmed by comparing it with Bayesian Interval Mapping approaches over a series of test cases where the optimal solutions are known. This approach can be applied to many problems that arise in the analysis of genomic data sets where the number of features far exceeds the number of observations and where features can be highly correlated.
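The trade-off between model simplicity and model accuracy described above can be illustrated with a generic Pareto-dominance filter. This is a minimal sketch, not the paper's algorithm: it assumes each candidate QTL model is summarized by a hypothetical pair (number of loci in the model, prediction error), both to be minimized, and the candidate values are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical summary of a candidate model: (number of loci, prediction error).
# Both objectives are minimized; the paper's actual objective definitions may differ.
Model = Tuple[int, float]

def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other candidate dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]

# Illustrative (invented) candidates, not results from the paper.
candidates = [(2, 0.40), (3, 0.25), (5, 0.24), (4, 0.30), (8, 0.24)]
front = pareto_front(candidates)
print(front)
```

In this toy data the 8-locus model is discarded because a 5-locus model reaches the same error, which is the sense in which preferring simpler models guards against over-fitting.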
Highlights
Advances in biological technology are generating an exponential growth in the amount of genomic data available for analysis
Quantitative traits in organisms are formed during development by the interaction of many genes located throughout the genome and distributed over multiple chromosomes
In each case the results were compared with those obtained by the Bayesian Interval Mapping method using Windows Quantitative Trait Loci (QTL) Cartographer (WQTLC) Ver. 2.5 [36], a popular tool in the bioinformatics field
Summary
Advances in biological technology are generating an exponential growth in the amount of genomic data available for analysis. Processing this data requires pattern classification and feature selection methods that can identify the significant features while eliminating redundant and irrelevant ones. Random Amplification of Polymorphic DNA (RAPD) was slow and cumbersome since it required a large amount of sample DNA along with steps to produce the gene loci present. New techniques such as amplified fragment length polymorphism (AFLP) utilize high-throughput. Many authors have applied these techniques in their research to study economically significant organisms such as maize (see [9,10,11]), tomatoes (see [12,13]), and rice (see [14]).