Abstract

Tree-based transmission-disequilibrium tests are valuable tools for fine-mapping in the search for genetic factors underlying complex diseases, as they use evolutionary information to relate the haplotypes that affect the disease. However, the number of possible haplotype trees increases exponentially with the number of markers used, which leads to spurious associations through overfitting to the sample. If the usual Bonferroni correction is applied to avoid these spurious associations, true risk variants may also be missed. In this work we consider a different solution to avoid overfitting haplotype trees to the sample. It consists of dividing the data set into at least two parts, using one part to choose the haplotype tree that models the disease and the other to assess statistical significance. As a practical example to evaluate the performance of our proposal, we modified the TreeDT algorithm and observed a significant improvement in reproducibility together with a reduction in type I errors.
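The split-sample idea can be illustrated with a minimal Python sketch. It is not TreeDT's statistic: it assumes a plain McNemar-style TDT, (b - c)^2 / (b + c), computed on groups of haplotypes, and each hypothetical `Candidate` (a set of "risk" haplotypes) merely stands in for one haplotype-tree hypothesis. The function names, the data layout and the 50/50 split at the family level are illustrative assumptions. The selection step may overfit freely because only the untouched half produces the p-value, so that single test needs no Bonferroni factor.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

# One parent's contribution: (transmitted haplotype, untransmitted haplotype).
Parent = Tuple[str, str]
# One trio: the two parents' contributions (the affected child is implicit).
Trio = Tuple[Parent, Parent]
# Hypothetical stand-in for one haplotype-tree hypothesis:
# the set of haplotypes treated as "risk" haplotypes.
Candidate = FrozenSet[str]


def tdt_statistic(trios: Sequence[Trio], risk: Candidate) -> float:
    """McNemar-style TDT chi-square (b - c)^2 / (b + c) for a risk-haplotype set."""
    b = c = 0
    for trio in trios:
        for transmitted, untransmitted in trio:
            t_risk = transmitted in risk
            u_risk = untransmitted in risk
            if t_risk and not u_risk:
                b += 1  # informative parent transmitted the risk group
            elif u_risk and not t_risk:
                c += 1  # informative parent transmitted the non-risk group
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def split_sample_test(
    trios: Sequence[Trio], candidates: List[Candidate], seed: int = 0
) -> Tuple[Candidate, float]:
    """Choose the best candidate on one half of the families, test it on the other."""
    shuffled = list(trios)
    random.Random(seed).shuffle(shuffled)  # split whole trios, never single parents
    half = len(shuffled) // 2
    discovery, validation = shuffled[:half], shuffled[half:]
    # Selection: may overfit, because it never produces the reported p-value.
    best = max(candidates, key=lambda cand: tdt_statistic(discovery, cand))
    # Assessment: one test on unseen trios, so no multiple-testing correction.
    return best, chi2_1df_pvalue(tdt_statistic(validation, best))


def bonferroni_min_p(trios: Sequence[Trio], candidates: List[Candidate]) -> float:
    """Full-data alternative: test every candidate and Bonferroni-correct the minimum p."""
    p_min = min(chi2_1df_pvalue(tdt_statistic(trios, cand)) for cand in candidates)
    return min(1.0, p_min * len(candidates))


if __name__ == "__main__":
    # Toy data: haplotype "A" is always transmitted over "B"; "C"/"D" alternate.
    toy = [(("A", "B"), ("C", "D")) if i % 2 == 0 else (("A", "B"), ("D", "C"))
           for i in range(40)]
    cands = [frozenset({"A"}), frozenset({"C"}), frozenset({"D"})]
    print(split_sample_test(toy, cands, seed=1))
    print(bonferroni_min_p(toy, cands))
```

With many candidates the Bonferroni factor grows with the number of trees, whereas the split-sample p-value is always that of a single test; the price is that only part of the families contributes to the final test.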
