Abstract

The Personal Genome Project (PGP) is an effort to enroll many participants to create an open-access repository of genome, health and trait data for research. However, PGP participants are not enrolled to study any specific trait, and they choose which phenotypes to disclose. To measure the extent of phenotype reporting and participants' willingness to report, and to encourage and guide participants to contribute phenotypes, we developed an algorithm to score and rank the phenotypes and participants of the PGP. The scoring algorithm calculates a participation index (P-index) for every participant, where 0 indicates no reported phenotypes and 100 indicates complete phenotype reporting. We calculated the P-index for all 5,015 participants in the PGP; values ranged from 0 to 96.7. We found that participants mainly have either high scores (P-index > 90, 29.5%) or low scores (P-index < 10, 57.8%). Although there are significantly more male than female participants (1,793 versus 1,271), females tend to have higher P-indexes on average (P = 0.015). We also report the P-indexes of participants by demographics: states such as Missouri and Massachusetts have higher P-indexes than states such as Utah and Minnesota. The P-index can therefore be used as an unbiased way to measure and rank participants' phenotypic contributions to the PGP.

Highlights

  • We explored the landscape of phenotypes available in the Personal Genome Project (PGP), and how extensive they are, using a scoring algorithm that is unbiased towards any particular phenotype

  • Phenotypes were considered valid if two or more participants reported valid values for them, if they did not pertain to genotyping information and if they met our other filtering criteria

  • We described a method for ranking phenotypes and participants in databases used for research and applied the method to the Harvard Personal Genome Project (PGP)
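The validity rule in the highlights can be illustrated with a small sketch. This is not the authors' code: the function name `valid_phenotypes`, the input layout (participant ID mapped to a dictionary of phenotype values), the treatment of `None` and empty strings as "not reported", and the `genotype_fields` parameter are all assumptions made for illustration. The "other filtering criteria" are not described in this excerpt and are not modeled.

```python
from collections import Counter

def valid_phenotypes(reports, genotype_fields=frozenset(), min_participants=2):
    """Return the phenotypes that pass two of the stated filters.

    reports: dict of participant_id -> dict of phenotype -> reported value
             (hypothetical layout; None or "" is treated as not reported).
    genotype_fields: phenotype names assumed to carry genotyping information.
    """
    counts = Counter()
    for answers in reports.values():
        for phenotype, value in answers.items():
            if value not in (None, ""):
                counts[phenotype] += 1
    # Keep phenotypes with enough reporters that are not genotyping fields.
    return {
        phenotype
        for phenotype, n in counts.items()
        if n >= min_participants and phenotype not in genotype_fields
    }

# Toy data, invented for this example only.
example_reports = {
    "A": {"asthma": "yes", "genotype_file": "x"},
    "B": {"asthma": "no", "height": "170", "genotype_file": "y"},
    "C": {"height": ""},
}
example_valid = valid_phenotypes(example_reports, genotype_fields={"genotype_file"})
```

In this toy data only `asthma` survives: `height` has a single non-empty report, and `genotype_file` is excluded as genotyping information.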


Summary

Introduction

The data is made open-access, allowing anyone to use the genotype and phenotype data for research and accelerating the use of data from large cohorts of individuals[16]. These participants have consented to the public sharing of their genotype and phenotype data for research purposes, and can be re-contacted for additional follow-up studies. In determining the P-index, the algorithm allocates more weight to phenotypes that are provided by many participants and less weight to phenotypes that are provided by fewer participants. This is because having more participants with a specific phenotype increases the statistical power for discovering a meaningful genetic association[17,18,19]. We partitioned the participants demographically and reported their P-indexes by state and zip code. Using this scoring algorithm, we investigated the landscape of phenotype data available in the PGP, as well as the willingness of participants to provide phenotype data. Our algorithm can be used to incentivize and guide participants (see Discussion) to share more phenotype data, and can be applied to other projects structured like the PGP when reaching out to participants to share phenotypes.
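The weighting idea can be made concrete with a minimal sketch. The exact weighting formula is not given in this excerpt, so the code assumes the simplest choice consistent with the description: each phenotype's weight is the number of participants who reported it, and a participant's score is the weighted share of phenotypes they reported, scaled to 0-100. The function name `p_index` and the input layout are hypothetical, and the published algorithm may differ.

```python
def p_index(reports):
    """Score each participant from 0 (nothing reported) to 100 (everything reported).

    reports: dict of participant_id -> set of phenotype names that participant
             reported (hypothetical layout, assumed to be already filtered).
    Assumption: a phenotype's weight equals its number of reporters.
    """
    weights = {}
    for phenotypes in reports.values():
        for phenotype in phenotypes:
            weights[phenotype] = weights.get(phenotype, 0) + 1

    total = sum(weights.values())
    scores = {}
    for participant, phenotypes in reports.items():
        earned = sum(weights[p] for p in phenotypes)
        scores[participant] = 100.0 * earned / total if total else 0.0
    return scores

# Toy data, invented for this example only.
toy_reports = {
    "A": {"asthma", "height", "eye_color"},
    "B": {"asthma", "height"},
    "C": {"asthma"},
    "D": set(),
}
toy_scores = p_index(toy_reports)

# Rank participants from highest to lowest score.
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

Under this assumed weighting, a participant who reports only the most widely reported phenotype (C) scores higher than one who reports only a rare phenotype would, which matches the stated intent of favoring phenotypes with more statistical power.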
