Random Forest-Based Genome Analysis for Disease Association and SNP Marker Identification
- Research Article
22
- 10.3389/fgene.2018.00238
- Jul 10, 2018
- Frontiers in Genetics
The current paradigm of genomic studies of complex diseases is association and correlation analysis. Despite significant progress in dissecting the genetic architecture of complex diseases by genome-wide association studies (GWAS), the genetic variants identified by GWAS explain only a small proportion of the heritability of complex diseases. A large fraction of genetic variants is still hidden. Association analysis has limited power to unravel the mechanisms of complex diseases. It is time to shift the paradigm of genomic analysis from association analysis to causal inference. Causal inference is an essential component of discovering disease mechanisms. This paper will review the major platforms of genomic analysis in the past and discuss the perspectives of causal inference as a general framework of genomic analysis. In genomic data analysis, we usually consider four types of associations: association of discrete variables (DNA variation) with continuous variables (phenotypes and gene expressions), association of continuous variables (expressions, methylations, and imaging signals) with continuous variables (gene expressions, imaging signals, phenotypes, and physiological traits), association of discrete variables (DNA variation) with a binary trait (disease status), and association of continuous variables (gene expressions, methylations, phenotypes, and imaging signals) with a binary trait (disease status). In this paper, we will review algorithmic information theory as a general framework for causal discovery and the recent development of statistical methods for causal inference on discrete data, and discuss the possibility of extending the association analysis of discrete variables with disease to the causal analysis of discrete variables and disease.
- Research Article
22
- 10.1161/circgenetics.108.843946
- Apr 1, 2009
- Circulation: Cardiovascular Genetics
The sequencing of the human genome, the identification of common single-nucleotide polymorphisms (SNPs) and haplotype blocks, and advances in microarray technology have enabled the study of complex diseases at a level of detail not previously imaginable. These have aided in the design and analyses of association and linkage studies of many complex diseases including cardiovascular disease. Recent technological advances have enabled the undertaking of large-scale genome-wide association studies (GWAS) that can assay hundreds of thousands of polymorphic sites on hundreds to thousands of individuals to find genomic regions associated with disease. Although results from these experiments enable the identification of smaller regions of association compared with previous studies, as with all linkage and association studies, there is the need for the further investigation of regions of interest for the causal genes or variants. The purpose of this review is to present a detailed demonstration as to how publicly available resources can be used to easily guide more detailed research into genomic regions of interest identified in linkage and association study data. Large-scale projects, such as the Human Genome Sequencing project [1,2], have generated large volumes and varieties of annotated genomic data necessitating the development of Internet-based tools to organize and make practically available these public data. One important tool in human disease research is the web-based graphical genome browsers that use the human genome sequence as the framework on which to organize genomic annotations, providing various ways for researchers to view and extract important information.
Currently, there are 3 human genome browsers that have been developed for public use: (1) the National Center for Biotechnology Information (NCBI) Map Viewer [3]; (2) the University of California Santa Cruz (UCSC) Genome Browser [4]; and (3) the European Bioinformatics Institute’s Ensembl system [5]. Although these genome browsers share common features and …
- Research Article
73
- 10.1038/sj.jid.5700896
- Nov 1, 2007
- Journal of Investigative Dermatology
Polymorphisms in Interleukin-15 Gene on Chromosome 4q31.2 Are Associated with Psoriasis Vulgaris in Chinese Population
- Research Article
- 10.1289/isee.2020.virtual.p-0841
- Oct 26, 2020
- ISEE Conference Abstracts
Background/Aim: Per- and polyfluoroalkyl substances (PFAS) are persistent chemicals used in many products and manufacturing processes. Human exposure to some PFAS has been associated with serum cholesterol concentrations, but observed cardiovascular disease (CVD) associations have been inconsistent. Previous cross-sectional analyses of PFAS-CVD associations used U.S. National Health and Nutrition Examination Survey (NHANES) data. We prospectively examined associations between NHANES serum PFAS concentrations and subsequent myocardial infarction (MI), ischemic stroke (IS) or any stroke (AS), using linked National Death Index and Medicare claims data, for participants aged ≥65 years with PFAS measurements and Medicare fee-for-service enrollment (study population). Methods: NHANES (1999-2000, 2003-2012) serum PFAS concentrations [perfluorooctane sulfonate, perfluorooctanoic acid (PFOA), perfluorononanoic acid, perfluorohexane sulfonate] were analyzed using quartile (among cases) indicator variables and the natural log of quartile geometric means (continuous variable, trend test). CVD outcomes occurring after serum collection were identified, among participants reporting no prior history of the outcome, using linked (through 2013) Medicare claims ICD-9-CM codes and underlying cause-of-death ICD-10 codes. Survival analysis models used an age time scale, were weighted to account for survey design and Medicare matching, were stratified on body mass index, and controlled for survey cycle, age, gender, race/ethnicity, smoking, alcohol consumption, physical activity, education, and income-to-poverty ratio. Results: Among 1248 in the study population, 1078 reported no prior MI (72 developed MI) and 1102 reported no prior stroke (67 developed IS; 78 developed AS). Quartile-specific hazard ratios (HRs) and trends for all PFAS-outcome combinations were not statistically significant, but some elevated HRs were observed [e.g., IS HRs (95% confidence intervals) for PFOA quartiles 2-4 vs. 1: 2.04 (0.79-5.28), 1.98 (0.75-5.19), 1.63 (0.61-4.36)]. Conclusions: This analysis did not provide clear evidence of an association between serum PFAS concentrations and MI, IS, or AS. Results should be interpreted considering study limitations (e.g., limited power, single exposure measurement).
- Research Article
79
- 10.1038/hdy.2010.91
- Jul 14, 2010
- Heredity
Population-based genomic association analyses are more powerful than within-family analyses. However, population stratification (unknown or ignored origin of individuals from multiple source populations) and cryptic relatedness (unknown or ignored covariance between individuals because of their relatedness) are confounding factors in population-based genomic association analyses, which inflate the false-positive rate. As a consequence, false association signals may arise in genomic data association analyses for reasons other than true association between the tested genomic factor (marker genotype, gene or protein expression) and the study phenotype. It is therefore important to correct or account for these confounders in population-based genomic data association analyses. The common correction techniques for population stratification and cryptic relatedness problems are presented here in the phenotype-marker association analysis context, and comments on their suitability for other types of genomic association analyses (for example, phenotype-expression association) are also provided. Even though many of these techniques have originally been developed in the context of human genetics, most of them are also applicable to model organisms and breeding populations.
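As a rough illustration of one widely used correction of this kind, the sketch below applies genomic control: the inflation factor λ is estimated as the median of the observed 1-df chi-square statistics divided by the theoretical median (about 0.4549), and each statistic is divided by λ when λ exceeds 1. The abstract does not say which techniques the review covers, so the choice of genomic control, the function name `genomic_control`, and the example statistics are all assumptions made for illustration.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for 1-df association tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention the statistics are only deflated, never inflated.
    scale = max(lam, 1.0)
    return lam, [s / scale for s in chi2_stats]


# Hypothetical per-marker test statistics, for illustration only.
lam, adjusted = genomic_control([0.2, 0.9098728, 3.0])
```

Genomic control addresses uniform inflation only; structured association or mixed-model approaches would be needed where the confounding differs between markers.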
- Research Article
33
- 10.1007/s10681-016-1740-0
- Jul 1, 2016
- Euphytica
Reducing oxalate content of spinach is a major breeding objective. The aim of this research was to conduct association analysis and identify SNP markers associated with oxalate concentration in spinach germplasm. A total of 310 spinach genotypes, including 300 USDA germplasm accessions and ten commercial cultivars, were used for the association analysis of oxalate concentration. Genotyping by sequencing was used to identify 841 SNPs among the genotypes examined for the association analysis. The distribution of oxalate concentration showed a near normal distribution with a wide range in concentrations from 647.2 to 1286.9 mg/100 g on a fresh weight basis and 53.4 to 108.8 mg/g on a dry weight basis. The range in oxalate concentration in spinach suggests that it is a complex quantitative trait which may be controlled by multiple genes, each with a minor effect among the tested spinach panel. Association analysis indicated that six SNP markers (AYZV02031464_116, AYZV02031464_117, AYZV02031464_95, AYZV02283363_2707, AYZV02287123_2830, and AYZV02296293_852) were associated with the oxalate concentration. The SNP markers may be useful for breeders to select germplasm for reduced oxalate concentrations in spinach breeding programs through marker-assisted selection.
- Book Chapter
3
- 10.1007/978-94-011-1130-0_20
- Jan 1, 1994
As late as in the 1960s, there were only about 30 genetic marker systems useful for association or linkage analyses and the number remained modest throughout the 1970s. Nevertheless, association analyses were carried out on many disorders or risk factors with the random markers that were available. In most cases, where an association was detected, there was no reason to believe that the marker system examined had any functional relationship to the disease under study. One simply used the markers at hand.
Keywords: Blood Group; Cholesteryl Ester Transfer Protein; Candidate Gene Approach; High Density Lipoprotein Cholesterol Level; Coronary Heart Disease Risk Factor
- Research Article
- 10.3724/sp.j.1118.2018.17202
- Jan 1, 2018
- Journal of Fishery Sciences of China
Association analysis between SNP markers and growth-related traits in Schizothorax prenanti
Affiliation: Fisheries Breeding and Health Cultivation Research Center, Southwest University, Chongqing 402460, China. About the author: Yang Yuejing (born 1993), female, master's student, working mainly on genetic breeding and biotechnology of aquatic animals. E-mail: yangyuejing666@163.com. Chinese Library Classification: S917. Funding: National Natural Science Foundation of China (31402302); Fundamental Research Funds for the Central Universities (XDJK2017B008); Youth Fund of the Rongchang Campus, Southwest University (20700938).
Abstract (translated from the Chinese): Schizothorax prenanti is a cold-water economic fish endemic to China with high nutritional and economic value. To identify molecular markers associated with its growth traits, this study used 114 fish from the same breeding batch, reared under identical conditions, and tested 26 S. prenanti SNP loci for association with growth traits (body weight, body height, total length, body length). Principal component analysis of the growth traits showed that body weight explained 93.42% of the variance, with an eigenvalue greater than 1 and cumulative variance above 85%, making it the first principal component of the growth traits. The association results showed that […] had a significant effect on total length and body length, and that ug22712-0-2452 had a highly significant effect on body weight, body height, total length and body length (<0.01). Polymorphism was assessed for the 4 SNP loci significantly associated with growth traits: the mean observed and expected heterozygosity were 0.2369 and 0.2110, respectively, and the mean polymorphism information content was 0.18. ug22712-0-2452 was significantly associated with growth traits and can serve as a candidate locus for marker-assisted breeding of S. prenanti. This study aims to provide baseline data for the genetic improvement and selective breeding of S. prenanti.
Abstract (English, as published): Schizothorax prenanti, an economically important, cold-water fish species, has rich nutritional value and high economic value. With the increase in artificial breeding and breeding intensification, the germplasm resources of S. prenanti have been degraded, which is revealed in individual miniaturization, slow growth, and decreased disease resistance. Therefore, it is necessary to screen the molecular markers of growth traits and use molecular-marker-assisted breeding. In order to acquire some reliable molecular genetic markers for growth-related traits, the correlation between 26 SNP markers and growth-related traits in S. prenanti was analyzed using 114 samples with the same growth conditions. A principal component analysis showed that body weight accounted for 93.42% of the variance, the eigenvalue was greater than 1 and the accumulative variance ratio was more than 85%, and it was the first principal component of the growth traits of S. prenanti. Correlation analysis between genotypes of SNPs and growth traits indicated that […] showed a significant influence on total length and body length (<0.05), that ug25050-0-1678 was associated with body weight, body height, total length, and body length, and that […] and body weight, body height, and body length were significantly associated (<0.01). We also estimated genetic diversity parameters for the 4 loci that were significantly correlated with the growth traits. The mean observed heterozygosity, expected heterozygosity, and polymorphism information content (PIC) were 0.2369, 0.2110 and 0.17, respectively. The polymorphisms of ug25050-0-1678 and […] were significantly associated with growth traits, and could be used as important candidate molecular markers for breeding selection of S. prenanti. Our results could provide an effective basis for the study of genetic improvement and selective breeding in S. prenanti.
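The diversity statistics reported in this abstract (observed heterozygosity, expected heterozygosity, PIC) can be sketched for a single locus as below. This is a minimal sketch using the standard textbook definitions: expected heterozygosity as 1 − Σpᵢ² and PIC as defined by Botstein et al. Whether the authors used exactly these estimators (for example, with or without a small-sample correction) is not stated, and the function name `locus_diversity` and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (Ho, He, PIC) for one locus.

    genotypes: list of 2-tuples, one diploid genotype per individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies from the 2n sampled alleles.
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    # Expected heterozygosity under Hardy-Weinberg (no small-sample correction).
    he = 1 - sum(p * p for p in freqs)
    # PIC = He minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return ho, he, pic


# Toy genotypes for a biallelic SNP, for illustration only.
ho, he, pic = locus_diversity([("A", "A"), ("A", "G"), ("A", "G"), ("G", "G")])
```

For a biallelic SNP, PIC cannot exceed 0.375, which is reached when both alleles have frequency 0.5.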
- Research Article
9
- 10.1186/s12863-017-0489-3
- Mar 9, 2017
- BMC Genetics
Background: Potato frying color is an agronomic trait influenced by the sugar content of tubers. The candidate gene approach was employed to elucidate the molecular basis of this trait in Solanum tuberosum Group Phureja, which is mainly diploid and represents an important genetic resource for potato breeding. The objective of this research was to identify novel genetic variants related to frying quality in loci with key functions in carbohydrate metabolism, with the purpose of discovering genetic variability useful in breeding programs. Therefore, an association analysis was implemented with 109 SNP markers identified in ten candidate genes. Results: The analyses revealed four associations in the locus InvGE coding for an apoplastic invertase and one association in the locus SssI coding for a soluble starch synthase. The SNPs SssI-C45711901T and InvGE-C2475454T were associated with sucrose content and frying color, respectively, and were not found previously in tetraploid genotypes. The rare haplotype InvGE-A2475187C2475295A2475344 was associated with higher fructose contents. Our study allowed a more detailed analysis of the sequence variation of exon 3 from InvGE, which was not possible in previous studies because of the high frequency of insertion-deletion polymorphisms in tetraploid potatoes. Conclusion: The association mapping strategy using a candidate gene approach in Group Phureja allowed the identification of novel SNP markers in InvGE and SssI associated with frying color and the tuber sugar content measured by High Performance Liquid Chromatography (HPLC). These novel associations might be useful in potato breeding programs for improving quality traits and increasing crop genetic variability. The results suggest that some genes involved in the natural variation of tuber sugar content and frying color are conserved in both Phureja and tetraploid germplasm. Nevertheless, the associated variants in both types of germplasm were present in different regions of these genes. This study contributes to the understanding of the genetic architecture of tuber sugar contents and frying color at harvest in Group Phureja.
- Research Article
33
- 10.1086/508264
- Oct 1, 2006
- The American Journal of Human Genetics
A Flexible Bayesian Framework for Modeling Haplotype Association with Disease, Allowing for Dominance Effects of the Underlying Causative Variants
- Research Article
- 10.21608/ajs.2020.40622.1248
- Sep 15, 2020
- Arab Universities Journal of Agricultural Sciences
Wheat is an essential staple food in the developing world, where demand is projected to grow exponentially in the future; simultaneously, climate change is projected to reduce supply in the near future. One of the main consequences of climate change is salinity, which negatively impacts the world's cultivated area and therefore affects global wheat production. Our objectives were to study the population structure of several Egyptian and international wheat accessions and to identify the genetic factors controlling the salinity stress response of bread wheat. In addition, we attempted to identify genes that control some important agronomic parameters of wheat under salinity stress. The wheat germplasm panel consisted of 70 accessions obtained from Egypt, Syria and Iran. The assessment of salinity tolerance was conducted in 2018 and 2019 in the field and in the greenhouse. The genome-wide association study (GWAS) and population structure analysis were conducted using six SCoT, five SSR and 93 SNP markers. Analysis of the population structure using allele frequency and phylogenetic analysis indicated that the studied wheat accessions belonged to four population groups. For the most part, the Egyptian, Syrian and Iranian accessions clustered according to their country of origin. The GWAS revealed 13 SNP markers that were significantly associated with morpho-agronomic wheat traits during salinity stress. These markers were closely related to genes that are known to have a direct link to the wheat response to salinity stress, such as the CYP709B2, MDIS2, STAY-GREEN, PIP5K9, and MSSP2 genes. This study revealed the genetic structure of adapted and imported wheat accessions, which could be used to select potential wheat accessions for local breeding programs. In addition, the SNP genotyping assay is a promising technology that could be efficiently applied to detect genes that control the bread wheat response to salinity stress.
- Research Article
62
- 10.2527/jas.2012-5121
- Jul 5, 2012
- Journal of Animal Science
Gastrointestinal nematodes are one of the main health issues in sheep breeding. To identify loci affecting the resistance to Haemonchus contortus, a genome scan was carried out using 1,275 Romane × Martinik Black Belly backcross lambs. The entire population was challenged with Haemonchus contortus in 2 consecutive experimental infections, and fecal egg counts (FEC) and packed cell volumes were measured. A subgroup of 332 lambs with extreme FEC was necropsied to determine the total worm burden, length of female worms, sex ratio in the worm population, abomasal pH, and serum and mucosal immunoglobulin G (IgG) responses. Pepsinogen concentration was measured in another subset of 229 lambs. For QTL detection, 160 microsatellite markers were used as well as the Illumina OvineSNP50 BeadChip that provided 42,469 SNP markers after quality control. Linkage, association, and joint linkage and association analyses were performed with the QTLMAP software. Linkage disequilibrium (LD) was estimated within each pure breed, and association analyses were carried out with and without considering the breed origin of the haplotypes. Four QTL regions on sheep chromosomes (OAR)5, 12, 13, and 21 were identified as key players among many other QTL with small to moderate effects. A QTL on OAR21 affecting pepsinogen concentration exactly matched the pepsinogen (PGA5) locus. A 10-Mbp region affecting FEC after the 1st and 2nd infections was found on OAR12. The SNP markers outperformed microsatellites in the linkage analysis. Taking advantage of the LD helped to refine the locations of the QTL mapped on OAR5 and 13.
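To make the LD step concrete, the sketch below computes the pairwise r² statistic between two biallelic markers from phased haplotypes. The abstract does not say which LD measure or estimation procedure was used, so r², the function name `ld_r2`, and the toy haplotypes are assumptions for illustration.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic markers.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1,
    one pair per phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at marker A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at marker B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Toy haplotypes, for illustration only.
complete_ld = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

High r² between neighboring markers is what allows an association signal to narrow a QTL interval beyond what linkage alone resolves.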
- Research Article
46
- 10.1007/s00122-017-2987-0
- Sep 25, 2017
- Theoretical and Applied Genetics
This is the first report on association analysis of salt tolerance and identification of SNP markers associated with salt tolerance in cowpea. Cowpea (Vigna unguiculata (L.) Walp) is one of the most important cultivated legumes in Africa. The worldwide annual production of cowpea dry seed is 5.4 million metric tons. However, cowpea is unfavorably affected by salinity stress at germination and seedling stages, which is exacerbated by the effects of climate change. The lack of knowledge on the genetic basis of salt tolerance in cowpea limits the establishment of a breeding strategy for developing salt-tolerant cowpea cultivars. The objectives of this study were to conduct association mapping for salt tolerance at germination and seedling stages and to identify SNP markers associated with salt tolerance in cowpea. We analyzed the salt tolerance index of 116 and 155 cowpea accessions at germination and seedling stages, respectively. A total of 1049 SNPs postulated from genotyping-by-sequencing were used for association analysis. Population structure was inferred using Structure 2.3.4; the optimal K was determined using Structure Harvester. TASSEL 5, GAPIT, and FarmCPU, with three models (single marker regression, general linear model, and mixed linear model), were used for the association study. Substantial variation in salt tolerance index for germination rate, plant height reduction, fresh and dry shoot biomass reduction, foliar leaf injury, and inhibition of the first trifoliate leaf was observed. The cowpea accessions were structured into two subpopulations. Three SNPs, Scaffold87490_622, Scaffold87490_630, and C35017374_128, were highly associated with salt tolerance at germination stage. Seven SNPs, Scaffold93827_270, Scaffold68489_600, Scaffold87490_633, Scaffold87490_640, Scaffold82042_3387, C35069468_1916, and Scaffold93942_1089, were found to be associated with salt tolerance at seedling stage. The SNP markers were consistent across the three models and could be used as a tool to select salt-tolerant lines for breeding improved cowpea tolerance to salinity.
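Of the three models named above, single marker regression is the simplest, and a minimal sketch may help: the phenotype is regressed on the allele dosage (0, 1 or 2 copies) at one SNP at a time. This is not how TASSEL, GAPIT or FarmCPU are implemented; the function name `single_marker_regression` and the toy data are hypothetical, and no population-structure covariates are included.

```python
def single_marker_regression(dosages, phenotypes):
    """Ordinary least squares of phenotype on allele dosage for one SNP.

    Returns (slope, intercept, r_squared).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker and phenotype must both vary")
    slope = sxy / sxx                  # estimated additive effect per allele copy
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # share of phenotypic variance explained
    return slope, intercept, r_squared


# Toy dosages and trait values, for illustration only.
slope, intercept, r2 = single_marker_regression(
    [0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
)
```

The general and mixed linear models extend this by adding population-structure covariates and, for the mixed model, a kinship-based random effect, which is why markers that hold up across all three are treated as more reliable.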
- Research Article
47
- 10.1007/s00122-013-2081-1
- Mar 6, 2013
- Theoretical and Applied Genetics
Anthracnose in sorghum caused by Colletotrichum sublineolum is one of the most destructive diseases affecting sorghum production under warm and humid conditions. Markers and genes linked to resistance to the disease are important for plant breeding. Using 14,739 SNP markers, we have mapped eight loci linked to resistance in sorghum through association analysis of a sorghum mini-core collection consisting of 242 diverse accessions evaluated for anthracnose resistance for 2 years in the field. The mini-core was representative of the International Crops Research Institute for the Semi-Arid Tropics' world-wide sorghum landrace collection. Eight marker loci were associated with anthracnose resistance in both years. Except for locus 8, disease resistance-related genes were found in all loci based on their physical distance from linked SNP markers. These include two NB-ARC class R genes on chromosome 10 that were partially homologous to the rice blast resistance gene Pib, two hypersensitive response-related genes: autophagy-related protein 3 on chromosome 1 and 4 harpin-induced 1 (Hin1) homologs on chromosome 8, a RAV transcription factor that is also part of the R gene pathway, an oxysterol-binding protein that functions in non-specific host resistance, and homologs of menthone:neomenthol reductase (MNR), which catalyzes a menthone reduction to produce the antimicrobial neomenthol. These genes and markers may be developed into molecular tools for genetic improvement of anthracnose resistance in sorghum.
- Research Article
10
- 10.1007/s11032-015-0298-1
- Mar 24, 2015
- Molecular Breeding
Low-temperature germinability (LTG) is an important trait for breeding of varieties for use in direct-seeding rice production systems. Although rice (Oryza sativa L.) is generally sensitive to low temperatures, genetic variation for LTG exists and several quantitative trait loci (QTLs) have been reported. Most notably, the gene underlying the major effect QTL qLTG3-1 has been cloned and implicated in tissue weakening. The objective of this study was to develop molecular markers for use in selecting rice germplasm with enhanced LTG. A panel of japonica rice accessions (n = 180) from temperate regions in Asia was evaluated for LTG and genotyped with markers from qLTG3-1 and regions previously reported to harbor other LTG QTLs. In addition to the germplasm evaluation using these markers, an association analysis was conducted using SNP data generated by reduced representation sequencing of the panel. Eight SNP markers were found to be significantly associated with LTG using general and mixed linear models. Two of these markers were in close proximity (~35 kb) to each other on chromosome 4 in a region previously linked to LTG in rice. The identification of several markers strongly associated with LTG in locations not previously reported provides a foundation for further genetic dissection of this complex trait.