Abstract

Background: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected a specific burden of rare disruptive CNVs in neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls, based on rare CNV data and comprehensive gene annotations. We investigated the performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on the presumed etiologic CNVs they carry.

Results: We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% correctly classified ASD subjects and less than 3% incorrectly classified controls; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains, and that curated neurally-relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) outperform Gene Ontology and pathway-based annotations.

Conclusions: CF is an optimal classification approach for case-control rare CNV data, and it can be used to prioritize subjects with variants potentially contributing to ASD risk that are not yet recognized. The neurally-relevant annotations used in this study could be successfully applied to rare CNV case-control datasets for other neuropsychiatric disorders.
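To make the idea of classifying subjects from "rare CNV data and comprehensive gene annotations" concrete, here is a minimal sketch of one way per-subject features could be built: counting the distinct annotated genes hit by each subject's rare losses and gains. This is an assumption for illustration, not the feature construction used in the study; the function name `build_features`, the annotation set names, and the gene symbols are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical annotation sets. Gene symbols are placeholders,
# not the curated lists used in the study.
ANNOTATIONS = {
    "brain_expressed": {"GENE_A", "GENE_B", "GENE_C"},
    "synaptic": {"GENE_B", "GENE_D"},
    "neurodev_phenotype": {"GENE_C", "GENE_D", "GENE_E"},
}


def build_features(cnvs, annotations=ANNOTATIONS):
    """Build one feature row per subject (illustrative assumption).

    cnvs: iterable of (subject_id, cnv_type, genes), where cnv_type is
          "loss" or "gain" and genes lists the genes overlapped by the CNV.
    Returns {subject_id: {"<cnv_type>_<annotation>": n_distinct_genes}}.
    """
    hit = defaultdict(lambda: defaultdict(set))
    for subject, cnv_type, genes in cnvs:
        for name, gene_set in annotations.items():
            # Keep only the genes of this CNV that belong to the annotation set.
            hit[subject][f"{cnv_type}_{name}"] |= set(genes) & gene_set

    columns = [f"{t}_{n}" for t in ("loss", "gain") for n in annotations]
    return {s: {c: len(per[c]) for c in columns} for s, per in hit.items()}


if __name__ == "__main__":
    toy_cnvs = [
        ("s1", "loss", ["GENE_B", "GENE_X"]),
        ("s1", "gain", ["GENE_C"]),
        ("s2", "loss", ["GENE_X"]),
    ]
    for subject, row in build_features(toy_cnvs).items():
        print(subject, row)
```

Keeping losses and gains in separate columns lets a downstream classifier weight them differently, which matters given the reported finding that rare losses are more predictive than gains.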

Highlights

  • A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation

  • We analyzed 1,892 ASD subjects (1623 males and 270 females) and 2,342 platform-matched controls (1093 males and 1250 females) with at least one rare copy number variation (CNV); all subjects are of Caucasian ethnicity

  • We successfully used rare CNVs and neurally-relevant gene annotations to classify ASD subjects: the best classifier achieved an area under the curve (AUC) of 0.533, corresponding to 7.9% of ASD subjects correctly classified by rare CNVs and less than 3% of controls incorrectly classified; this result is reasonably close to the prior expectation that about 10% of ASD subjects carry rare CNVs contributing to ASD risk
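An AUC only slightly above 0.5 can look at odds with a useful classifier, so the toy calculation below sketches why the two can coexist: if only a small fraction of cases carries a detectable signal and everyone else is indistinguishable from controls, the AUC stays near 0.5 even though the flagged subjects are mostly cases. The scores are synthetic and chosen for illustration only; they are not the study's data, and the helper names `auc` and `rates_at_threshold` are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def rates_at_threshold(case_scores, control_scores, threshold):
    """Fraction of cases (TPR) and controls (FPR) scoring at or above threshold."""
    tpr = sum(s >= threshold for s in case_scores) / len(case_scores)
    fpr = sum(s >= threshold for s in control_scores) / len(control_scores)
    return tpr, fpr


if __name__ == "__main__":
    # Synthetic example: 8 of 100 cases and 2 of 100 controls get a high score;
    # all remaining subjects share the same uninformative score.
    cases = [1.0] * 8 + [0.0] * 92
    controls = [1.0] * 2 + [0.0] * 98

    tpr, fpr = rates_at_threshold(cases, controls, threshold=0.5)
    print(f"AUC = {auc(cases, controls):.3f}, TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

With a single informative threshold like this, the AUC equals 0.5 + (TPR − FPR) / 2, so a low-sensitivity, high-specificity classifier necessarily has an AUC close to 0.5.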

Introduction

ASDs are highly heritable [3] and genomic studies have revealed that a substantial proportion of ASD risk resides in de novo germline and rare inherited genetic variation, ranging from chromosome. De novo CNVs are observed in up to 5-10% of screened ASD subjects; not all of these events have a clear contribution to ASD risk, which is thought to depend on the size of the genomic change and the gene pathways perturbed. In this sample collection, 3.0% of ASD subjects harbored de novo or inherited genic CNVs classified as pathogenic according to clinical annotation guidelines [18] and a consensus catalogue of ASD loci (124 genes and 55 loci) [16]; more than half of these pathogenic CNVs were de novo. Unlike other published studies focusing solely on pathogenic CNV classification [19], we aimed at classifying subjects based on the contribution of all rare CNVs.
