Abstract

Colorectal cancer is a common and deadly disease in the United States, accounting for over 50,000 deaths in 2020. This progressive disease is highly preventable with early detection and treatment, but many people do not comply with the recommended screening guidelines. The gut microbiome has emerged as a promising target for noninvasive detection of colorectal cancer. Most microbiome-based classification efforts use taxonomic abundance data from operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) with the goal of increasing taxonomic resolution. However, it is unknown which taxonomic resolution is optimal for microbiome-based classification of colorectal cancer. To address this question, we used a reproducible machine learning framework to quantify the classification performance of models based on data annotated to the phylum, class, order, family, genus, OTU, and ASV levels. We found that model performance increased with increasing taxonomic resolution up to the family level. Performance did not differ significantly (P > 0.05) among the family (mean area under the receiver operating characteristic curve [AUROC], 0.689), genus (mean AUROC, 0.690), and OTU (mean AUROC, 0.693) levels, and then decreased at the ASV level (P < 0.05; mean AUROC, 0.676). These results demonstrate a trade-off between taxonomic resolution and prediction performance: coarse taxonomic resolution (e.g., phylum) is not distinct enough, but fine resolution (e.g., ASV) is too individualized to accurately classify samples. Similar to the story of Goldilocks and the three bears (L. B. Cauley, Goldilocks and the Three Bears, 1981), mid-range resolution (i.e., family, genus, and OTU) is “just right” for optimal prediction of colorectal cancer from microbiome data.
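The comparison described above rests on two mechanical steps: collapsing a per-sample abundance table to a chosen taxonomic rank, and scoring a classifier's per-sample predictions by AUROC. The following is a minimal, standard-library-only Python sketch of both steps under stated assumptions; it is not the authors' reproducible pipeline. The function names `collapse_to_rank` and `auroc`, the nested-dict table layout, and the pooling of features with no assignment at a rank into an "unclassified" bucket are all hypothetical choices made for illustration. OTUs and ASVs are treated as the input features themselves (they are not lineage ranks), and model training is omitted: any classifier that outputs one score per sample could feed `auroc`.

```python
from collections import defaultdict

# Lineage ranks that can be looked up in a feature's taxonomy (assumed layout).
RANKS = ["phylum", "class", "order", "family", "genus"]


def collapse_to_rank(counts, taxonomy, rank):
    """Sum per-sample feature counts (e.g., OTUs) into one feature per taxon at `rank`.

    counts:   {sample_id: {feature_id: count}}
    taxonomy: {feature_id: {rank_name: taxon_label}}
    """
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank!r}")
    collapsed = {}
    for sample, features in counts.items():
        totals = defaultdict(int)
        for feature, n in features.items():
            # Hypothetical convention: features lacking an assignment at this
            # rank are pooled into a single "unclassified" feature.
            label = taxonomy[feature].get(rank, "unclassified")
            totals[label] += n
        collapsed[sample] = dict(totals)
    return collapsed


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative.

    labels are 1 (e.g., cancer) or 0 (e.g., healthy); ties count as half a win.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, hypothetical data for illustration only.
    counts = {
        "s1": {"otu1": 10, "otu2": 5, "otu3": 1},
        "s2": {"otu1": 0, "otu2": 7, "otu3": 4},
    }
    taxonomy = {
        "otu1": {"phylum": "Firmicutes", "family": "Lachnospiraceae"},
        "otu2": {"phylum": "Firmicutes", "family": "Ruminococcaceae"},
        "otu3": {"phylum": "Bacteroidetes", "family": "Bacteroidaceae"},
    }
    print(collapse_to_rank(counts, taxonomy, "phylum"))
    # {'s1': {'Firmicutes': 15, 'Bacteroidetes': 1}, 's2': {'Firmicutes': 7, 'Bacteroidetes': 4}}
    print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

Collapsing to a coarser rank merges features, which is one way to see the trade-off the abstract reports: coarse ranks pool away distinguishing signal, while the finest features stay sparse and individual-specific. The pairwise loop in `auroc` is O(n_pos × n_neg), which is fine for illustration; a rank-based Mann-Whitney computation scales better for large cohorts.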
