Abstract
Purpose
Recent high-throughput sequencing technology has identified numerous somatic mutations across the whole exome in a variety of cancers. In this study, we generate a predictive model employing the whole-exome somatic mutational profile of ovarian high-grade serous carcinomas (Ov-HGSCs) obtained from The Cancer Genome Atlas data portal.
Methods
A total of 311 patients were included for modeling overall survival (OS) and 259 patients were included for modeling progression-free survival (PFS) in an analysis of 509 genes. The model was validated with complete leave-one-out cross-validation, in which genes were re-selected in each iteration of the cross-validation procedure. Cross-validated Kaplan-Meier curves were generated. Cross-validated time-dependent receiver operating characteristic (ROC) curves were computed, and area under the curve (AUC) values were calculated from the ROC curves to estimate the predictive accuracy of the survival risk models.
Results
There was a significant difference in OS between the high-risk group (median, 28.1 months) and the low-risk group (median, 61.5 months) (permutation p-value <0.001). There was also a significant difference in PFS between the high-risk group (10.9 months) and the low-risk group (22.3 months) (permutation p-value <0.001). Cross-validated AUC values were 0.807 for OS and 0.747 for PFS at a defined landmark time of t = 36 months. When a predictive model containing only gene variables was compared with a combined model containing both gene variables and clinical covariates, the model containing gene variables without clinical covariates was effective, and high AUC values were observed for both OS and PFS.
Conclusions
We designed a predictive model using a somatic mutation profile obtained from high-throughput genomic sequencing data in Ov-HGSC samples. This approach may represent a new strategy for applying high-throughput sequencing data to clinical practice.
Highlights
Recent high-throughput sequencing technology has generated an enormous amount of data that continues to accumulate for somatic mutations in a variety of cancers
When the number of candidate variables exceeds the number of cases, which is common in high-throughput genomic data analysis, complete cross-validation is an established method that has been widely used for building a model and estimating its prediction error [2,4]
All data presented in this report are based on classification during the leave-one-out cross-validation (LOOCV) procedure and are fully cross-validated
Summary
Recent high-throughput sequencing technology has generated an enormous amount of data that continues to accumulate for somatic mutations in a variety of cancers. It is important to consider how data from somatic mutational profiles containing survival information can be applied in clinical use. From this viewpoint, the development of predictive models from somatic mutation profiles that combine complete genomic data with survival information may be worthwhile. When the number of candidate variables exceeds the number of cases, which is common in high-throughput genomic data analysis, complete cross-validation is an established method that has been widely used for building a model and estimating its prediction error [2,4]. The predictive models are developed from scratch, repeating variable selection and calibration, in each loop of the cross-validation [2]. Several cross-validation methods are available, including leave-one-out cross-validation (LOOCV), v-fold cross-validation, and bootstrap resampling.
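The idea of complete cross-validation described above can be illustrated with a minimal sketch. It is not the survival model used in the study: it assumes an uncensored continuous outcome (for example, log survival time) in place of censored time-to-event data, ranks genes by simple correlation, and fits an ordinary least-squares model. All function names (`select_genes`, `fit_linear`, `predict_linear`, `complete_loocv`) and the simulated data are hypothetical and serve only to show that gene selection and model fitting are both repeated inside every leave-one-out loop.

```python
import numpy as np

def select_genes(X, y, k):
    """Return indices of the k genes most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # genes with no variation get a score of 0
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[::-1][:k]

def fit_linear(X, y):
    """Least-squares fit with an intercept (stand-in for a survival model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def complete_loocv(X, y, k=5):
    """Leave-one-out CV in which gene selection is redone in every loop."""
    n = len(y)
    cv_pred = np.empty(n)
    high_risk = np.empty(n, dtype=bool)
    for i in range(n):
        train = np.arange(n) != i
        # Selection and fitting see only the training cases, never case i.
        genes = select_genes(X[train], y[train], k)
        coef = fit_linear(X[train][:, genes], y[train])
        train_pred = predict_linear(coef, X[train][:, genes])
        cv_pred[i] = predict_linear(coef, X[i:i + 1, genes])[0]
        # Assumption: a lower predicted outcome than the training median
        # (shorter predicted survival) is labeled high risk.
        high_risk[i] = cv_pred[i] < np.median(train_pred)
    return cv_pred, high_risk

# Simulated example: 40 cases, 200 binary mutation indicators,
# with only gene 0 truly related to the outcome.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 200)).astype(float)
y = 3.0 - 1.5 * X[:, 0] + rng.normal(0.0, 0.3, size=40)
cv_pred, high_risk = complete_loocv(X, y, k=5)
```

Because the held-out case never influences which genes are chosen, the cross-validated predictions give a less optimistic estimate of prediction error than selecting genes once on the full data set and cross-validating only the model fit.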