A supervised machine learning-based methodology for analyzing dysregulation in splicing machinery: An application in cancer diagnosis.

Oscar Reyes, Sebastián Ventura, Raúl M Luque, Eduardo Pérez, Justo Castaño

doi:10.1016/j.artmed.2020.101950

Abstract

Deregulated splicing machinery components have been shown to be associated with the development of several types of cancer; determining such alterations can therefore help the development of tumor-specific molecular targets for early prognosis and therapy. Determining such splicing components, however, is not straightforward, mainly because of the heterogeneity of tumors, the variability across samples, and the fat-short characteristic of genomic datasets (many features, few samples). In this work, a supervised machine learning-based methodology is proposed that allows the determination of subsets of relevant splicing components that best discriminate samples. The methodology comprises three main phases: first, a ranking of features is determined by applying feature weighting algorithms that compute the importance of each splicing component; second, the best subset of features that allows the induction of an accurate classifier is determined by conducting an effective heuristic search; third, the confidence in the induced classifier is assessed by explaining its individual predictions and its global behavior. Finally, an extensive experimental study was conducted on a large collection of transcript-based datasets, illustrating the utility and benefit of the proposed methodology for analyzing dysregulation in splicing machinery.
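The three phases can be illustrated with a minimal sketch. The abstract does not name the feature weighting algorithms, the heuristic search, or the explanation technique, so everything below is an assumed stand-in rather than the authors' method: a standardized class-mean difference as the feature weight, a search over top-k prefixes of the ranking scored by leave-one-out accuracy of a nearest-centroid classifier, and per-feature distance contributions as a local explanation. The data are synthetic, and all function names are hypothetical.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_features=30, informative=(0, 1, 2), seed=0):
    """Synthetic stand-in for a fat-short dataset (not real expression data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 4.0 * label  # shift informative features in class 1
            X.append(row)
            y.append(label)
    return X, y


def feature_weights(X, y):
    """Phase 1 (assumed weighting): standardized difference of class means."""
    weights = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        weights.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return weights


def centroids(X, y, feats):
    """Per-class mean vector restricted to the chosen features."""
    cents = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        cents[label] = [statistics.mean(r[j] for r in rows) for j in feats]
    return cents


def predict(cents, row, feats):
    """Nearest-centroid prediction (squared Euclidean distance)."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[label]))
    return min(sorted(cents), key=dist)


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += predict(centroids(X_tr, y_tr, feats), X[i], feats) == y[i]
    return hits / len(X)


def select_subset(X, y, ranking, max_size=10):
    """Phase 2 (assumed search): keep the best-scoring top-k prefix."""
    best_feats, best_acc = [], -1.0
    for k in range(1, max_size + 1):
        feats = ranking[:k]
        acc = loo_accuracy(X, y, feats)
        if acc > best_acc:  # strict '>' prefers the smaller subset on ties
            best_feats, best_acc = feats, acc
    return best_feats, best_acc


def explain(cents, row, feats):
    """Phase 3 (local only): positive values pull the sample toward class 1."""
    return {j: (row[j] - c0) ** 2 - (row[j] - c1) ** 2
            for j, c0, c1 in zip(feats, cents[0], cents[1])}


X, y = make_toy_data()
weights = feature_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: -weights[j])
subset, accuracy = select_subset(X, y, ranking)
model = centroids(X, y, subset)
print("selected features:", subset, "LOO accuracy:", accuracy)
print("explanation for sample 0:", explain(model, X[0], subset))
```

Note that this sketch ranks features on the full dataset before cross-validating, which makes the reported accuracy optimistic; a more careful version would repeat the ranking inside each leave-one-out fold. It also covers only the explanation of individual predictions, not the global behavior of the classifier.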