Abstract

Long noncoding RNAs (lncRNAs) play an important role in changing the expression profiles of various target genes, which can lead to cancer development. Identifying key lncRNAs related to the origin of different types of cancer might therefore help in developing cancer therapy. To discover the critical lncRNAs that can identify the origin of different cancers, we proposed using the state-of-the-art deep learning algorithm Concrete Autoencoder (CAE). The motivation for using the CAE was that it combines the advantages of an autoencoder (AE), which can achieve the highest classification accuracy, with concrete relaxation-based feature selection, which selects actual features instead of latent features. To benchmark the CAE, three frequently used embedded feature selection techniques were applied: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Support Vector Machine with Recursive Feature Elimination (SVM-RFE). To obtain a stable set of lncRNAs capable of identifying the origin of 33 different cancers, an lncRNA was added to the final list of key lncRNAs if it was selected by at least two of the four techniques (CAE, LASSO, RF, and SVM-RFE). The genome-wide lncRNA expression profiles of 33 different types of cancer, a total of 9566 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. Our results showed that CAE performs better in feature selection than LASSO, RF, and SVM-RFE, especially when selecting a small number of features. As the number of selected lncRNAs increased from 10 to 500, the accuracy of each feature selection approach increased as follows: CAE from 70% to 96%, LASSO from 55% to 94%, RF from 38% to 95%, and SVM-RFE from 50% to 94%. This study discovered a set of 69 lncRNAs that can identify the origin of 33 different cancers with an accuracy of 93%.
Note that the accuracy could be higher using an AE, but an AE classifies on latent features and therefore cannot relate the origin of cancers to the actual features (lncRNAs). The proposed computational framework can be used by physicians as a diagnostic tool to identify the origin of cancers from lncRNA expression profiles. The discovered lncRNAs can be studied further by biologists or drug designers to identify possible targets for cancer therapy.
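
The consensus rule described above (keep an lncRNA if at least two of the four techniques select it) can be illustrated with a minimal Python sketch. The function name `consensus_features`, the `min_votes` parameter, and the placeholder identifiers such as `lnc_A` are assumptions made for illustration; they are not the authors' code and not results from the study.

```python
from collections import Counter


def consensus_features(selections, min_votes=2):
    """Return features chosen by at least `min_votes` selection methods.

    `selections` maps a method name to the features that method selected.
    """
    votes = Counter()
    for selected in selections.values():
        # Use a set so each method contributes at most one vote per feature.
        votes.update(set(selected))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical toy input: placeholder names, not lncRNAs from the paper.
selections = {
    "CAE": ["lnc_A", "lnc_B", "lnc_C"],
    "LASSO": ["lnc_B", "lnc_D"],
    "RF": ["lnc_C", "lnc_D", "lnc_E"],
    "SVM-RFE": ["lnc_F"],
}

key_lncrnas = consensus_features(selections)
print(key_lncrnas)
```

Raising `min_votes` makes the final list smaller and stricter; the threshold of two is the one stated in the abstract.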

