Abstract Small non-coding RNAs (sncRNAs) have established roles as post-transcriptional regulators of cancer pathogenesis. We recently reported a novel, previously unannotated class of cancer-specific sncRNAs in breast cancer and demonstrated that breast cancer cells exploit a specific sncRNA to promote metastasis. However, the extent to which these sncRNAs, which we have collectively termed orphan non-coding RNAs (oncRNAs), are present in other cancer types is unknown. To address this question and define a high-confidence set of oncRNAs, we used smRNA-seq data from 6 cancer sites (breast, colorectal, kidney, liver, lung, and stomach) and their corresponding normal tissues from The Cancer Genome Atlas (TCGA; 4,445 cancer, 431 normal) and identified a total of 144,695 oncRNAs that are significantly present in cancer and largely absent in normal tissue (Fisher's exact test with Benjamini-Hochberg correction, FDR < 0.1). To evaluate whether this set of TCGA-derived oncRNAs could be validated in independent datasets, we examined smRNA-seq data from two large independent cohorts comprising the same cancer and normal tissue types (Indivumed, Hamburg, Germany). Cohort A consists of 4,024 samples (2,245 cancer, 1,779 normal) and cohort B consists of 2,874 samples (2,063 cancer, 811 normal). In these cohorts, oncRNAs were annotated following the same procedure used for the TCGA data. TCGA-derived oncRNAs were considered validated in an independent cohort if they were present in a significantly higher number of cancer samples than adjacent normal tissue samples. In cohort A, 140,191 (96.9%) of the TCGA-derived oncRNAs were detected in at least one sample, and 74,634 (51.6% of all TCGA-derived oncRNAs) were validated as oncRNAs. In cohort B, 140,147 (96.9%) of the TCGA-derived oncRNAs were detected and 68,366 (47.2%) were validated. The overlap between the validated oncRNAs in the two cohorts was significant, with 54,294 (37.5%) oncRNAs validated in both (hypergeometric test, P=0).
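The selection criterion described above (a per-oncRNA Fisher's exact test followed by Benjamini-Hochberg correction at FDR < 0.1) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a one-sided test on presence/absence counts, uses only the Python standard library, and the function names, candidate names, and counts are hypothetical toy values rather than data from the study.

```python
from math import comb


def fisher_exact_greater(cancer_present, cancer_total, normal_present, normal_total):
    """One-sided Fisher's exact test on a 2x2 presence/absence table.

    Returns the probability of seeing at least `cancer_present` cancer samples
    carrying the RNA if presence were unrelated to cancer status.
    """
    n_total = cancer_total + normal_total
    n_present = cancer_present + normal_present
    upper = min(cancer_total, n_present)
    # Hypergeometric upper tail, summed exactly with integer arithmetic.
    tail = sum(
        comb(cancer_total, k) * comb(normal_total, n_present - k)
        for k in range(cancer_present, upper + 1)
    )
    return tail / comb(n_total, n_present)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical toy counts: name -> (cancer samples with RNA, normal samples with RNA).
CANCER_TOTAL, NORMAL_TOTAL = 200, 50
candidates = {
    "oncRNA_A": (80, 0),   # frequent in cancer, absent in normal
    "oncRNA_B": (20, 5),   # same detection rate in both groups
    "oncRNA_C": (40, 2),   # enriched in cancer, rare in normal
}

names = list(candidates)
pvalues = [
    fisher_exact_greater(candidates[n][0], CANCER_TOTAL, candidates[n][1], NORMAL_TOTAL)
    for n in names
]
qvalues = benjamini_hochberg(pvalues)
selected = [n for n, q in zip(names, qvalues) if q < 0.1]  # FDR < 0.1 threshold

for n, p, q in zip(names, pvalues, qvalues):
    print(f"{n}: p = {p:.3g}, q = {q:.3g}")
print("selected:", selected)
```

The abstract additionally requires oncRNAs to be largely absent in normal tissue; that filter is not specified in enough detail to reproduce here and is left out of the sketch.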
We also found that oncRNAs are informative of cancer tissue of origin, demonstrating that consistent cancer-specific oncRNA expression profiles exist across independent studies. Using the TCGA-derived oncRNAs as features, we trained an eXtreme Gradient Boosting (XGB) model on TCGA data to classify cancer samples by the 6 tissues of origin. The TCGA-trained model performed well when evaluated on cohorts A and B, achieving accuracies of 91.5% (95% CI: 90.3%-92.7%) and 96% (95% CI: 94.7%-97%), respectively. For comparison, this model achieved an accuracy of 96% (95% CI: 94.5%-97.2%) on held-out TCGA data (80/20 train/test split). Our results demonstrate robust validation of TCGA-derived oncRNAs in external, independently sourced and processed cancer tissue cohorts across a heterogeneous set of cancer sites. Our machine learning model also demonstrates that oncRNA profiles can be used to predict cancer tissue of origin with high accuracy and generalizability.

Citation Format: Jeffrey Wang, Helen Li, Lisa Fish, Kimberly H. Chau, Patrick Arensdorf, Hani Goodarzi, Babak Alipanahi. Discovery and validation of orphan noncoding RNA profiles across multiple cancers in TCGA and two independent cohorts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 3353.
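The classification accuracies above are reported with 95% confidence intervals, but the abstract does not say how those intervals were computed. As one common option, the sketch below computes a Wilson score interval for a binomial proportion. The function name and the example counts (915 correct out of 1,000) are hypothetical and are not taken from the study.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.959964):
    """Wilson score interval for a proportion; z = 1.959964 gives about 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 915 of 1,000 test samples assigned the correct tissue of origin.
low, high = wilson_interval(correct=915, total=1000)
print(f"accuracy = {915 / 1000:.1%}, 95% CI = ({low:.1%}, {high:.1%})")
```

A bootstrap over test samples is another reasonable choice; it can give slightly different bounds, especially for small test sets or accuracies near 100%.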