From a data science perspective, we propose a cancer diagnosis method that combines miRNA-lncRNA interaction pairs with class weight competition. First, miRNA-lncRNA interaction data are introduced into joint expression profiles, so that the complex mechanism of cancer development is represented in greater depth by recovering key association information. At the dataset construction level, this is an information ensemble of three carcinogenic mechanisms: classical genetics, epigenetics, and the complex interaction effect between miRNAs and lncRNAs. Second, we put forward a hybrid feature selection algorithm. While preserving the interaction relationships between miRNAs and lncRNAs, it quickly and stably removes irrelevant and redundant features and addresses the curse of dimensionality in cancer expression profiles. At the feature selection level, this is an information ensemble of multiple feature selection algorithms and of the significant associations found between multi-dimensional features. Third, diversity sampling and multi-algorithm learners are used to construct multiple heterogeneous classification models, which mitigates the small number of normal samples and the local optima of a single algorithm and a single mode. At the classification modeling level, this is an information ensemble of multiple classification model structures and multiple classification model state parameters. Finally, at the decision level, the proposed class weight, which does not depend on sample size, is constructed to address the class imbalance of cancer samples. The ensemble of multi-category, multi-state information at these four levels (dataset construction, feature selection, classification modeling, and decision) constitutes the framework of the proposed method. We classify BRCA, LUAD, and LUSC samples from TCGA. Compared with state-of-the-art classification methods, the proposed method improves classification accuracy by 9.25% to 21.25%, sensitivity by 6.45% to 66.45%, and specificity by 10.11%.
In addition, we find that lincRNA, rather than miRNA, appears in every group of feature genes, which provides a new clue for selecting target loci in cancer treatment.
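
As a rough illustration of the decision-level idea only, the sketch below shows one way a class weight can be made independent of sample size. The abstract does not give the actual formula, so everything here is an assumption: the weight is taken to be each model's per-class recall on validation data (a rate normalized within each class), and the function names `per_class_recall` and `weighted_vote` and the toy labels are hypothetical.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred, classes):
    """Recall per class. Each value is normalized within its own class,
    so it does not grow or shrink with the number of samples in that class.
    (Assumed stand-in for the paper's class weight, which is not specified.)"""
    recall = {}
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recall[c] = hits / len(idx) if idx else 0.0
    return recall


def weighted_vote(model_votes, model_weights):
    """Each model votes for one class; its vote counts with that model's
    weight for the class it voted for. The class with the highest total wins."""
    scores = defaultdict(float)
    for vote, weights in zip(model_votes, model_weights):
        scores[vote] += weights[vote]
    return max(scores, key=scores.get)


# Toy, made-up validation set: 8 tumor samples, 2 normal samples (imbalanced).
classes = ["tumor", "normal"]
y_val = ["tumor"] * 8 + ["normal"] * 2
pred_a = ["tumor"] * 10                          # always predicts the majority class
pred_b = ["tumor"] * 6 + ["normal"] * 4          # misses 2 tumors, finds both normals
pred_c = ["tumor"] * 8 + ["normal", "tumor"]     # finds 1 of 2 normals

weights = [per_class_recall(y_val, p, classes) for p in (pred_a, pred_b, pred_c)]

# Votes of the three models for two new samples.
decision_1 = weighted_vote(["tumor", "normal", "normal"], weights)
decision_2 = weighted_vote(["tumor", "normal", "tumor"], weights)
```

In this toy setting, the model that always predicts the majority class gets zero weight whenever it would vote "normal", so it cannot dominate decisions about the minority class simply because tumor samples are more numerous.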