HC-UAP: Outliers detection method based-on hierarchical clustering for universally aligned time-series RNA-Seq profiles

Abedalrhman Alkhateeb

doi:10.5267/j.dsl.2022.10.004

Abstract

Tracking the quantification of abundant gene transcripts across successive stages of cancer progression may reveal the mechanism of disease advancement. In this work, we profile transcript quantification across the stages using a time-series approach, in which the stages and sub-stages of the disease are the time points and the quantification measurements are the values. The values at the time points are interpolated with a cubic spline function to model the course of progression. The transcript profiles are then universally aligned and clustered with a time-series profile hierarchical clustering method based on the area between each pair of aligned profiles; we call this method HC-UAP. We compare the proposed method with a hierarchical clustering method based on Euclidean distance (HC-ED). Both methods were applied to two next-generation sequencing (NGS) prostate cancer datasets, the first from a Chinese population and the second from a North American population. HC-ED clustered each dataset into patterns, whereas HC-UAP singled out outlier transcripts that trend differently in both datasets. While finding patterns in gene expression that trend across stages is the standard approach for analyzing time-series models, identifying outlier transcripts that grow differently from other transcripts can provide more detail about the contribution of mRNA transcriptional activity to the disease. Such transcripts may also serve as potential biomarkers of disease progression.
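
The pipeline summarized in the abstract (spline interpolation, alignment, area-based hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: universal alignment is taken to mean shifting each spline vertically so that it integrates to zero over the time span, the "area" between two profiles is taken as the integral of their squared difference, average linkage is used, and a transcript is flagged as an outlier when it ends up alone in its cluster. The function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def aligned_profiles(times, values, n_grid=200):
    """Fit a natural cubic spline per transcript and shift it to zero mean.

    Assumption: "universal alignment" = subtracting each profile's mean
    level over the whole time span, so only the shape of the trend remains.
    """
    grid = np.linspace(times[0], times[-1], n_grid)
    span = times[-1] - times[0]
    curves = []
    for row in values:
        spline = CubicSpline(times, row, bc_type="natural")
        mean_level = spline.integrate(times[0], times[-1]) / span
        curves.append(spline(grid) - mean_level)
    return grid, np.array(curves)


def area_distance_matrix(grid, curves):
    """Pairwise integral of the squared difference (trapezoidal rule).

    Assumption: this squared-difference integral stands in for the
    "area between each pair of aligned profiles".
    """
    n = len(curves)
    dist = np.zeros((n, n))
    dx = np.diff(grid)
    for i in range(n):
        for j in range(i + 1, n):
            sq = (curves[i] - curves[j]) ** 2
            dist[i, j] = dist[j, i] = np.sum((sq[1:] + sq[:-1]) / 2 * dx)
    return dist


def singleton_outliers(dist, n_clusters):
    """Cut an average-linkage tree and return indices left in singleton clusters."""
    tree = linkage(squareform(dist), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    counts = np.bincount(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] == 1]


# Toy data (made up): four stages, five transcripts.
# Transcripts 0-3 rise at different baselines; transcript 4 falls.
times = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [11.0, 12.0, 13.0, 14.0],
    [5.0, 6.1, 7.0, 8.2],
    [3.0, 4.0, 4.9, 6.0],
    [9.0, 7.0, 4.0, 1.0],
])

grid, curves = aligned_profiles(times, values)
dist = area_distance_matrix(grid, curves)
outliers = singleton_outliers(dist, n_clusters=2)
print("outlier transcript indices:", outliers)
```

In this toy input the alignment step removes the baseline differences among the rising transcripts, so they merge early in the tree, while the falling transcript (index 4) is left in a cluster of its own. A Euclidean distance on the raw values would instead be dominated by the baseline offsets, which is one plausible reading of why the two methods behave differently.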

Full Text