SCNrank: spectral clustering for network-based ranking to reveal potential drug targets and its application in pancreatic ductal adenocarcinoma

Enze Liu, Zhuang Zhuang Zhang, Xiaolin Cheng, Xiaoqi Liu, Lijun Cheng

doi:10.1186/s12920-020-0681-6

Abstract

Background: Pancreatic ductal adenocarcinoma (PDAC) is the most common pancreatic malignancy. Due to its wide heterogeneity, PDAC behaves aggressively and responds poorly to most chemotherapies, creating an urgent need for new therapeutic strategies. Cell lines have been used as the foundation for drug development and disease modeling. CRISPR-Cas9 plays a key role in every step of drug discovery, from target identification and validation to preclinical cancer cell testing. Using cell-line models and CRISPR-Cas9 technology together makes drug target prediction feasible. However, there is still a large gap between predicted results and actionable targets in real tumors. Biological network models provide a useful way to mimic genetic interactions in real biological systems, which can benefit gene perturbation studies and potential target identification for treating PDAC. Nevertheless, building a network model that takes cell-line data and CRISPR-Cas9 data as input and accurately predicts potential targets that will respond well in real tissue remains an unsolved problem.

Methods: We developed a novel algorithm, ‘Spectral Clustering for Network-based target Ranking’ (SCNrank), that systematically integrates three types of data to prioritize potential drug targets for PDAC: expression profiles from tumor tissue, normal tissue and PDAC cell lines; a protein-protein interaction (PPI) network; and CRISPR-Cas9 data. The algorithm proceeds in three steps: 1. Using the STRING PPI network as a skeleton, SCNrank constructs tissue-specific networks from the expression profiles of PDAC tumor and normal pancreas tissues; 2. With the same network skeleton, SCNrank constructs cell-line-specific networks using the PDAC cell-line expression profiles and CRISPR-Cas9 data from pancreatic cancer cell lines; 3. SCNrank applies a novel spectral clustering approach to reduce data dimension and generate gene clusters that carry common features from both networks. Finally, SCNrank applies a scoring scheme called ‘Target Influence score’ (TI), which estimates a given target’s influence on the cluster it belongs to, to score and rank each drug target.

Results: We applied SCNrank to analyze 263 expression profiles, CRISPR-Cas9 data from 22 different pancreatic cancer cell lines and the STRING protein-protein interaction (PPI) network. With SCNrank, we successfully constructed an integrated tissue PDAC network and an integrated cell-line PDAC network, both of which contain 4414 selected genes that are overexpressed in tumor tissue samples. After clustering, the 4414 genes are distributed into 198 clusters, which include 367 targets of FDA-approved drugs. These drug targets are all scored and ranked by their TI scores, which we defined to measure their influence on the network. We validated the top-ranked targets in three ways. First, we mapped them onto the existing clinical drug targets of PDAC to measure the concordance. Second, we performed enrichment analysis on these drug targets and the clusters they fall within, to reveal functional associations between clusters and PDAC. Third, we performed survival analysis for the top-ranked targets to connect targets with clinical outcomes. Survival analysis reveals that overexpression of three top-ranked genes, PGK1, HMMR and POLE2, significantly increases the risk of death in PDAC patients.

Conclusion: SCNrank is an unbiased algorithm that systematically integrates multiple types of omics data to select and rank potential drug targets. SCNrank shows great capability in predicting drug targets for PDAC. Pancreatic cancer-associated gene candidates predicted by our SCNrank approach have the potential to guide genetics-based anti-pancreatic-cancer drug discovery.
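To make the spectral clustering step mentioned in the Methods more concrete, the following is a minimal sketch of textbook spectral bipartitioning on a weighted gene network. It is not the SCNrank implementation: the abstract does not specify the novel clustering variant, so the normalized Laplacian, the sign-based split, the function name `spectral_bipartition`, the placeholder gene names and the edge weights are all assumptions chosen for illustration.

```python
import numpy as np


def spectral_bipartition(weights):
    """Split the nodes of a weighted undirected graph into two clusters.

    Uses the sign of the Fiedler vector (eigenvector of the second-smallest
    eigenvalue) of the symmetric normalized Laplacian. Generic textbook
    method, assumed here for illustration only.
    """
    w = np.asarray(weights, dtype=float)
    degree = w.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)  # assumes every node has at least one edge
    # L = I - D^(-1/2) W D^(-1/2)
    laplacian = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    labels = (fiedler > 0).astype(int)
    # An eigenvector's sign is arbitrary, so fix node 0 to cluster 0
    if labels[0] == 1:
        labels = 1 - labels
    return labels


# Hypothetical toy network: two tightly connected gene triplets
# joined by one weak edge (weights are made up for this example).
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),  # first triplet
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),  # second triplet
    (2, 3, 0.1),                            # weak bridge
]
W = np.zeros((len(genes), len(genes)))
for i, j, weight in edges:
    W[i, j] = W[j, i] = weight

labels = spectral_bipartition(W)
clusters = {0: [], 1: []}
for gene, label in zip(genes, labels):
    clusters[int(label)].append(gene)
print(clusters)
```

On this toy graph the weak bridge is cut, so each triplet ends up in its own cluster. Producing many clusters, as in the 198 clusters reported in the Results, would require more eigenvectors and a further grouping step, which this sketch does not attempt.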

Highlights

Pancreatic cancer is the third leading cause of cancer death in the United States
Survival analysis reveals that overexpression of three top-ranked genes, phosphoglycerate kinase 1 (PGK1), hyaluronan-mediated motility receptor (HMMR) and POLE2, significantly increases the risk of death in pancreatic ductal adenocarcinoma (PDAC) patients
The 15,664 genes common to the 263 gene expression profiles for tumor tissue, normal tissue and cell lines are included in the SCNrank analysis. Of these, 7376 genes are significantly dysregulated (unpaired t-test, p-value less than 0.05), and 4584 of the 7376 genes are significantly over-expressed in the tumor tissue group compared with the normal tissue group

Summary

Introduction

Pancreatic cancer is the third leading cause of cancer death in the United States. The American Cancer Society estimates that 53,070 Americans will be diagnosed with pancreatic cancer in 2017, and that 41,780 will die from the disease [1]. PDAC is usually diagnosed at an advanced stage, when tumor cells have spread into the lymphatic system and neighboring organs, which limits the choice of effective treatments [2]. Another challenge in treating PDAC is its treatment-recalcitrant character [3, 4], which often leads to insensitivity to many chemotherapeutic and targeted drugs [5]. Building a network model that takes cell-line data and CRISPR-Cas9 data as input and accurately predicts potential targets that will respond well in real tissue remains an unsolved problem.

Objectives

Methods

Results

Discussion

Conclusion