Abstract
Accurately predicting and testing the type of pulmonary arterial hypertension (PAH) in each patient using cost-effective microarray-based expression data and machine learning algorithms could greatly help in either identifying the best-targeted medicine or adopting other therapeutic measures that could correct or restore defective genetic signaling at an early stage. Furthermore, the process of constructing the prediction model can also help identify highly informative genes controlling PAH, leading to an enhanced understanding of the disease etiology and molecular pathways. In this study, we applied several different gene filtering methods to microarray expression data obtained from a high-quality PAH patient dataset. Following that, we proposed a novel feature selection and refinement algorithm, used in conjunction with well-known machine learning methods, to identify a small set of highly informative genes. Results indicated that clusters of low-expression genes could be extremely informative in predicting and differentiating different forms of PAH. Additionally, our proposed feature refinement algorithm could lead to significant enhancement in model performance. To summarize, when integrated with state-of-the-art machine learning and novel feature refining algorithms, the most accurate models could provide near-perfect classification accuracies using very few (close to ten) low-expression genes.
Highlights
Pulmonary arterial hypertension (PAH) is a fatal and progressive disease characterized by increasing pulmonary vascular resistance leading to heart failure and death [1,2,3,4]
The majority of PAH cases in humans were found to be unassociated with BMPR2 mutation [1], and some other factors have been identified as partially contributing to the idiopathic form of PAH (IPAH) [6,7]
To directly test the hypothesis that low-expression genes might be greatly informative in differentiating different forms of PAH patient groups, we proposed a series of feature filtering methods including a traditional inter-quartile range (IQR)-based filtering method and others that are rarely used in conventional microarray-based clinical studies
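As a rough illustration of the traditional IQR-based filtering mentioned above, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function name `iqr_filter`, the median cutoff, and the toy expression matrix are all assumptions made for illustration, since the source does not state the threshold or data layout used.

```python
import numpy as np


def iqr_filter(expr, quantile=0.5):
    """Flag genes whose inter-quartile range (IQR) across samples is large.

    expr     : 2-D array, rows are genes and columns are samples (assumed layout).
    quantile : genes with an IQR strictly above this quantile of all gene
               IQRs are kept (hypothetical cutoff, not taken from the source).
    Returns a boolean mask over genes and the per-gene IQR values.
    """
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    iqr = q75 - q25
    cutoff = np.quantile(iqr, quantile)
    return iqr > cutoff, iqr


# Toy expression matrix: 4 genes x 6 samples (made-up values).
expr = np.array([
    [5.0, 9.0, 6.0, 10.0, 4.0, 11.0],  # highly expressed, variable
    [8.0, 8.1, 7.9, 8.0, 8.1, 7.9],    # highly expressed, flat
    [2.0, 2.1, 6.0, 6.2, 2.2, 6.1],    # lower expression, variable
    [3.0, 3.0, 3.1, 2.9, 3.0, 3.1],    # lower expression, flat
])

keep, iqr = iqr_filter(expr)
print(keep)  # boolean mask of retained genes
```

In this toy matrix the filter ranks genes by variability across samples, not by mean expression level, so a gene with a comparatively low mean can still be retained if it varies between samples.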
Summary
Pulmonary arterial hypertension (PAH) is a fatal and progressive disease characterized by increasing pulmonary vascular resistance leading to heart failure and death [1,2,3,4]. The underlying interactive effects of different genetic mutations and the fundamental mechanisms of vascular dysfunction remain unclear [4]. Mutation in bone morphogenetic protein (BMP) receptor type II (BMPR2) was found to be significantly correlated with the development of both the heritable (HPAH) and the idiopathic (IPAH) forms of PAH [5]. Gene expression signatures from IPAH patients were either tightly clustered with the HPAH group or formed an isolated cluster [4]. These findings suggest that different forms of PAH might share the majority of their molecular origins and signaling pathways, but that some distinct factors might modulate the primary genetic expression in each case [1]. We attributed the difficulty in fully unraveling the genetic factors causing PAH largely to the lack of high-quality patient data in conjunction with advanced data processing algorithms, limited comprehension of the genetic etiology, and the overlooking of some important low-expression genes that might interactively affect PAH as a whole.