Abstract
Background: Decades of successful biomedical research have led to the rapid accumulation of large volumes of unstructured text data. Defining relationships and structuring connections among these fragmented data across the many published articles can uncover valuable information hidden in this treasure trove. However, the quantity of text data is overwhelming, and manual extraction is a forbidding task. This situation demands the development of automated tools to effectively and accurately process the giant volume of available textual data (e.g., ~27M articles on PubMed, ~160M on Google Scholar).
Objectives: To address this challenge, we developed a text-mining and machine learning algorithm to dissect textual data on cardiovascular disease (CVD) and identify protein patterns in datasets to uncover valuable information.
Methods: We applied a novel phrase mining workflow, Context-aware Semantic Online Analytical Processing (CaseOLAP), to recognize patterns from six CVD datasets based on their MeSH terms: cerebrovascular accidents (CVA), cardiomyopathies (CM), ischemic heart diseases (IHD), arrhythmias (ARR), valvular heart disease (VHD) and congenital heart disease (CHD). We analyzed the patterns of 8,325 cardiac proteins in 1.1 million publications (1995-2016).
Results: Of the 8,325 proteins, only a subset exhibited high CaseOLAP scores, indicating high relevance to CVD; these high scores appeared mainly in IHD, CM and CVA. We identified six high-scoring protein clusters unique to one CVD group. A principal component analysis indicated that IHD, CVA and CM showed distinct protein scoring patterns, while CHD, VHD and ARR clustered together. We identified 10 protein clusters shared between two or more CVD groups, with biological functions in inflammation, contractility, blood coagulation, hemodynamic regulation, cytoskeletal organization and neurotransmission. Inflammatory proteins appeared to be relevant in all CVDs, while proteins involved in neurotransmission and memory processing were relevant in CVA, ARR, VHD and CHD.
Conclusions: Using CaseOLAP on textual data across six CVDs, we gained novel insights into the patterns and relationships of their proteins. This text-mining algorithm offers promising biomedical applications to facilitate patient studies in clinical trials, case reports and electronic health records.