Abstract

The complexity of data has grown to such an extent that scientists in every field face the ongoing challenge of data management. Modern analytical approaches, supported by free and open-source tools and programming languages, have made it easier to access the context of various domains as well as the specific works reported in them. In this article, an attempt has been made to provide a systematic text-mining analysis of all the reports on the proteome available in PubMed. The work combines scientometrics and information extraction to present publication trends, frequent keywords, bioconcepts and, most importantly, a gene–gene co-occurrence network. Of the 33,028 PMIDs initially collected, the categorization of 24,350 articles under 28 Medical Subject Headings (MeSH) was analyzed and plotted. Keyword link-network and density visualizations were generated for the 1000 most frequent MeSH keywords. Using PubTator, 322,026 bioconcepts were extracted under 10 classes (such as Gene, Disease and CellLine). Co-occurrence networks were constructed for PMID–bioconcept as well as bioconcept–bioconcept associations. A gene–gene co-occurrence subnetwork was then built, in which a total of 11,100 unique genes participated, with the mTOR–AKT pair sharing the highest number of connections (64). p53 was the most prominent gene in the network by both degree and weighted degree centrality, at 425 and 1414, respectively. The present study combines bibliometric and scientific data-mining methods to take a deeper, large-scale look at the available literature on the proteome.
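The gene–gene network statistics above can be illustrated with a small sketch. It assumes, as a simplification, that two genes co-occur when both are annotated in the same PMID and that an edge's weight is the number of such PMIDs; the input shape, gene symbols and function names (`gene_cooccurrence`, `degree_centralities`) are hypothetical, and the study's own pipeline may define co-occurrence differently (for example, at the sentence level). Degree counts a gene's distinct neighbours, whereas weighted degree sums the weights of its edges, which is how a gene such as p53 can have a weighted degree (1414) well above its degree (425).

```python
from collections import Counter
from itertools import combinations


def gene_cooccurrence(pmid_genes):
    """Count, for every unordered gene pair, how many articles mention both.

    pmid_genes maps a PMID to the genes annotated in that article
    (hypothetical input shape, e.g. derived from PubTator Gene annotations).
    """
    edges = Counter()
    for genes in pmid_genes.values():
        # set() drops repeated mentions; sorted() gives each pair one canonical order
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centralities(edges):
    """Return (degree, weighted degree) for every gene in the edge list.

    degree          = number of distinct co-occurring genes (neighbours)
    weighted degree = sum of the co-occurrence counts on those edges
    """
    degree, weighted = Counter(), Counter()
    for (a, b), weight in edges.items():
        for gene in (a, b):
            degree[gene] += 1
            weighted[gene] += weight
    return degree, weighted


if __name__ == "__main__":
    # Toy input for illustration only, not data from the study
    pmid_genes = {
        "pmid_1": ["TP53", "MTOR", "AKT1"],
        "pmid_2": ["MTOR", "AKT1", "MTOR"],
        "pmid_3": ["TP53", "EGFR"],
    }
    edges = gene_cooccurrence(pmid_genes)
    degree, weighted = degree_centralities(edges)
    print(edges.most_common(1))   # [(('AKT1', 'MTOR'), 2)]
    print(degree.most_common(1))  # [('TP53', 3)]
```

Plain `Counter`s keep the sketch dependency-free; once the weighted edges are loaded into a graph library such as NetworkX, `G.degree()` and `G.degree(weight="weight")` give the same two measures.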

Highlights

  • The boom in omics technologies has notably influenced almost all realms of life science

  • Implementing powerful algorithms and technologies such as artificial intelligence (AI), machine learning (ML) and deep learning, aided by programming languages, has become common practice in recent years

  • To retrieve the reports related to “Proteome”, the term was used as a MeSH term in a PubMed search
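The retrieval step in the last highlight can be sketched with NCBI's E-utilities ESearch service. This is an assumption about tooling, since the source does not say how the query was issued; the function name `build_mesh_search_url` and the `retmax` default are illustrative only. The sketch just builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_search_url(mesh_term, retmax=100):
    """Build an ESearch URL that queries PubMed for a MeSH heading.

    retmax caps how many PMIDs the response lists; the JSON response
    also reports the total hit count in esearchresult["count"].
    """
    params = {
        "db": "pubmed",
        # [MeSH Terms] searches the MeSH field rather than free text
        "term": f'"{mesh_term}"[MeSH Terms]',
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mesh_search_url("Proteome"))
```

Collecting tens of thousands of PMIDs, as in this study, would require batched retrieval, which is not shown here.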

Summary

Introduction

The boom in omics technologies has notably influenced almost all realms of life science. The massive amount of data generated has created a Big Data problem, and managing such data requires efficient data-analytics approaches. These technologies have also driven exponential growth in the scientific literature, alongside other high-throughput data. Implementing powerful algorithms and technologies such as artificial intelligence (AI), machine learning (ML) and deep learning, aided by programming languages, has become common practice in recent years. This is reflected in several reports published over the last few decades that have used text mining to provide systematic reviews and scientometric analyses, extract various bioconcepts and construct knowledge graphs from scientific relations [1,2,3,4].

