Abstract
Investigators and funding organizations desire knowledge on topics and trends in publicly funded research but current efforts for manual categorization have been limited in breadth and depth of understanding. We present a semi-automated analysis of 21 years of R-type National Cancer Institute (NCI) grants to departments of radiation oncology and radiology using natural language processing (NLP). We selected all non-education R-type NCI grants from 2000 to 2020 awarded to departments of radiation oncology/radiology with affiliated schools of medicine. We used pre-trained word embedding vectors to represent each grant abstract. A sequential clustering algorithm assigned each grant to one of 60 clusters representing research topics; we repeated the same workflow for 15 clusters for comparison. Each cluster was then manually named using the top words and closest documents to each cluster centroid. The interpretability of document embeddings was evaluated by projecting them onto 2 dimensions. Changes in clusters over time were used to examine temporal funding trends. We included 5,874 grants totaling 1.9 billion dollars of NCI funding over 21 years. The human-model agreement was similar to the human interrater agreement. Two-dimensional projections of grant clusters showed two dominant axes: physics-biology and therapeutic-diagnostic. Therapeutic and physics clusters have grown faster over time than diagnostic and biology clusters. The three topics with largest funding increase were Imaging biomarkers, Informatics, and Radiopharmaceuticals, which all had a mean annual growth of >$218,000. The three topics with largest funding decrease were Cellular stress response, Advanced imaging hardware technology, and Improving performance of breast cancer computer-aided detection (CAD), which all had a mean decrease of >$110,000. We developed a semi-automated NLP approach to analyze research topics and funding trends. We applied this approach to NCI funding in the radiological sciences to extract both domains of research being funded and temporal trends.
Submitted Version (Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have