Abstract

In fiscal year (FY) 2020, the NIH spent over $36B on grants, of which $508M went to radiation oncology and diagnostic radiology departments. Given the size of this expenditure, it is important to understand how funding is distributed across research topics and to gain insight into whether that distribution is appropriate. Multiple studies have categorized grants (including an ASTRO Grant Funding Portfolio Analysis in 2017), but these approaches have been limited to manual analysis of small corpora. Because of the large number of grants awarded each year, automatic and systematic extraction of research topics from grants is needed to enable evaluation of trends, productivity, and geographic distribution.

We analyzed 4346 R-type grants (excluding R25) awarded to radiation oncology and diagnostic radiology departments and funded by the National Cancer Institute and the National Institute of Biomedical Imaging and Bioengineering from FY 2010-2020, using NIH ExPORTER. The 'Project Terms' field was preprocessed with TF-IDF vectorization and principal component analysis to weight term importance. Spectral clustering was then used to cluster the grants. The optimal number of clusters was determined using intercluster-distance scoring methods, and manual validation was performed to verify cluster correctness.

We found that 12 clusters best represent the separation of R-type grant research directions. These clusters represent clear topics such as oncogenesis, image reconstruction, and assay development. Notable trends include increased funding of the radiation technology and hepatobiliary therapy clusters, averaging +$1.2M and +$1.1M annual growth, respectively, over 10 years, and decreased funding of the image reconstruction and MRI clusters, averaging -$0.71M and -$0.67M annually. Further analysis shows that the DNA damage cluster is the most geographically skewed, with 52% of the funding going to institutions in three cities (Dallas, Houston, New Haven). Prostate cancer funding is also heavily skewed geographically, with the most funded city (San Francisco) receiving triple the funds of the second most funded (Baltimore). The number of published articles per grant also shows a bias: the workshops/conferences and image reconstruction clusters have publication rates of 35.4 and 9.9, respectively. Radiation biology research is the largest cluster and has a publication rate of 12.7, which is lower than the NCI average (16.3) and may reflect the pace of basic vs. applied research.

We propose an unsupervised machine learning framework to categorize grants by area of investigation, which is more efficient and systematic than manual labeling and more holistic than keyword searching. This clustering example suggests a trend from imaging-based towards therapy-based research over the past 10 years. We also identify biases in publication rate and geographic distribution of funds.
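To make the term-weighting step concrete, the following is a minimal sketch of TF-IDF weighting applied to per-grant project terms, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the TF-IDF variant, so the formulas here (term frequency as count divided by number of terms, inverse document frequency as the natural log of N over document frequency) are assumed, and the function name `tfidf_weights` and the toy terms are hypothetical. The principal component analysis and spectral clustering steps are not shown.

```python
import math
from collections import Counter


def tfidf_weights(grants):
    """Return one {term: weight} dict per grant.

    grants: list of lists of project terms (hypothetical toy input).
    Assumed variant: tf = count / number of terms in the grant,
    idf = ln(N / df). The source does not state which variant was used.
    """
    n_grants = len(grants)
    # Document frequency: number of grants in which each term appears.
    doc_freq = Counter(term for terms in grants for term in set(terms))
    weights = []
    for terms in grants:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_grants / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical project terms, not taken from NIH ExPORTER.
toy_grants = [
    ["radiation", "dna damage", "repair", "cancer"],
    ["mri", "image reconstruction", "cancer"],
    ["radiation", "prostate", "cancer"],
]
toy_weights = tfidf_weights(toy_grants)
```

In this toy input, a term that appears in every grant (here "cancer") receives a weight of zero, while terms specific to a few grants receive larger weights, which is the property that lets distinctive project terms drive the subsequent clustering.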
