Predicting Budget from Transportation Research Grant Description: An Exploratory Analysis of Text Mining and Machine Learning Techniques

Ayush Singhal, Kasthurirangan Gopalakrishnan, Siddhartha Kumar Khaitan

doi:10.22115/scce.2017.49604

Abstract

Funding agencies such as the U.S. National Science Foundation (NSF), the U.S. National Institutes of Health (NIH), and the Transportation Research Board (TRB) of The National Academies make their online grant databases publicly available. These databases document a variety of information on grants that have been funded over the past few decades. In this paper, based on a quantitative analysis of the TRB’s Research In Progress (RIP) online database, we explore the feasibility of automatically estimating the appropriate funding level given the textual description of a transportation research project. We use statistical Text Mining (TM) and Machine Learning (ML) techniques to build this model from the 14,000 or more records in the TRB’s RIP research grant database. Several Natural Language Processing (NLP) based text representation models, such as Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI), and the Doc2Vec ML approach, are used to vectorize the project descriptions and generate semantic vectors. Each of these representations is then used to train supervised regression models such as Random Forest (RF) regression. Of the three latent feature generation models, LDA gave the lowest Mean Absolute Error (MAE), using 300 feature dimensions and the RF regression model. However, the correlation coefficients showed that it is not very feasible to accurately predict the funding level directly from the unstructured project abstract, given the large variations in source agencies, subject areas, and funding levels. When separate prediction models were used for different types of funding agencies, funding levels were better correlated with the project abstract.
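The abstract judges the models by MAE and by correlation coefficients, and reports that results improve when agencies are modeled separately. The following is a minimal sketch of how such an evaluation could be computed per agency type. It assumes the correlation in question is Pearson's r, which the abstract does not specify. The function names, the grouping labels, and the toy numbers are all hypothetical and are not taken from the paper or the RIP database.

```python
from collections import defaultdict
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted budgets."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return 0.0
    return cov / sqrt(var_a * var_p)


def evaluate_by_group(records):
    """Compute MAE and Pearson r separately for each agency type.

    `records` is an iterable of (agency_type, actual_budget, predicted_budget).
    """
    grouped = defaultdict(lambda: ([], []))
    for agency, actual, predicted in records:
        grouped[agency][0].append(actual)
        grouped[agency][1].append(predicted)
    return {
        agency: {
            "mae": mean_absolute_error(actual, predicted),
            "r": pearson_r(actual, predicted),
        }
        for agency, (actual, predicted) in grouped.items()
    }


# Made-up toy values (thousands of dollars); assumption for illustration only.
toy_records = [
    ("state", 100.0, 120.0),
    ("state", 200.0, 180.0),
    ("state", 300.0, 310.0),
    ("federal", 1000.0, 900.0),
    ("federal", 2000.0, 2200.0),
    ("federal", 3000.0, 2900.0),
]

for agency, scores in evaluate_by_group(toy_records).items():
    print(agency, round(scores["mae"], 1), round(scores["r"], 3))
```

Reporting both metrics per group matters because MAE is scale dependent: agencies with larger budgets tend to show larger absolute errors even when their predictions track the actual values closely, which is what the correlation coefficient captures.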
