Abstract

In our earlier work, we presented BOAT, a big-data open source analytics toolkit and framework, and applied it to analyze trends and outliers in public healthcare data. In this paper, we extend the framework to predict the costs that patients could incur for different medical procedures, based on open healthcare data. Specifically, we analyze de-identified patient data from the New York State Statewide Planning and Research Cooperative System (SPARCS), consisting of more than 2 million records. We investigated three model classes: multiple linear regression, regression trees, and deep neural networks (DNNs). We conducted a grid search to identify the best parameter choices for each. Based on grid-search cross-validation with the widely used R² metric, the best performance was achieved by an 8-layer DNN with layer sizes 5x5x10x25x25x10x5x5, trained using the Adam optimizer with a learning rate of 0.01. This model obtained an R² value of 0.71, which is better than the values reported in the literature for similar problems.
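
For readers unfamiliar with the R² metric mentioned above, the following is a minimal sketch of how the coefficient of determination is usually computed, 1 minus the ratio of the residual sum of squares to the total sum of squares. It is not taken from the paper or from BOAT: the function name `r2_score` and the cost figures are hypothetical and chosen only for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical procedure costs (illustrative values, not SPARCS data).
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1200.0, 1800.0, 3300.0, 3900.0]
score = r2_score(actual, predicted)
print(f"R2 = {score:.3f}")
```

Under this definition a score of 1 means the predictions match the observed costs exactly, while a score of 0 means the model does no better than always predicting the mean cost.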
