Applications of Python to evaluate environmental data science problems

Akhil Kadiyala,Ashok Kumar

doi:10.1002/ep.12786

Abstract

There is a significant convergence of interests in the research community efforts to advance the development and application of software resources (capable of handling the relevant mathematical algorithms to provide scalable information) for solving data science problems. Anaconda is one of the many open source platforms that facilitate the use of open source programming languages (R, Python) for large‐scale data processing, predictive analytics, and scientific computing. The environmental research community may choose to adapt the use of either of the R or the Python programming languages for analyzing the data science problems on the Anaconda platform. This study demonstrated the applications of using Scikit‐learn (a Python machine learning library package) on Anaconda platform for analyzing the in‐bus carbon dioxide concentrations by (i) importing the data into Spyder (Python 3.6) in Anaconda, (ii) performing an exploratory data analysis, (iii) performing dimensionality reduction through RandomForestRegressor feature selection, (iv) developing statistical regression models, and (v) generating regression decision tree models with DecisionTreeRegressor feature. The readers may adopt the methods (inclusive of the Python coding) discussed in this article to successfully address their own data science problems. © 2017 American Institute of Chemical Engineers Environ Prog, 36: 1580–1586, 2017

Full Text