Abstract

In agriculture, crop yield prediction is critical. Crop yield depends on various features, including geographic, climatic and biological factors. This research article discusses five Feature Selection (FS) algorithms, namely Sequential Forward FS (SFFS), Sequential Backward Elimination FS, Correlation-based FS, Random Forest Variable Importance and the Variance Inflation Factor algorithm. The data used for the analysis were drawn from secondary sources of the Tamil Nadu State Agriculture Department and cover a period of 30 years. 75% of the data were used for training and 25% for testing. The performance of the feature selection algorithms is evaluated with Multiple Linear Regression (MLR), and the RMSE, MAE, R and RRMSE metrics are calculated for each algorithm. The adjusted R² was used to find the optimum feature subset, and the time complexity of the algorithms was also taken into account. The selected features are then applied to MLR, an Artificial Neural Network (ANN) and M5Prime. MLR achieves 85% accuracy using the features selected by the SFFS algorithm.
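The evaluation step summarised in the abstract (a 75/25 train/test split, an MLR model, and the RMSE, MAE, R and RRMSE metrics) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, ordinary least squares with an intercept for MLR, a random row shuffle before splitting, and RRMSE defined as RMSE divided by the mean observed value (times 100). The helper names are hypothetical and the data are synthetic, not the Tamil Nadu records; for a 30-year series the authors may well have split chronologically instead.

```python
import numpy as np

def split_75_25(X, y, seed=0):
    """Shuffle the rows, then keep 75% for training and 25% for testing (assumed split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(round(0.75 * len(y)))
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Linear prediction b0 + X @ [b1..bp]."""
    return coef[0] + X @ coef[1:]

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, Pearson R and RRMSE (RMSE as a percentage of the mean observed value)."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return {
        "RMSE": rmse,
        "MAE": float(np.mean(np.abs(err))),
        "R": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "RRMSE": 100.0 * rmse / float(np.mean(y_true)),
    }

# Usage on synthetic data (NOT the paper's dataset): 120 rows, 16 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 16))
y = 3000.0 + X @ rng.normal(scale=50.0, size=16) + rng.normal(scale=20.0, size=120)
X_tr, X_te, y_tr, y_te = split_75_25(X, y)
coef = fit_mlr(X_tr, y_tr)
print(regression_metrics(y_te, predict_mlr(coef, X_te)))
```

The same `regression_metrics` call can be repeated for each feature subset produced by the five selection algorithms, so that the subsets are compared on identical test rows.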

Highlights

  • Data mining is a process of discovering previously unknown and potentially interesting patterns in large datasets (Frawley et al., 1991).

  • Our study investigates the behaviour of five feature selection algorithms on sixteen features; the selected features are given as input to a multiple linear regression model, an artificial neural network and M5Prime to assess their accuracy.

  • The Akaike Information Criterion (AIC) value is calculated using the formula $\mathrm{AIC} = N \ln\left(SS_{\mathrm{error}}/N\right) + 2K$, where N is the number of observations and K is the number of parameters + 1.
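The AIC formula given in the highlights can be evaluated directly once a model's residual sum of squares is known. The sketch below assumes SS_error is the residual sum of squares of a fitted regression and reads K as `n_params + 1`, exactly as the highlight defines it; the function name and the numbers in the usage lines are hypothetical.

```python
import math

def aic(ss_error: float, n_obs: int, n_params: int) -> float:
    """AIC = N * ln(SS_error / N) + 2K, with K = n_params + 1 as stated in the highlight."""
    if ss_error <= 0 or n_obs <= 0:
        raise ValueError("ss_error and n_obs must be positive")
    k = n_params + 1  # the highlight defines K as the number of parameters + 1
    return n_obs * math.log(ss_error / n_obs) + 2 * k

# Hypothetical comparison of two candidate feature subsets on the same 100 observations:
# the subset with the lower AIC is preferred.
print(aic(ss_error=100.0, n_obs=100, n_params=3))  # 100*ln(1) + 2*4 = 8.0
print(aic(ss_error=50.0, n_obs=100, n_params=5))
```

Because the 2K term grows with every added parameter, a larger subset only wins if it reduces SS_error enough to offset that penalty.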


Summary

INTRODUCTION & RELATED WORK

Data mining is a process of discovering previously unknown and potentially interesting patterns in large datasets (Frawley et al., 1991). Feature selection optimizes the performance of data mining algorithms and makes it easier for the analyst to interpret the outcome of the modeling. It can reduce the cost of recognition by reducing the number of features to be collected, and in some cases it can also yield better classification or prediction accuracy owing to finite sample size effects. In one related approach, features ranked above a threshold value were selected, and the system was evaluated using a knowledge discovery dataset and the Naïve Bayes algorithm. The aim of this research work is to identify important paddy field conditions (features) using feature selection algorithms, in order to provide a comprehensive view of paddy crop yield. Our study investigates the behaviour of five feature selection algorithms on sixteen features; the selected features are given as input to a multiple linear regression model, an artificial neural network and M5Prime to assess their accuracy.
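One way to run sequential forward feature selection with the adjusted R² criterion the study uses to pick the optimum subset is sketched as follows. It is a plain greedy forward search (no floating or backtracking step), scores each candidate subset by the adjusted R² $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ of an OLS fit on the training data, and stops when no remaining feature raises that score. The function names, feature names `f0`…`f5` and data are hypothetical, and the paper's own stopping rule may differ.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit (with intercept) of y on the columns of X."""
    n, p = X.shape
    if n - p - 1 <= 0:
        return -np.inf  # not enough observations to score this subset
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - p - 1))

def forward_select(X, y, names):
    """Greedy sequential forward selection: add the feature that most raises adjusted R^2."""
    selected, remaining = [], list(range(X.shape[1]))
    best = -np.inf
    while remaining:
        score, j = max((adjusted_r2(X[:, selected + [j]], y), j) for j in remaining)
        if score <= best:
            break  # no remaining feature improves the criterion
        best = score
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best

# Synthetic example: only f0 and f3 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
chosen, score = forward_select(X, y, [f"f{i}" for i in range(6)])
print(chosen, round(score, 4))
```

Since adjusted R² rises whenever an added variable's t-statistic exceeds 1 in magnitude, a pure-noise feature can occasionally slip in after the informative ones; a stricter stopping rule (for example, requiring a minimum gain or using the AIC from the highlights) would make the search more conservative.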

DATA SOURCE
DATA PRE-PROCESSING
Feature Selection and Evaluation
Sequential Forward Feature Selection Algorithm
Sequential Backward Elimination Feature Selection Algorithm
Correlation Based Feature Selection Algorithm
Variance Inflation Factor
Random Forest Variable Importance
Multiple Linear Regression Model
MLR Model for Crop yield Prediction
M5 Prime
Accuracy Metrics
RESULTS AND DISCUSSIONS
Selection Procedure
Selection procedure
MODEL VALIDATION
CONCLUSION