Classification Based on Rules for the Study of Cotton Productivity in the State of Mato Grosso

Alexandra Virgínia Valente Da Silva, Carlos Manoel Pedro Vaz, Rafael Galbieri, Ednaldo José Ferreira

doi:10.22456/2175-2745.108126

Abstract

The expansion of cotton farming in the Brazilian savannah enabled highly technified, efficient, and profitable production, raising the country from a cotton fiber importer in the 1970s to one of the main exporters today. Despite the growing contribution of technologies such as transgenic cultivars, machinery, inputs, and more efficient data management, cotton productivity in the State of Mato Grosso (MT) has stagnated in recent years. Data Mining (DM) techniques offer an excellent opportunity to assess this problem. Rule-based classification applied to a real database (DB) of cotton production in MT identified factors that were affecting, and consequently limiting, productivity gains. In data pre-processing, we performed attribute selection, transformation, and outlier identification. Numerical attributes were discretized using automatic techniques: Kononenko (KO), Better Encoding (BE), and their combination (KO + BE). In the modeling stage, the rule algorithms used were PART and JRip, both implemented in the WEKA tool. Performance was assessed using the metrics accuracy, recall, and cost, and their combination in the I_FC index (created by the authors). The PART classifier performed best, with discretization by the KO + BE technique followed by binary conversion. Analysis of the rules made it possible to identify the attributes that most affect productivity. This article is an excerpt from an ICMC/USP Professional Master's dissertation in Science carried out in São Carlos-SP/BR.
