Abstract
Political campaigns in Brazilian elections are financed mostly by public money. Every candidate must submit detailed accountability reports to the legal authorities, and these reports must be analyzed within a short time frame to detect possible fraud or suspicious transactions. In this work, we compiled a real data set from the 2016 Brazilian elections for all city council elections in the state of São Paulo and used it to propose a framework for data segmentation analysis and validation. An exploratory data analysis is performed to determine the feature distributions and to define the required feature pre-processing tasks. A clustering analysis using the DBSCAN method is applied to a subset of the original data. This analysis focuses on segmenting spending on contracts with car fuel providers and on detecting potential outliers. Three clusters were identified, and a ridge regression model was used to evaluate which features were most important in defining the clusters. One cluster was related to candidates who received zero votes. The remaining two separated suppliers according to whether or not their contracts were almost exclusively related to candidate spending on car fuel. The hyperparameters of the clustering analysis were validated using a bootstrap method, and the null hypothesis that the data set has no structure (that is, is random) was rejected using a Monte Carlo approach.
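The abstract does not say how the DBSCAN step or the Monte Carlo randomness test were implemented. The following is a minimal, dependency-free sketch that relies on our own assumptions, none of which come from the paper. It uses Euclidean distance and a hand-written DBSCAN; scikit-learn's `sklearn.cluster.DBSCAN` with `eps` and `min_samples` is the usual off-the-shelf alternative. For the test statistic it uses the fraction of points that DBSCAN assigns to a cluster. This value is compared against data drawn uniformly from the bounding box of the observed points, which serves as a stand-in for "no structure". The helper names (`dbscan`, `clustered_fraction`, `monte_carlo_p_value`) and the toy data are hypothetical.

```python
import math
import random

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN: returns a cluster id (0, 1, ...) or NOISE per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


def clustered_fraction(points, eps, min_pts):
    """Share of points that DBSCAN assigns to some cluster (i.e. not noise)."""
    labels = dbscan(points, eps, min_pts)
    return sum(label != NOISE for label in labels) / len(points)


def monte_carlo_p_value(points, eps, min_pts, n_sims=99, seed=0):
    """Assumed H0: 'no structure', simulated as uniform points in the bounding box.

    Returns the observed statistic and the Monte Carlo p-value
    (1 + number of simulations at least as extreme) / (1 + n_sims).
    """
    rng = random.Random(seed)
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    observed = clustered_fraction(points, eps, min_pts)
    as_extreme = 0
    for _ in range(n_sims):
        null_sample = [tuple(rng.uniform(lows[d], highs[d]) for d in dims) for _ in points]
        if clustered_fraction(null_sample, eps, min_pts) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_sims + 1)


if __name__ == "__main__":
    # Hypothetical toy data: two dense 5x5 grids plus two isolated points.
    grid_a = [(1.0 + 0.2 * i, 1.0 + 0.2 * j) for i in range(5) for j in range(5)]
    grid_b = [(9.0 + 0.2 * i, 9.0 + 0.2 * j) for i in range(5) for j in range(5)]
    data = grid_a + grid_b + [(5.0, 0.0), (0.0, 9.0)]
    labels = dbscan(data, eps=0.5, min_pts=5)
    print("clusters:", len(set(labels) - {NOISE}), "noise:", labels.count(NOISE))
    print("observed fraction, p-value:", monte_carlo_p_value(data, eps=0.5, min_pts=5))
```

On this toy data, DBSCAN finds the two grids as clusters and labels the two isolated points as noise. Uniform data in the same bounding box almost never reaches that clustered fraction, so the p-value comes out small. A bootstrap check of `eps` and `min_pts`, as mentioned in the abstract, could reuse `dbscan` on resampled data. Its exact design is not given here.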