Abstract
► We developed a three-stage system for predicting non-delivery fraud at online auction sites, with good true positive and false positive rates.
► The proposed system outputs a ranking and a labeling that guarantees the expected value of the false positive rate.
► We validated the system on a large dataset containing only publicly available data.
► We managed to predict fraud even when it was committed by new sellers.
► We empirically confirmed that category-level features improve fraud detection.

Non-delivery fraud is a recurring problem at online auction sites: fraudulent sellers list nonexistent products just to receive payments and then disappear, possibly repeating the swindle under another identity. In our work, we identified a set of publicly available features related to listings, sellers, and product categories, and built a machine learning system for fraud prediction that takes into account the high class imbalance of real data and the need, for commercial reasons, to control the false positive rate. We tested the proposed system with data collected from a major Brazilian online auction site and obtained good results in identifying fraudsters before they strike, even when no historical information about them was available. We also evaluated the contribution of category-related features to fraud detection. Finally, we compared the learning algorithm used (boosted trees) with other state-of-the-art methods.
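The abstract does not spell out how the labeling controls the false positive rate, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes a classifier has already assigned a fraud score to each listing, and it picks a threshold from the scores of known legitimate listings in a held-out set so that at most a target fraction of them is flagged. The function names `threshold_for_fpr` and `rank_and_label` and all scores shown are hypothetical.

```python
import math


def threshold_for_fpr(negative_scores, target_fpr):
    """Pick a threshold so that at most `target_fpr` of the given
    legitimate-listing scores lie strictly above it.

    Assumption: `negative_scores` come from a held-out set of listings
    known to be legitimate; higher scores mean "more likely fraud".
    """
    if not negative_scores:
        raise ValueError("need at least one legitimate-listing score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")
    ordered = sorted(negative_scores, reverse=True)
    # Number of legitimate listings we tolerate flagging by mistake.
    allowed = math.floor(target_fpr * len(ordered))
    if allowed >= len(ordered):
        return float("-inf")  # every listing may be flagged
    # Only scores strictly greater than this value get flagged, so at
    # most `allowed` of the held-out legitimate listings are flagged.
    return ordered[allowed]


def rank_and_label(scores, threshold):
    """Return listing indices ordered from most to least suspicious,
    plus a boolean fraud label per listing."""
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [score > threshold for score in scores]
    return ranking, labels


# Hypothetical scores for ten held-out legitimate listings.
legit_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
threshold = threshold_for_fpr(legit_scores, target_fpr=0.2)

# Hypothetical scores for three new listings.
ranking, labels = rank_and_label([0.3, 0.97, 0.85], threshold)
```

On the held-out scores themselves the bound holds by construction. On new data it holds only approximately, and only if new legitimate listings are scored similarly to the held-out ones.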