Abstract

The article is devoted to the analysis of the use of machine learning algorithms to detect attacks using a custom web environment or the functionality of user applications. Learning with a teacher and clustering algorithms are considered. The dataset uses a sample of online shopping transactions collected by an e-commerce retailer. The dataset contains 39,221 transactions. To detect attacks in the web environment, the most optimal implementations of machine learning algorithms were selected after their review and comparative analysis. The most effective algorithm for detecting fraudulent transactions has been determined. We use the accuracy and running time of the algorithm as criteria. The accuracy of detecting fraudulent transactions for Random Forest, GB (Scikit-learn), GB (CatBoost) algorithms is 100%, and the KD-trees algorithm is 99,9%. The gradient boosting algorithm in the CatBoos implementation is 4,2 times faster than Random Forest, 2,4 times faster than GB Scikit-learn, 1,2 times faster than GB without using the cat_features parameter, 41,9 times faster than k-dimensional trees, 66,8 times faster than DBSCAN. The data obtained for each method is presented in the form of tables. Within the framework of this work, the parameters for evaluating the effectiveness of the algorithms under study are learning time indicators, as well as characteristics from the Confusion matrix and Classification Report for classification algorithms, and fowlkes_mallows_score, rand_score, adjusted_rand_score, Homogeneity, Completeness, V-measure for clustering algorithms.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.