Abstract

This article investigates the effectiveness of a machine learning algorithm for classifying Internet traffic. The random forest (RF) algorithm, which builds an ensemble of decision trees, is considered, and its efficiency in classifying applications is evaluated both with and without background network traffic. To collect the data needed for the analysis, a laboratory network of several computers was set up. One computer was connected to the Internet and hosted a wireless access point; all traffic passing through this computer was captured with Wireshark. Various applications ran on the other computers connected to the access point: web pages were viewed in the Google Chrome and Opera browsers, video calls were made in Skype, files were downloaded with the µTorrent client, the Steam digital game distribution service was used, and so on. The captured data were stored in PCAP format and then pre-processed to meet the requirements of the classification task. In the experiment, a random forest was constructed and its classification quality on the given sample was assessed. The algorithm's parameters were selected experimentally: the chosen forest consists of 5 trees grown to the maximum possible depth. The algorithm proved most effective on DNS traffic. In addition to testing on a sample with the same class composition as the training sample, the quality of the algorithm was also assessed in the presence of background traffic, i.e. with a test sample containing instances of classes absent from the training sample.
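
The configuration described above (5 trees, unlimited depth, evaluation with and without background traffic) can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline. The per-flow features (mean packet size, mean inter-arrival time, destination port), the synthetic class profiles, the sample sizes, and all function names are assumptions made for illustration only. The abstract also does not say how background samples are scored; here a background flow simply counts as an error, because a forest trained on a closed set of classes can only answer with one of those classes.

```python
import random
from collections import Counter

# Hypothetical per-class profiles: (mean, std) of packet size in bytes,
# (mean, std) of inter-arrival time in seconds, destination port.
PROFILES = {
    "dns": ((80, 10), (0.5, 0.1), 53),
    "web": ((800, 100), (0.05, 0.01), 443),
    "torrent": ((1400, 50), (0.01, 0.002), 6881),
    "steam": ((600, 150), (0.03, 0.01), 27015),  # used only as background
}

def make_flows(label, n, rng):
    """Draw n synthetic flow feature vectors for one class."""
    (sm, ss), (im, isd), port = PROFILES[label]
    return [(rng.gauss(sm, ss), rng.gauss(im, isd), port) for _ in range(n)]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, rng, n_sub):
    """Grow a tree with no depth limit; each node sees n_sub random features."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
    if split is None:  # sampled features cannot separate this node
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], rng, n_sub),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], rng, n_sub))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=5, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_sub))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def accuracy(forest, rows, labels):
    hits = sum(predict_forest(forest, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)

rng = random.Random(42)
known = ["dns", "web", "torrent"]
train_X, train_y, test_X, test_y = [], [], [], []
for label in known:
    train_X += make_flows(label, 60, rng)
    train_y += [label] * 60
    test_X += make_flows(label, 20, rng)
    test_y += [label] * 20

forest = fit_forest(train_X, train_y, n_trees=5)
bg_X = make_flows("steam", 20, rng)  # class absent from the training sample

acc_closed = accuracy(forest, test_X, test_y)
acc_mixed = accuracy(forest, test_X + bg_X, test_y + ["steam"] * len(bg_X))
print(f"same classes: {acc_closed:.2f}, with background: {acc_mixed:.2f}")
```

On real captures the feature vectors would come from the pre-processed PCAP data rather than from `make_flows`, and a library implementation of random forest would normally replace the hand-written trees.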
