Abstract

As the volume of data on the network grows, the load on servers and communication links becomes increasingly heavy. Edge computing can alleviate this problem. Because the Darknet hosts a large amount of malicious content, combining edge computing with content detection and analysis is of high research value. This paper therefore presents an intelligent classification system, based on machine learning and Scrapy, that can quickly detect and classify services with malicious content. Because Tor Darknet domain names are undisclosed and short-lived, obtaining uniform resource locators (URLs) and resources from the network is challenging. In this paper, we focus on a network based on The Onion Router (Tor) anonymous communication system. We designed a crawler to obtain the contents of the Tor network and labeled them into six classes. We also constructed a dataset that contains URLs, categories, and keywords. Edge computing is used to judge the category of websites. The accuracy of the classifier, which is based on a machine learning algorithm, is as high as 89%. The classifier will be used in an operational system that helps researchers quickly obtain malicious content and categorize hidden services.

Highlights

  • The Darknet has a huge amount of data

  • In the Tor Darknet, a complete domain name has the format “[digest].onion” and is made up of two parts: the first, [digest], is a random string of digits and English letters, and the second is the uniform suffix of Tor links; for example, jsaljfslj4sfd5ad.onion

  • Searching for sites with the suffix “.onion” shows no results. Therefore, in order to classify the contents of the Tor Darknet, domain names need to be obtained in various ways
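
One of the "various ways" of obtaining domain names is to pull candidate addresses out of text that has already been collected. The following is a minimal sketch of that idea, not the paper's crawler. It assumes the standard Tor address encodings (16 base32 characters for older v2 addresses, 56 for v3), which the text above does not spell out; the function name `extract_onion_domains` is hypothetical.

```python
import re

# Assumption: onion digests are base32 (letters a-z and digits 2-7),
# 16 characters long for v2 addresses and 56 for v3 addresses.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")


def extract_onion_domains(text):
    """Return the unique .onion domains found in text, in order of first appearance."""
    domains = []
    for match in ONION_RE.finditer(text.lower()):
        domain = match.group(0)
        if domain not in domains:
            domains.append(domain)
    return domains


# The sample address is the one quoted in the highlights above.
sample = "Mirror: http://jsaljfslj4sfd5ad.onion/index (also JSALJFSLJ4SFD5AD.onion), not short.onion"
found = extract_onion_domains(sample)
```

Lowercasing before matching treats differently cased copies of the same address as one domain, and strings that are too short to be a valid digest are ignored.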


Summary

Proposed Model for Tor Darknet Resource Detection in Edge Computing

Therefore, web page content should be detected at edge devices, and the original data should be processed into distinctive words that best describe the website category. In the Tor Darknet, a complete domain name has the format “[digest].onion” and is made up of two parts: the first, [digest], is a random string of digits and English letters, and the second is the uniform suffix of Tor links; for example, jsaljfslj4sfd5ad.onion. Searching for sites with the suffix “.onion” shows no results. Therefore, in order to classify the contents of the Tor Darknet, domain names need to be obtained in various ways. After the text content of each web page is cleaned, the corpus is integrated into a dataset containing URLs, categories, and keywords. A machine learning algorithm, k-nearest neighbors (KNN), is then applied to these samples to train a classifier for the subsequent experiments.
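
To make the classification step concrete, here is a minimal sketch of a KNN text classifier in plain Python. It is an illustration under stated assumptions rather than the paper's implementation: the summary names KNN but not the features or the distance measure, so the TF-IDF weighting, the cosine similarity, the value k = 3, the class name `KNNTextClassifier`, and the two toy labels ("market", "forum") are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unknown terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class KNNTextClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, texts, labels):
        token_lists = [tokenize(t) for t in texts]
        self.idf = fit_idf(token_lists)
        self.vectors = [tfidf(tokens, self.idf) for tokens in token_lists]
        self.labels = list(labels)
        return self

    def predict(self, text):
        query = tfidf(tokenize(text), self.idf)
        scored = [(cosine(query, v), label) for v, label in zip(self.vectors, self.labels)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Majority vote among the k most similar training pages.
        votes = Counter(label for _, label in scored[: self.k])
        return votes.most_common(1)[0][0]


# Toy keyword samples with hypothetical labels; the real dataset has six classes.
train_texts = [
    "vendor listing price shipping escrow",
    "buy order price vendor review",
    "shipping order escrow payment",
    "forum thread reply member post",
    "discussion thread post moderator",
    "member reply discussion board",
]
train_labels = ["market", "market", "market", "forum", "forum", "forum"]

clf = KNNTextClassifier(k=3).fit(train_texts, train_labels)
prediction = clf.predict("vendor price shipping order")
```

KNN stores the training vectors and defers all work to prediction time, so new labeled pages can be added without retraining; the cost is that every prediction compares the query against the whole dataset.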

