Abstract

In recent years, with the development of information technology, the Internet has become an essential tool in everyday life. However, as the Internet continues to grow in popularity and scale, malware has become increasingly widespread, with many negative effects on society. The number of malware types keeps growing, attack techniques are constantly updated, and malware spreads ever faster, causing increasing damage to networks; as a result, the requirements and standards for malware detection keep rising. Detecting malware effectively is therefore an active research topic. To address the new needs and problems arising from the evolution of malware, this paper proposes using machine learning algorithms to perform malware detection in a distributed environment. First, each detection node in the distributed network performs anomaly detection on the software information and data it captures. It then performs feature analysis to discover unknown malware and obtain samples of it, and propagates the new malware features to all feature detection nodes across the distributed network. A random forest-based machine learning algorithm is then trained for malware classification and detection, giving the system a global response capability against malware. Building a distributed system framework strengthens the global capture capability of malware detection, allowing a robust response to the growing volume and rapid spread of malware, and integrating machine learning algorithms into this framework enables effective malware detection. Extensive experiments on the Ember 2017 and Ember 2018 datasets show that the proposed approach achieves advanced performance and effectively addresses the malware detection problem.
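The node-level workflow described in the abstract (local anomaly detection, discovery of unknown malware, and propagation of its features to every node) can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the `DetectionNode` class, the `global_response` function, the use of a single byte-entropy feature, the z-score anomaly test, and the threshold of 3 are all hypothetical choices made for illustration. Retraining the random forest on the shared samples is left out.

```python
from statistics import mean, pstdev


class DetectionNode:
    """Hypothetical detection node holding locally known malware features."""

    def __init__(self, name, baseline):
        self.name = name
        self.baseline = list(baseline)   # entropy values observed on benign software
        self.known_features = set()      # signatures of known malware
        self.training_pool = []          # samples kept for later random forest retraining

    def is_anomalous(self, value, z_threshold=3.0):
        # Simple z-score test against the node's benign baseline (an assumed detector)
        mu, sigma = mean(self.baseline), pstdev(self.baseline)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > z_threshold


def global_response(nodes, origin, sample):
    """Run detection at `origin`. If the sample is previously unknown and anomalous,
    push its signature and the sample itself to every node and return True;
    otherwise (already known, or not anomalous) return False."""
    signature, value = sample["signature"], sample["entropy"]
    if signature in origin.known_features or not origin.is_anomalous(value):
        return False
    for node in nodes:                   # update all feature detection nodes
        node.known_features.add(signature)
        node.training_pool.append(sample)
    return True


# Usage: three nodes share a benign baseline; node-0 captures a high-entropy sample.
baseline = [5.0, 5.2, 4.9, 5.1, 5.0]
nodes = [DetectionNode(f"node-{i}", baseline) for i in range(3)]
print(global_response(nodes, nodes[0], {"signature": "abc123", "entropy": 7.9}))  # True
print(all("abc123" in n.known_features for n in nodes))                            # True
```

Because every node ends up with the same signature set and training pool, a sample first seen at one node is recognized everywhere afterwards, which is the "global response" the abstract refers to.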

Highlights

  • In recent years, with the development of information technology, the Internet has become an essential tool in everyday life

  • Detecting malware effectively is an active research topic. To address the new needs and problems arising from the evolution of malware, this paper proposes using machine learning algorithms to perform malware detection in a distributed environment: each detection node in the distributed network first performs anomaly detection on the software information and data it captures, then performs feature analysis to discover unknown malware and obtain its samples, and propagates the new malware features to all feature detection nodes across the network; a random forest-based machine learning algorithm is then trained for malware classification and detection, giving the system a global response capability against malware

  • For the subnodes in the distributed system, we describe in detail the algorithms they use for feature extraction and random forest-based malware detection
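The random forest-based detection run at each subnode can be sketched in plain Python as follows. This is a simplified, assumed instance of the standard random forest recipe (bootstrap resampling, a random subset of about sqrt(d) features considered at each split, Gini-impurity splits, and majority voting across trees), not the paper's code. The class and function names, tree depth, number of trees, and the two-feature toy data are hypothetical; a real deployment would train on full feature vectors such as those in the Ember datasets, typically with a library implementation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, feature_ids):
    """Return (weighted_gini, feature, threshold) for the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, n_sub, rng):
    """Grow a CART-style tree; leaves are labels, inner nodes are (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1, n_sub, rng)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Minimal random forest for integer labels (0 = benign, 1 = malware)."""

    def __init__(self, n_trees=25, max_depth=4, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n_sub = max(1, int(len(X[0]) ** 0.5))  # about sqrt(d) features per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in X]  # bootstrap resample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, n_sub, self.rng))
        return self

    def predict(self, row):
        votes = Counter(predict_tree(tree, row) for tree in self.trees)
        return votes.most_common(1)[0][0]  # majority vote across trees


# Toy [entropy, import_count] vectors, purely illustrative
benign = [[4.0 + 0.1 * k, 50 + k] for k in range(10)]
malware = [[7.0 + 0.1 * k, 5 + k] for k in range(10)]
forest = RandomForest(n_trees=15, seed=1).fit(benign + malware, [0] * 10 + [1] * 10)
print(forest.predict([3.5, 80]), forest.predict([8.5, 1]))  # 0 1
```

Restricting each split to a random feature subset decorrelates the trees, so the majority vote is typically more robust than any single tree.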

Summary

Distributed Architecture

Early malware detection was performed on the host computer in a completely isolated and controlled environment that required no collaboration. With the growing demands of big data, systematic research on big data mining architectures, and on the core mining models and algorithms under those architectures, has become an unavoidable problem. Amer and Zelinka [14] organized the statistical summaries of micro-clusters into a tree structure that grows over time in order to maintain them. Both of these works are oriented toward mining a single data stream. Based on the above two points, and considering the high predictive capability and robustness of ensemble learning, this paper studies malware detection methods suited to a distributed setting by drawing on existing ensemble learning techniques.
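The passage proposes drawing on integrated (ensemble) learning for distributed detection but does not say, in this summary, how the models at different nodes are combined. One common ensemble technique is majority voting, sketched below under that assumption. `make_threshold_model` is a hypothetical stand-in for a classifier trained on one node's local data, and the optional weights and the tie-breaking rule (prefer "malware") are illustrative choices, not the paper's.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# A node-level classifier maps a feature vector to 1 (malware) or 0 (benign).
Classifier = Callable[[Sequence[float]], int]


def make_threshold_model(feature_idx: int, threshold: float) -> Classifier:
    """Hypothetical stand-in for a model trained on one node's local data."""
    return lambda x: int(x[feature_idx] > threshold)


def ensemble_vote(models: Sequence[Classifier], sample: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> int:
    """Combine per-node predictions by (optionally weighted) majority vote."""
    if weights is None:
        weights = [1.0] * len(models)
    tally: Counter = Counter()
    for model, w in zip(models, weights):
        tally[model(sample)] += w
    # On a tie, prefer label 1 (malware): a conservative, assumed policy.
    return max(tally, key=lambda label: (tally[label], label))


node_models = [make_threshold_model(0, 6.5),
               make_threshold_model(0, 7.2),
               make_threshold_model(1, 0.8)]
print(ensemble_vote(node_models, [7.0, 0.9]))  # 1 (two of three nodes flag it)
print(ensemble_vote(node_models, [7.0, 0.3]))  # 0 (only one node flags it)
```

Weights could, for example, reflect each node's validation accuracy, so that better-trained nodes carry more influence; that choice is likewise an assumption here.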

Based on Machine Learning Methods
Node Detection
Distributed Topology
Analysis
Mechanisms for Collaboration
Detection Algorithm
Datasets
Experimental Setup and Evaluation Metrics
Detection Performance Comparison
Distributed System Performance Verification
Findings
Conclusion