Abstract
In recent years, with the development of information technology, the Internet has become an essential tool in daily life. However, as the popularity and scale of the Internet continue to expand, malware has become increasingly widespread, with many negative effects on society. The number of malware types keeps growing, attacks are constantly updated, and malware spreads quickly and causes more and more damage to networks, so the requirements and standards for malware detection keep rising. Effective malware detection is therefore an active research topic. To address the new needs and problems arising from the development of malware, this paper proposes applying machine learning algorithms to malware detection in a distributed environment. First, each detection node in the distributed network performs anomaly detection on the software information and data it captures. It then performs feature analysis to discover unknown malware and obtain samples of it, and propagates the new malware features to all feature detection nodes in the distributed network. Finally, a random forest-based machine learning algorithm is trained for malware classification and detection, which gives the system a global response capability for malware. Building a distributed system framework enhances the global capture capability of malware detection, so the system can respond robustly to the growing volume and rapid spread of malware, and integrating machine learning algorithms into the framework enables effective detection. Extensive experiments on the EMBER 2017 and EMBER 2018 datasets show that the proposed approach achieves strong performance and effectively addresses the malware detection problem.
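The node-level workflow described in the abstract (local anomaly detection, discovery of an unknown sample, propagation of its feature to every node) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `DetectionNode`, `Network`, and `process_sample` are hypothetical, and a simple z-score test stands in for the paper's anomaly detector, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class DetectionNode:
    """Hypothetical detection node; not taken from the paper."""
    name: str
    known_features: set = field(default_factory=set)

    def is_anomalous(self, value, baseline, threshold=3.0):
        # Assumed stand-in: flag values far from the baseline mean (z-score).
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > threshold


class Network:
    """Hypothetical container for all detection nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def broadcast(self, feature):
        # Propagate a newly discovered malware feature to every node.
        for node in self.nodes:
            node.known_features.add(feature)


def process_sample(node, network, sample, baseline):
    """Classify one sample: dict with 'feature' (hashable) and 'score' (float)."""
    if sample["feature"] in node.known_features:
        return "known"
    if node.is_anomalous(sample["score"], baseline):
        network.broadcast(sample["feature"])  # global update
        return "new"
    return "benign"
```

In this sketch, once one node reports a sample as "new", every other node recognizes the same feature as "known" without repeating the anomaly check, which is the global response behavior the abstract describes.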
Highlights
In recent years, with the development of information technology, the Internet has become an essential tool in daily life.
To address the new needs and problems arising from the development of malware, this paper proposes applying machine learning algorithms to malware detection in a distributed environment: each detection node performs anomaly detection on the software information and data it captures, performs feature analysis to discover unknown malware and obtain samples of it, and propagates the new malware features to all feature detection nodes in the distributed network; a random forest-based machine learning algorithm is then trained for malware classification and detection, which gives the system a global response capability for malware.
For the subnodes in the distributed system, we describe in detail the algorithms they use for feature extraction and random forest-based malware detection.
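As a rough illustration of the classification step at a subnode, the sketch below trains a small ensemble and predicts by majority vote. It is not the paper's algorithm: to stay dependency-free it uses bagged one-feature decision stumps, each fitted on a bootstrap sample with a randomly chosen feature, as a simplified stand-in for a full random forest. The function names, the two-feature toy data, and the label convention (0 = benign, 1 = malware) are all assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-feature threshold rule that minimizes training errors."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        for above in (0, 1):  # label predicted when the value exceeds t
            errors = sum(
                (above if r[feature] > t else 1 - above) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, t, above)
    return best[1:]  # (feature, threshold, label_above)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each with one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(above if row[f] > t else 1 - above for f, t, above in forest)
    return votes.most_common(1)[0][0]
```

A real deployment would more likely use a library implementation of random forests with full decision trees; the stump ensemble only shows the bootstrap, random-feature, and voting structure.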
Summary
Earlier detection of computer malware was done on the host computer in a completely isolated and controlled environment that did not require collaboration. With the growing demand for big data, systematic research on big data mining architectures, and on the core mining models and algorithms that run on them, has become a problem that must be faced. Amer and Zelinka [14] organized the statistical information of microcluster-like data into a tree structure that grows over time in order to maintain it. Both of these works are oriented to single data stream mining. Based on the above two points, and considering the high predictive capability and robustness of integrated (ensemble) learning techniques, this paper studies malware detection methods suited to a distributed setting by drawing on existing integrated learning techniques.