Detecting and identifying malicious executable binaries with Data Mining methods

Dmitriy Vladimirovich Komashinskiy

doi:10.15622/sp.26.9

Abstract

This paper addresses the problem of improving the key characteristics of Data Mining-based systems that detect and identify malicious executable binaries (malware). The common structure of the learning and operating procedures of such systems is defined, and the main non-functional requirements for these systems are derived from this structure. The research task is formulated as a search for new, efficient representation models for executable binaries that yield compact, informative description vectors for such file objects. The essence of the two proposed approaches is described: the first targets malware detection and is based on positionally dependent static data; the second uses dynamic low-level execution data for malware identification. The architecture of the developed system is presented, along with validation results for the developed representation models.
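As a rough illustration of what a "compact description vector built from positionally dependent static data" could look like, the sketch below samples byte values at fixed offsets of a file buffer. This is only one possible reading of the idea, not the paper's model: the function name `positional_features`, the choice of offsets, and the padding value are all hypothetical.

```python
from typing import List, Sequence


def positional_features(data: bytes, offsets: Sequence[int], pad: int = -1) -> List[int]:
    """Return the byte value at each offset, or `pad` when the buffer is too short.

    Assumption: each feature is tied to a fixed position in the file,
    so the resulting vector always has len(offsets) entries.
    """
    return [data[o] if 0 <= o < len(data) else pad for o in offsets]


# Hypothetical usage on a 64-byte buffer that starts like a DOS/PE header.
sample = b"MZ\x90\x00" + bytes(60)
vector = positional_features(sample, offsets=[0, 1, 2, 3, 100])
print(vector)  # [77, 90, 144, 0, -1]
```

A fixed-length vector of this kind can be passed directly to standard learning algorithms, which is the practical reason for preferring compact representations.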

Highlights

The paper addresses the problem of improving the key characteristics of Data Mining-based systems that detect and identify malicious executable binaries
It provides the following advantages: support for the WYSIWYG development paradigm; the ability to implement custom client applications and contribute new functionality; support for the main well-known learning algorithms; and compatibility with the Waikato Environment for Knowledge Analysis (WEKA), which is traditionally popular and widely regarded as useful for such research tasks

Summary


The study addresses the problem of improving the key characteristics of systems that detect and identify malicious executable files using Data Mining methods. The concept of applying Data Mining (DM) methods to malware detection was formulated by Kephart [4] et al. [...] that, to build such systems, the researcher must define a set of entities specifying the data sets used, the computational support tools, the set of methods used for feature selection, learning, and evaluation, and the object representation model. An analysis of existing work shows that the quality indicators of such systems (decision accuracy, number of false positives, training time, and decision time) depend substantially on the representation models used for the analyzed objects. This determined the focus of the present work: developing, formalizing, and analyzing the applicability of representation models for potentially dangerous objects in the PE32 format. The developed representation models for executable objects, based on their structural (static) and dynamic features, are presented below [5,6].

The static model of an application is a set

The dynamic model of an application is a set
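The quality indicators named in the summary (decision accuracy, false positives, and decision time) can be measured for any binary classifier along the following lines. This is a minimal sketch under stated assumptions, not the paper's evaluation procedure: the `evaluate` helper and the toy length-based classifier are hypothetical, and labels use `True` for malware.

```python
import time
from typing import Callable, Dict, Sequence


def evaluate(classify: Callable[[bytes], bool],
             samples: Sequence[bytes],
             labels: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, false positive rate, and total decision time."""
    tp = fp = tn = fn = 0
    start = time.perf_counter()
    for sample, is_malware in zip(samples, labels):
        predicted = classify(sample)
        if predicted and is_malware:
            tp += 1
        elif predicted and not is_malware:
            fp += 1  # benign object flagged as malware
        elif not predicted and is_malware:
            fn += 1
        else:
            tn += 1
    elapsed = time.perf_counter() - start
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "decision_time_s": elapsed,
    }


# Hypothetical toy classifier: flags any buffer longer than 4 bytes.
samples = [b"aaaaaa", b"bb", b"cccccc", b"d"]
labels = [True, False, False, False]
result = evaluate(lambda b: len(b) > 4, samples, labels)
print(result["accuracy"], result["false_positive_rate"])
```

Comparing these figures across representation models, with the learning algorithm held fixed, is one way to expose the dependence on the representation that the summary describes.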

SUMMARY