Abstract

To protect an enterprise's cyber security, a SIEM (Security Information and Event Management) system flags alerts and assigns each a severity score based on predetermined rules. Analysts in the security operations center (SOC) investigate the high-severity alerts to decide whether they are truly malicious. However, the number of alerts is generally overwhelming, far exceeding the SOC's capacity to handle them, and the majority are false positives. There is therefore a great need for a machine learning system that accurately detects risky hosts. Traditional supervised learning algorithms cannot be applied directly to this problem because very few risky hosts (positive labels) are identified, and the positive labels are biased because SOC analysts investigate only high-severity alerts. In this paper, we propose a new distance-based Positive Unlabeled (PU) learning approach that uses four different distances to measure similarity to the positive labels and a Gaussian copula function to capture their correlation structure, combining the four distance measures into one joint probability density that we can use directly to infer new labels. The new approach significantly reduces the bias of the inferred labels, whereas traditional supervised PU learning increases it. To quantify bias, we also propose a new bias estimation method. We apply the new bias-reducing PU learning system to detect host risk in cyber security. Results on real enterprise data indicate that the proposed PU learning detects risky hosts effectively while greatly reducing label bias. A two-dimensional t-SNE visualization also shows that the labels from distance-based PU learning are more evenly distributed, with higher Kozachenko-Leonenko entropy.
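
As a rough illustration of how several distance measures can be tied together by a Gaussian copula, the following minimal sketch may help. It is not the paper's implementation. The abstract does not name the four distances, so Euclidean, Manhattan, Chebyshev and cosine distances to the centroid of the positive examples are used as stand-ins, and the data, function names and rank-based fitting procedure are all assumptions made for this example.

```python
import numpy as np
from statistics import NormalDist


def distances_to_positives(X, P):
    """Four illustrative distances from each row of X to the centroid of P.

    Assumption: the abstract does not name its four distances; Euclidean,
    Manhattan, Chebyshev and cosine are stand-ins chosen for this sketch.
    """
    c = P.mean(axis=0)
    diff = X - c
    euclid = np.linalg.norm(diff, axis=1)
    manhattan = np.abs(diff).sum(axis=1)
    chebyshev = np.abs(diff).max(axis=1)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(c)
    cosine = 1.0 - (X @ c) / (norms + 1e-12)
    return np.column_stack([euclid, manhattan, chebyshev, cosine])


def normal_scores(D):
    """Map each column to standard-normal scores through its empirical ranks."""
    n = D.shape[0]
    ranks = D.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    U = ranks / (n + 1.0)                          # pseudo-observations in (0, 1)
    return np.vectorize(NormalDist().inv_cdf)(U)


def gaussian_copula_log_density(Z):
    """Log density of a Gaussian copula fitted to the normal scores Z."""
    R = np.corrcoef(Z, rowvar=False)               # estimated correlation matrix
    _, logdet = np.linalg.slogdet(R)
    M = np.linalg.inv(R) - np.eye(Z.shape[1])
    quad = np.einsum("ij,jk,ik->i", Z, M, Z)       # z^T (R^-1 - I) z per row
    return -0.5 * logdet - 0.5 * quad


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # synthetic host features (hypothetical)
P = X[:10] + 1.0                   # synthetic "known risky" hosts (hypothetical)
D = distances_to_positives(X, P)
log_c = gaussian_copula_log_density(normal_scores(D))
```

Here `log_c` holds one copula log-density value per host. A full joint density would also multiply in the four marginal densities, and the rule that turns the joint density into inferred labels is not described in the abstract, so both are left out of this sketch.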
