Abstract

Exploratory data analysis discovers structures and patterns in data by considering all variables as a whole, but it does not specifically seek associations between response variables and predictor variables. In this chapter, we discuss how to identify and measure this response-predictor relationship, which is an essential element in intrusion detection and prevention. Although models for association and prediction can take a broad range of forms, in general the goals of modeling for association and prediction in network security are twofold: (1) to identify variables that are significantly associated with the response variable and (2) to assess the robustness of these variables, if any, in predicting the response. Although the term "model" may be confusing to many people, a model is just a simplified representation of some aspect of the real world, whether an object or observation, or a situation or process. Models are of particular importance for network security because of the size of the data and the complex relationships among the variables and the desired outcomes. Statistical modeling procedures available for analyzing the response-predictor relationship mainly include bivariate analysis and multiple regression-based analysis. Bivariate analysis focuses on the relationship between two variables (e.g., a response and a predictor) without taking into account any impact of other predictor variables on the response variable. The multiple regression modeling approach, on the other hand, establishes a regression relationship between a response variable and a set of potential predictor variables and estimates the predictive power of each predictor as adjusted for the others. Therefore, a variable that is significantly associated with the response in the bivariate analysis may no longer hold such an association in the regression analysis after adjusting for other variables.
In the following sections, we will review and discuss these two main approaches in detail. Readers who would like to attain a more general knowledge of modeling associations should refer to Mandel (1964), Press & Wilson (1978), Cohen & Cohen (1983), Berry & Feldman (1985), Cox & Snell (1989), McCullagh & Nelder (1989), Agresti (1996), Ryan (1997), Long (1997), Burnham & Anderson (1998), Pampel (2000), Tabachnick & Fidell (2001), Agresti (2002), Myers, Montgomery & Vining (2002), Menard (2002), and O’Connell (2006). Comprehensive reviews of data mining and statistical learning can be found in Vapnik (1998, 1999), Hastie, Tibshirani & Friedman (2001), and Bozdogan (2003).
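
The contrast between a bivariate association and a regression-adjusted association can be illustrated with a minimal sketch. The data here are synthetic and purely hypothetical (the variables `x`, `y`, and `z` are assumptions for illustration and do not come from the chapter): `z` drives both the predictor `x` and the response `y`, while `x` has no direct effect on `y`. The bivariate correlation between `x` and `y` is then clearly nonzero, yet the coefficient of `x` in a multiple regression that also includes `z` should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical synthetic variables (not from the chapter):
# z drives both the predictor x and the response y; x has no direct effect on y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2.0 * z + rng.normal(size=n)

# Bivariate analysis: Pearson correlation between x and y, ignoring z.
r_xy = np.corrcoef(x, y)[0, 1]

# Multiple regression: least-squares fit of y on an intercept, x, and z.
X = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, beta_x, beta_z = beta

print(f"bivariate correlation of x with y: {r_xy:.2f}")
print(f"coefficient of x adjusted for z:   {beta_x:.2f}")
print(f"coefficient of z adjusted for x:   {beta_z:.2f}")
```

In this sketch the apparent association of `x` with `y` comes entirely from their shared dependence on `z`, which is why it largely disappears once `z` enters the regression. Real network data will rarely be this clean, so the example should be read only as an illustration of the adjustment effect, not as a modeling recipe.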
