Abstract

This chapter outlines common public safety and security challenges and gives an overview of additional work in data mining. It discusses topics such as intrusion detection, identity theft, syndromic surveillance, data collection, fusion and preprocessing, text mining, and fraud detection. Several approaches to training and validating models exist, each using a slightly different partitioning technique. This chapter illustrates a three-sample approach that partitions the data into training, validation, and test samples. Other approaches to data partitioning allocate different percentages of the data to the training and test samples. This approach to data partitioning can be particularly useful when modeling infrequent or rare events, because it increases the number of cases of interest from which to create the model without overrepresenting unusual or spurious findings, which is a limitation of boosting methods. Boosting methods can be used to address extremely small sample sizes or infrequent events. These methods confer additional weight or emphasis on infrequent or underreported events.
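As an illustration of the three-sample approach mentioned above, the following minimal Python sketch partitions case indices into training, validation, and test samples. The function name `three_sample_split`, the 60/20/20 percentages, and the toy labels are assumptions made for this example and are not taken from the chapter. The sketch also splits each class separately (stratification), which is one possible way to keep rare events present in all three samples; the chapter may handle this differently.

```python
import random
from collections import defaultdict


def three_sample_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) lists of indices into `labels`.

    Each class is shuffled and split on its own, so every sample keeps
    roughly the same share of rare cases as the full data set.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    rng = random.Random(seed)  # fixed seed makes the partition repeatable

    # Group case indices by class label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, validation, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, validation, test


# Hypothetical data: 1 marks the rare event of interest (10 of 100 cases).
labels = [1] * 10 + [0] * 90
train, validation, test = three_sample_split(labels)
```

Changing the `fractions` argument gives the other percentage allocations the abstract alludes to. The model is fitted on the training sample, tuned on the validation sample, and evaluated once on the test sample.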
