Abstract

Rapid advances in technology, coupled with a drastic reduction in the cost of storage, have resulted in a tremendous increase in the volume of stored data. As a consequence, analysts find it hard to cope with the rate of data arrival and the volume of data, despite the availability of many automated tools. In a digital investigation, the contemporary challenge is to obtain the information that led to a security breach and to corroborate it. Traditional techniques that rely on keyword-based search fall short of interpreting the relationships and causality inherent in artifacts present across one or more sources of information. The problem of handling very large volumes of data, and of discovering the associations among the data, thus emerges as an important challenge. The work reported in this paper uses metadata associations to elicit these inherent relationships. We study the metadata association methodology and introduce algorithms to group artifacts. We establish that grouping artifacts based on metadata can provide a volume reduction of at least $\frac{1}{2M}$, even on a single source, where $M$ is the largest number of metadata associated with an artifact in that source. The value of $M$ is independent of the metadata inherently available on any given source. As one understands the underlying data better, the value of $M$ can be refined iteratively, thereby enhancing the volume reduction. We also establish that such a reduction in volume is independent of the distribution of metadata associations across artifacts in any given source. We systematically develop the algorithms necessary to group artifacts on an arbitrary collection of sources and study their complexity.
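
The grouping idea described above can be illustrated with a small sketch. The snippet below is not the paper's algorithm; it assumes one simple grouping rule (artifacts that share any metadata name/value pair fall into the same group, computed with a union-find) and merely reports the resulting group-to-artifact ratio alongside the abstract's $\frac{1}{2M}$ bound for reference. The artifact identifiers and metadata fields are hypothetical.

```python
from collections import defaultdict

def group_by_metadata(artifacts):
    """Group artifact ids that share at least one metadata value.

    `artifacts` maps artifact id -> set of (metadata_name, value) pairs.
    Returns a list of groups (sets of artifact ids), built with a simple
    union-find: two artifacts end up in the same group whenever they can
    be linked through shared metadata values. This grouping rule is an
    assumption made for illustration, not the paper's exact method.
    """
    parent = {a: a for a in artifacts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    # Collect, for each metadata value, the artifacts carrying it,
    # then merge those artifacts into one group.
    by_value = defaultdict(list)
    for art, meta in artifacts.items():
        for pair in meta:
            by_value[pair].append(art)
    for members in by_value.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(set)
    for art in artifacts:
        groups[find(art)].add(art)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical single source: 6 artifacts, each with at most M = 2
    # metadata associations.
    artifacts = {
        "a1": {("owner", "alice"), ("app", "mail")},
        "a2": {("owner", "alice")},
        "a3": {("app", "mail"), ("ext", "pdf")},
        "a4": {("ext", "pdf")},
        "a5": {("owner", "bob")},
        "a6": {("owner", "bob"), ("app", "web")},
    }
    M = max(len(m) for m in artifacts.values())
    groups = group_by_metadata(artifacts)
    print(f"{len(artifacts)} artifacts -> {len(groups)} groups")
    # The abstract's bound guarantees a volume reduction of at least
    # 1/(2M); here we simply report both figures for comparison.
    print(f"observed reduction: {1 - len(groups) / len(artifacts):.2f}, "
          f"guaranteed at least: {1 / (2 * M):.2f}")
```

On this toy input the grouping collapses six artifacts into two groups, a far larger reduction than the guaranteed $\frac{1}{2M} = \frac{1}{4}$; the bound is a worst-case guarantee, not a typical outcome.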
