Abstract

Textual evidence is central to digital investigations and makes up the vast majority of the material examined, because a great deal of stored digital data is linguistic in nature (e.g. human languages, programming languages). Important sources of text-based evidence include SMS messages, chat logs, emails, word processing documents, spreadsheets, address books, calendar appointments and system logs. The investigator is flooded with data and must spend valuable investigation time scanning noisy search results and reading through irrelevant hits. Current digital forensic text string search tools use matching or indexing algorithms to search digital evidence at the physical level and locate specific text strings. They are designed to achieve 100% query recall (i.e. to find every instance of the search string). Given the nature of the data set, this leads to an extremely high incidence of hits that are irrelevant to the investigative objectives. Moreover, these tools fail to group or order search hits in a way that appreciably improves the investigator's ability to reach the relevant hits first, or at least quickly. Text mining has therefore been taken up as a new initiative for digital forensics. A text mining approach can improve the IIR (Intelligent Information Retrieval) effectiveness of digital forensic text string searching, and the technology can be scaled to large datasets of gigabytes or terabytes. The software developed here provides text analysis, data extraction, metadata correlation, visualization and information retrieval features. Its main aim is to analyze transcripts such as SMS messages, emails, word-processed letters, event logs and chat transcripts.
The system searches for specific keywords, which the user weights according to domain-specific analysis. It then ranks the correlated data and displays the results. To support further analysis, it provides investigators with graphs and charts of the ranked data.
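
The idea of user-weighted keyword ranking can be illustrated with a minimal sketch. The abstract does not give the system's actual scoring method, so everything here is an assumption for illustration: the function names `score_document` and `rank_documents`, the sample transcripts, the weights, and the scoring rule itself (weight multiplied by occurrence count, summed over single-word keywords).

```python
import re
from collections import Counter


def score_document(text, keyword_weights):
    """Hypothetical score: sum of weight * occurrence count per keyword.

    Matching is case-insensitive and works on whole tokens, so only
    single-word keywords are supported in this sketch.
    """
    tokens = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(weight * tokens[kw.lower()] for kw, weight in keyword_weights.items())


def rank_documents(documents, keyword_weights):
    """Return (doc_id, score) pairs, highest score first, omitting zero scores."""
    scored = [
        (doc_id, score_document(text, keyword_weights))
        for doc_id, text in documents.items()
    ]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    # Sort by descending score; break ties by document id for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Made-up transcripts and investigator-assigned weights (assumptions).
documents = {
    "sms_001": "Meet me at the warehouse tonight, bring the cash",
    "email_014": "Quarterly report attached. Cash flow looks fine.",
    "chat_207": "The transfer is done. Cash and the passport are at the warehouse.",
    "log_003": "Service started on port 8080",
}
keyword_weights = {"warehouse": 3.0, "cash": 2.0, "passport": 4.0, "transfer": 1.5}

ranking = rank_documents(documents, keyword_weights)
for doc_id, score in ranking:
    print(doc_id, score)
```

In this sketch, transcripts with no keyword hits are dropped and the remainder are ordered so that the highest-weighted matches appear first, which is the kind of ordering the abstract says plain string search tools lack. A real system would likely also normalize for document length and handle multi-word phrases.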
