Abstract

Mining user identity information from emails is an important research topic in email mining. Most approaches extract an email user's name only from the email header, but the email body often contains additional name information that is better suited to representing the sender's or recipient's identity. This paper focuses on the problem of extracting email users' name aliases from the body of plain-text emails. After locating and extracting salutation and signature blocks from email bodies, we use named entity recognition (NER) tools to identify potential aliases in the salutation and signature lines, which can be directly related to the email addresses in the email headers. To verify and amend the potential aliases identified by the NER tools, we propose a novel approach that extracts aliases from the salutation and signature lines using a name boundary word template built on the characteristics of the words neighboring an alias. Results on the public subset of the Enron corpus indicate that the approaches presented in this paper can efficiently extract users' aliases from email bodies.
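To make the boundary-word idea concrete, the following is a minimal sketch of how a salutation alias and a signature alias might be pulled from a plain-text body. It is an illustration under stated assumptions, not the method evaluated in the paper: the cue-word lists, the regular expressions, and the function name `extract_aliases` are all hypothetical, and the NER step is omitted entirely.

```python
import re

# Hypothetical boundary words (assumptions for illustration only; the
# abstract does not list the words used in the paper's template).
SALUTATION_WORDS = ("hi", "hello", "hey", "dear")
SIGNOFF_WORDS = ("thanks", "thank you", "best regards", "regards",
                 "best", "cheers", "sincerely")

# A capitalized first name, optionally followed by a capitalized last name.
NAME = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)?"

# Scoped (?i:...) keeps the cue words case-insensitive while the name
# pattern stays case-sensitive.
SALUTATION_RE = re.compile(
    r"^(?i:%s)\s+(%s)\s*[,:!-]?$" % ("|".join(SALUTATION_WORDS), NAME))
SIGNOFF_RE = re.compile(r"^(?i:%s)\s*[,.!]?$" % "|".join(SIGNOFF_WORDS))
NAME_LINE_RE = re.compile(r"^(%s)$" % NAME)


def extract_aliases(body):
    """Return candidate recipient and sender aliases from an email body."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    recipient = sender = None

    # Salutation block: assumed to be the first non-empty line.
    if lines:
        m = SALUTATION_RE.match(lines[0])
        if m:
            recipient = m.group(1)

    # Signature block: a sign-off line followed by a name line,
    # searched within the last few non-empty lines.
    tail = lines[-4:]
    for i, ln in enumerate(tail[:-1]):
        if SIGNOFF_RE.match(ln):
            m = NAME_LINE_RE.match(tail[i + 1])
            if m:
                sender = m.group(1)
                break

    return {"recipient_alias": recipient, "sender_alias": sender}
```

In a fuller pipeline, the recipient alias would be associated with the address in the `To` header and the sender alias with the address in the `From` header; that linking step is left out of this sketch.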
