Abstract

Entity recognition is the computational task of identifying words or phrases in natural language text that correspond to real-world objects of specific predefined types, and it has several text processing applications. However, current entity recognition methods are trained to recognize only a relatively small set of entity types. Extending an entity recognition method to a novel entity type requires a large labeled dataset of known mentions of that type. Because labeling natural language datasets is time-consuming, identifying novel entity types remains a challenging problem. This work extends the Snowball approach to enable recognition of novel entity types in unstructured text of the kind typical of social media. The approach uses a set of keywords known to be associated with the new entity type and a large unlabeled corpus of text that could contain mentions of the entities. The iterative approach starts with the messages in the dataset that are most likely to contain the entities, where likelihood is based on the number of keywords that appear in a message. The approach is then applied to the problem of identifying food entities in messages on the Twitter network. The initial set of keywords is obtained from the FoodKeeper dataset, which is provided by the U.S. Food Safety and Inspection Service and contains information on a variety of foods. The motivation for this application is to build a system that can automatically respond to messages about food with relevant information about food safety and preparation, in an effort to reduce food waste. We evaluated the precision and recall of the entity recognition method on a hand-labeled dataset of tweets. The system achieved a precision of 0.80 and a recall of 0.80 (F-score of 0.80) on this dataset.
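The seed-selection step described in the abstract, ranking messages by how many keywords they contain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `keyword_count` and `rank_messages`, the simple lowercase tokenizer, and the example keywords and messages are all assumptions made for illustration, and the real keyword list would come from the FoodKeeper dataset.

```python
import re

def keyword_count(message, keywords):
    """Count the distinct keywords that occur as whole tokens in a message.

    Tokenization here is a simple lowercase split on letters and
    apostrophes; this is an assumption, not the paper's tokenizer.
    """
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    return len(tokens & keywords)

def rank_messages(messages, keywords):
    """Order messages from most to fewest keyword hits.

    Python's sort is stable, so messages with equal counts keep
    their original relative order.
    """
    return sorted(messages, key=lambda m: keyword_count(m, keywords), reverse=True)

# Hypothetical seed keywords and messages, for illustration only.
keywords = {"milk", "eggs", "cheese"}
messages = [
    "Stuck in traffic again",
    "Is milk still good a week after opening?",
    "Made an omelette with eggs, cheese and milk",
]

ranked = rank_messages(messages, keywords)
for message in ranked:
    print(keyword_count(message, keywords), message)
```

In a bootstrapping loop of this kind, the top-ranked messages would serve as the starting point for the first iteration; how later iterations expand from them is not specified in the abstract and is left out of the sketch.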
