Abstract
Extracting time expressions from free text is a fundamental task for many applications. We analyze the time expressions from four datasets and find that only a small group of words is used to express time information, and that the words in time expressions demonstrate similar syntactic behaviour. Based on these findings, we propose a type-based approach, named SynTime, to recognize time expressions. Specifically, we define three main syntactic token types, namely time token, modifier, and numeral, to group time-related regular expressions over tokens. On these types we design general heuristic rules to recognize time expressions. In recognition, SynTime first identifies the time tokens in raw text, then searches their surroundings for modifiers and numerals to form time segments, and finally merges the time segments into time expressions. As a light-weight rule-based tagger, SynTime runs in real time and can be easily expanded by simply adding keywords for text of different types and domains. Experiments on benchmark datasets and tweet data show that SynTime outperforms state-of-the-art methods.
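The three recognition steps described in the abstract (identify time tokens, expand to neighbouring modifiers and numerals, merge segments) can be illustrated with a minimal sketch. This is not the SynTime implementation: the lexicon `TOKEN_TYPES`, the helper names `token_type` and `recognize`, and the simple adjacency-based merge rule are all assumptions made for illustration, and the regular expressions cover only a handful of English keywords.

```python
import re

# Hypothetical, heavily reduced lexicon for the three token types.
# The real tagger's regular expressions and rules are not reproduced here.
TOKEN_TYPES = {
    "TIME": re.compile(
        r"monday|tuesday|wednesday|thursday|friday|saturday|sunday"
        r"|january|february|march|april|june|july|august"
        r"|september|october|november|december"
        r"|years?|months?|weeks?|days?|hours?"
        r"|today|tomorrow|yesterday|\d{4}",
        re.I,
    ),
    "MODIFIER": re.compile(r"last|next|this|past|early|late|ago", re.I),
    "NUMERAL": re.compile(
        r"\d{1,2}(?:st|nd|rd|th)?|one|two|three|four|five|six|seven|eight|nine|ten",
        re.I,
    ),
}


def token_type(token):
    """Return the first token type whose pattern matches the whole token."""
    for name, pattern in TOKEN_TYPES.items():
        if pattern.fullmatch(token):
            return name
    return None


def recognize(text):
    """Toy recognizer: time tokens -> time segments -> merged expressions."""
    tokens = re.findall(r"\w+", text)
    types = [token_type(t) for t in tokens]

    # Steps 1 and 2: for each time token, expand left and right
    # over adjacent modifiers and numerals to form a time segment.
    segments = []
    for i, ty in enumerate(types):
        if ty != "TIME":
            continue
        lo = hi = i
        while lo > 0 and types[lo - 1] in ("MODIFIER", "NUMERAL"):
            lo -= 1
        while hi + 1 < len(tokens) and types[hi + 1] in ("MODIFIER", "NUMERAL"):
            hi += 1
        segments.append((lo, hi))

    # Step 3: merge segments that overlap or are directly adjacent
    # (an assumed, simplified merge rule).
    merged = []
    for lo, hi in segments:
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))

    return [" ".join(tokens[lo:hi + 1]) for lo, hi in merged]


if __name__ == "__main__":
    sentence = "The meeting was moved to last Monday, and we met again 3 days ago in March 2016."
    print(recognize(sentence))
```

In this sketch, covering a new domain or text type would amount to adding alternatives to the patterns in `TOKEN_TYPES`, which mirrors the abstract's claim that the tagger is expanded by adding keywords rather than by writing new rules.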
Highlights
Time expressions play an important role in information retrieval and many applications in natural language processing (Alonso et al., 2011; Campos et al., 2014)
Occurrence, small vocabulary, and similar syntactic behaviour all reduce the cost of energy required to communicate
We propose a time tagger named SynTime to recognize time expressions using syntactic token types and general heuristic rules
Summary
Time expressions play an important role in information retrieval and many applications in natural language processing (Alonso et al., 2011; Campos et al., 2014). The key difference between SynTime and other rule-based taggers lies in how token types are defined and how rules are designed. (Testing on other languages requires only constructing a collection of token regular expressions in the target language under our defined token types.) We evaluate SynTime against three state-of-the-art methods (i.e., HeidelTime, SUTime, and UWTime) on three datasets: TimeBank, WikiWars, and Tweets. We propose a time tagger named SynTime to recognize time expressions using syntactic token types and general heuristic rules. The results on the three datasets demonstrate the effectiveness of SynTime against state-of-the-art baselines.