Abstract

Named entity recognition identifies the key elements in a text, while sentence classification provides a summary of it. Together, sequence labeling and sentence classification enable deeper extraction of information from text. Embeddings trained on a domain-specific corpus tend to yield stronger vector representations and therefore better classification models. We propose custom fastText embeddings trained on a large Indian English news corpus. Stacked with state-of-the-art Pooled Flair embeddings, they achieve an F1-score of 79 on a custom FIRE English NER dataset and an F1-score of 93.05 on a subset of the OntoNotes 5.0 dataset. The embeddings were also used for sentence classification over 20 news categories, where the best multi-class accuracy was 88.1%. We also propose two Indian news datasets: one based on the FIRE NER dataset, and a custom multi-class sentence classification dataset.
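
As a rough illustration of what stacking embeddings means, the sketch below concatenates the vectors that two embedders produce for the same token. It is a toy under stated assumptions, not the paper's implementation: the lookup tables, the vector values, the dimensions, and the helper names `make_lookup_embedder` and `stack_embeddings` are all hypothetical stand-ins for the domain-trained fastText vectors and the Pooled Flair vectors.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def make_lookup_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Return an embedder that looks a token up in `table`.

    Unknown tokens fall back to a zero vector of length `dim`.
    """
    def embed(token: str) -> Vector:
        return list(table.get(token, [0.0] * dim))

    return embed


def stack_embeddings(token: str, embedders: List[Embedder]) -> Vector:
    """Concatenate the vectors produced by each embedder for one token."""
    stacked: Vector = []
    for embed in embedders:
        stacked.extend(embed(token))
    return stacked


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
domain_vectors = make_lookup_embedder({"Mumbai": [0.9, 0.1]}, dim=2)
contextual_vectors = make_lookup_embedder({"Mumbai": [0.2, 0.7, 0.4]}, dim=3)

mumbai = stack_embeddings("Mumbai", [domain_vectors, contextual_vectors])
unknown = stack_embeddings("xyzzy", [domain_vectors, contextual_vectors])
```

The stacked vector has the combined dimensionality of its parts (here 2 + 3 = 5), so a downstream tagger or classifier sees both representations of each token side by side.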
