Abstract
Rapid progress in NLP research has translated swiftly into real-world commercial deployment. Yet while a number of NLP applications have emerged, failures in translating scientific progress into real-world software have also been considerable. Evaluation of NLP models is often limited to held-out test-set accuracy on a handful of datasets. This lack of rigorous evaluation leads to over-estimation of a model's generalization performance, and a lack of understanding of the model's inner workings results in 'Clever Hans' models that fail in real-world deployments. Of late there has been considerable research interest in analysis methods for NLP models and in evaluation techniques that go beyond test-set performance metrics, but this area of work is still not widely disseminated throughout the NLP community. This tutorial aims to address that gap by providing a detailed overview of NLP model analysis and evaluation methods, discussing their strengths and weaknesses, and pointing towards future research directions in this area.