Abstract
Web-based platforms offer suitable experimental environments for constructing and reusing natural language processing (NLP) pipelines; however, environments that also support systematic evaluation of NLP tools in an open science setting remain rare. This paper presents TextFlows, an open-source web-based platform that enables user-friendly construction, sharing, execution, and reuse of NLP pipelines. It demonstrates that TextFlows can easily be used for the systematic evaluation of new NLP components by integrating seven publicly available open-source part-of-speech (POS) taggers from popular NLP libraries and evaluating them on six annotated corpora. The integration of new tools into TextFlows supports tool reuse, while precomposed algorithm comparison and evaluation workflows support experiment reproducibility and the testing of future algorithms in the same experimental environment. Finally, to showcase the variety of evaluation possibilities offered by the TextFlows platform, the influence of factors such as training corpus length and the use of pre-trained models has been tested.