Abstract

This article discusses the use and tools of the spaCy library, written in the Python programming language, for Natural Language Processing (NLP), considered one of the main areas of computational linguistics. A text in a natural language consists of separate units (symbols) and can be divided into several interrelated parts belonging to different levels. The article therefore presents methods for tokenizing text with the spaCy library, as well as the lemma, POS, tag, dep, shape, alpha, and stop attributes generated during pipeline processing.
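The tokenization and token attributes mentioned above can be sketched with spaCy as follows. This is a minimal illustration, not taken from the article itself: it uses a blank English pipeline, which tokenizes text and exposes lexical attributes (shape, alpha, stop) without a trained model; the lemma, POS, tag, and dep attributes additionally require a trained pipeline such as the `en_core_web_sm` package, assumed to be installed separately.

```python
import spacy

# A blank English pipeline performs tokenization only: it splits the text
# into Token objects and fills in lexical attributes such as shape_,
# is_alpha, and is_stop. Attributes produced by trained components
# (lemma_, pos_, tag_, dep_) would require e.g. spacy.load("en_core_web_sm").
nlp = spacy.blank("en")
doc = nlp("Natural language processing splits a text into tokens.")

for token in doc:
    # text, orthographic shape, is it alphabetic, is it a stop word
    print(token.text, token.shape_, token.is_alpha, token.is_stop)
```

With a trained pipeline loaded in place of `spacy.blank("en")`, the same loop can also print `token.lemma_`, `token.pos_`, `token.tag_`, and `token.dep_`, which is the full attribute set discussed in the article.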
