According to the Standish Group's CHAOS reports for 1992–2017, the success rate of projects developing software-intensive systems (SIS) changed insignificantly over that period, with roughly 50% of medium-sized projects failing to meet their initial requirements for budget, schedule, and functionality. Worldwide annual financial losses from outright project failures are on the order of hundreds of billions of dollars. Most information about software projects is in textual form. Analyzing this information is vital for understanding project status and for revealing problems at an early stage. Today, most tasks in natural language processing (NLP) are solved with neural network language models, which have already shown state-of-the-art results in classification, translation, named entity recognition, and other tasks. Pre-trained models are available online, but a real-world problem domain may differ from the domain on which the network was trained. This paper presents an approach to expanding the vocabulary of a neural network language model by means of hierarchical clustering. The technique allows a pre-trained language model to be adapted to a different domain.
