Abstract
Thesauri for science and technology information are increasingly used in bibliometrics and scientometrics. However, manual construction and maintenance of thesauri are costly and time-consuming, so methods for semi-automatic construction and maintenance are being actively studied. We propose a method that expands an existing thesaurus with specified terms extracted from the abstracts of articles. Specifically, we assign the terms to subcategories using our novel clustering method based on information entropy for word vectors. We then determine the hypernyms and hyponyms of each term based on its relations with the terms in its subcategory. The word vectors are constructed from 177,000 IEEE articles archived from 2012 to 2014 in the Scopus dataset. In experiments, the terms were classified into the Japan Science and Technology thesaurus with 83.3% precision and 71.4% recall. In the future, we will develop a semi-automatic thesaurus maintenance system that recommends new terms in their proper relative positions.
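The abstract does not spell out how information entropy enters the clustering step, so the following is only an illustrative sketch and not the authors' method. It assumes each subcategory is represented by a centroid word vector, turns cosine similarities into a probability distribution with a softmax, and uses the Shannon entropy of that distribution as a measure of how ambiguous the assignment is. The function names, the toy two-dimensional vectors, and the subcategory labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assignment_entropy(term_vec, centroids):
    """Return (best_subcategory, entropy_in_bits) for one term vector.

    Assumption: `centroids` maps a subcategory name to a centroid vector.
    Low entropy means one subcategory clearly dominates; high entropy
    means the term sits between several subcategories.
    """
    names = list(centroids)
    sims = [cosine(term_vec, centroids[name]) for name in names]
    # Softmax over similarities gives a probability distribution.
    exps = [math.exp(s) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Shannon entropy in bits.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    best_index = max(range(len(names)), key=lambda i: probs[i])
    return names[best_index], entropy


# Toy data (hypothetical): two subcategories in a 2-D vector space.
centroids = {"semiconductors": [1.0, 0.0], "networks": [0.0, 1.0]}

clear_best, clear_entropy = assignment_entropy([0.9, 0.1], centroids)
_, ambiguous_entropy = assignment_entropy([1.0, 1.0], centroids)
```

In this toy setup, a term close to one centroid gets a lower entropy than a term equidistant from both, so a threshold on entropy could be used to flag terms that need manual review. Whether the proposed method uses entropy in this way is not stated in the abstract.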