Abstract

Maps of science, which visualize the structure of science, help us analyze the current state of science, technology, and innovation (ST&I). ST&I enterprises can use maps of science as competitive technical intelligence to anticipate changes, especially those arising in their immediate vicinity. Research laboratories and universities can use such maps to understand changes in their environment and to manage their research. However, traditional maps based on bibliometrics, such as citation and cocitation analysis, have difficulty representing recently published papers and ongoing projects that have few or no references. Content-based maps built with text mining, for example using word and paragraph vectors, have therefore been developed in recent years to locate research papers and projects. Content-based maps, however, still have difficulty comparing documents written in different languages. Therefore, aiming to construct a bilingual (English and Japanese) content-based map of science for analyzing ST&I information resources in different languages, this article proposes a method for creating word and paragraph vectors for bilingual textual information in the same multidimensional space. In a comparison of 11 methods for generating document vectors, we confirmed that the best method achieved 87% accuracy in bilingual content matching on 10 000 IEEE papers. Finally, we published a map of approximately 150 000 projects funded by the National Science Foundation, the Japan Society for the Promotion of Science, and the Japan Science and Technology Agency from 2013 to 2017.

Highlights

  • Since Price [1] proposed using scientific methods to study science in 1965, research in scientometrics has developed techniques for analyzing research activities and measuring their relationships

  • We confirmed that method #6 gave the best overall result. In this method, technical terms in the English and Japanese documents are replaced with English descriptors from the Japan Science and Technology Agency (JST) thesaurus, monolingual word vectors are generated from the documents, and document vectors are constructed as averages of the word vectors weighted by word frequency

  • To address the difficulty content-based maps have in comparing documents written in different languages, this article presented the best method for generating document vectors in the same space from bilingual (English and Japanese) scientific documents
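The method #6 pipeline described in the highlights can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's implementation: documents are assumed to be already tokenized (Japanese text would first need morphological analysis), the hypothetical `thesaurus` dictionary stands in for the JST thesaurus, and the toy `word_vectors` table stands in for word vectors trained on the documents. The function names `unify_terms` and `document_vector` are illustrative only.

```python
from collections import Counter


def unify_terms(tokens, thesaurus):
    """Replace every token that has a thesaurus entry with its English descriptor."""
    return [thesaurus.get(tok, tok) for tok in tokens]


def document_vector(tokens, word_vectors):
    """Average a document's word vectors, weighted by each word's frequency.

    Tokens without a vector are skipped; returns None if no token has one.
    """
    counts = Counter(tok for tok in tokens if tok in word_vectors)
    total = sum(counts.values())
    if total == 0:
        return None
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, n in counts.items():
        for i, x in enumerate(word_vectors[word]):
            vec[i] += n * x  # a word seen n times contributes n times
    return [v / total for v in vec]


# Hypothetical thesaurus fragment: English and Japanese surface forms -> one English descriptor.
thesaurus = {
    "deep-learning": "deep_learning",
    "深層学習": "deep_learning",
    "画像": "image",
}
# Toy 2-dimensional vectors standing in for trained word vectors.
word_vectors = {
    "deep_learning": [1.0, 0.0],
    "image": [0.0, 1.0],
}

en_doc = ["deep-learning", "for", "image", "image"]
ja_doc = ["画像", "の", "深層学習", "画像"]
v_en = document_vector(unify_terms(en_doc, thesaurus), word_vectors)
v_ja = document_vector(unify_terms(ja_doc, thesaurus), word_vectors)
print(v_en, v_ja)  # both documents map to the same vector
```

Because shared technical terms collapse to the same descriptor token, an English document and its Japanese counterpart can land close together even though their other words differ. The toy vocabulary here simply ignores function words such as "for" and "の"; a trained model would have vectors for them too.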


Summary

INTRODUCTION

Since Price [1] proposed using scientific methods to study science in 1965, research in scientometrics has developed techniques for analyzing research activities and measuring their relationships. Aiming to construct a bilingual (English and Japanese) content-based map of science for analyzing ST&I information resources in different languages, this article proposes a method for creating word and paragraph vectors for bilingual textual information in the same multidimensional space. By improving text-mining methods with an advanced natural language processing technique, word and paragraph embedding, to measure the similarity of bilingual textual information, we realized a bilingual content-based map of science covering 59 192 United States projects and 90 334 Japanese projects from 2013 to 2017. This study makes two contributions: it improves on the state-of-the-art methods in the related literature for measuring the similarity of bilingual textual information, and it identifies the best of 11 methods for bilingual content comparison using a large set of scientific resources.
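The summary reports 87% bilingual content-matching accuracy but does not spell out how it was measured. One plausible reading, assumed here rather than taken from the article, is top-1 retrieval: each English document vector is matched to its most cosine-similar Japanese document vector, and the match counts as correct when it is that document's own counterpart. The helper names and toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matching_accuracy(en_vecs, ja_vecs):
    """Share of English documents whose most similar Japanese document is their counterpart.

    Assumes en_vecs[i] and ja_vecs[i] describe the same paper.
    """
    hits = 0
    for i, u in enumerate(en_vecs):
        best = max(range(len(ja_vecs)), key=lambda j: cosine(u, ja_vecs[j]))
        if best == i:
            hits += 1
    return hits / len(en_vecs)


# Toy document vectors for three bilingual pairs.
en = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ja = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
print(matching_accuracy(en, ja))  # 1.0

# Swapping two Japanese vectors breaks two of the three pairings.
ja_swapped = [ja[1], ja[0], ja[2]]
print(matching_accuracy(en, ja_swapped))  # 0.3333333333333333
```

This naive loop compares every pair, so it grows quadratically with corpus size; for thousands of papers a batched matrix computation would be the practical choice, but the plain version keeps the definition of the metric visible.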

RELATED WORK
GENERATING BILINGUAL DOCUMENT EMBEDDING
Generating Bilingual Word Vectors Using Mixed Corpus
Unifying Technological Words to Descriptors in a Thesaurus
Converting Sentences to Semantic Role Graphs
Generating Document Vectors
ACCURACY OF BILINGUAL DOCUMENT VECTORS
Experimental Results
Observation
Findings
CONCLUSION
