Abstract

With the rapid development of IoT technology, the demand for effective and accurate retrieval of domain knowledge is growing. Automatically extracting expert information from massive numbers of web pages and generating a dynamic, holistic profile model is important for building knowledge bases. However, web pages differ markedly in structure and content semantics from one website to another, so traditional web crawlers struggle to understand page semantics and extract the critical information about experts. Therefore, a six-dimension expert profile model is introduced, and a sequence tagging method based on an LSTM-CRF model is proposed to automatically extract rich semantic information based on organizational structure, word meaning, and expert attributes. Experiments on test data sets show that the precision and recall for experts' job experience and research field are 67.8% and 66.6%, and 82.4% and 79.6%, respectively. In addition, the overall average F value for some obvious expert features, such as name, title, email, and achievement, reaches 82.5%, which is better than the results obtained with the MEMM and LSTM algorithms.
