Abstract

This paper proposes a hierarchical organization framework model for heterogeneous data integration and resource utilization in digital libraries. From top to bottom, the model is divided into a metadata layer, an ontology layer, a linked data layer and an application layer. To address the drawbacks of traditional document clustering methods, an improved LDA method is proposed to cluster documents of the same category that share a subject. The results show that the retrieval time of the proposed linked data organization framework model is much shorter than that of the traditional model, with the search response time reduced by 56.3%. Establishing the linked data organization framework model can greatly increase the interconnection of distributed heterogeneous information and make the data sources easier for search engines to crawl. The TC-LDA algorithm has the highest retrieval accuracy and the best stability. The time consumption of the five algorithms compared is almost the same, so no algorithm shows a clear advantage in running time. An analysis of clustering accuracy and time consumption indicates that the improved LDA text subject clustering algorithm has advantages over the traditional text clustering method.

