Abstract
In recent years, with the continuous development of science and technology in China, particularly computer science, technologies that transmit information as Chinese short texts, such as Micro-blog posts and WeChat official accounts, have developed rapidly and become widespread. The steady growth in short-text dissemination provides a wide range of resources for information-based decision-making. It has also introduced a large amount of redundancy, particularly invalid and duplicate content in informational texts. Such large, repetitive collections occupy much of the system's storage capacity and hinder the collection and extraction of useful information and data from informational texts. This seriously undermines both the accuracy of information-based decision-making and the timeliness of information. It is therefore necessary to strengthen research on methods for informational text de-duplication. Taking informational text de-duplication as a case, this paper reviews current research on text de-duplication techniques in China and abroad, and studies informational text de-duplication methods built on these techniques, with the aim of offering reference ideas for enterprises that carry out informational text de-duplication.