Abstract

To address the short-text, unstructured nature of customer addresses, a data association and fusion method for addresses is proposed. In this method, each address is mapped to a digital fingerprint by an improved Simhash technique, which effectively reduces the dimensionality of massive address sets and simplifies similarity matching of multi-source heterogeneous addresses. Furthermore, the weight setting of the feature vector in the Simhash algorithm is improved by introducing a special weight gain. A two-level index mechanism is established based on the characteristics of address divisions and the data structure of the digital fingerprints, which greatly reduces the time spent on fingerprint comparison. The experimental results show that computational efficiency is greatly improved while the accuracy and coverage of the comparison are maintained. By matching addresses across different databases, information fusion can be completed, achieving the goal of connecting power customers' demands to power grid equipment.
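As an illustration of the fingerprinting step, the following is a minimal sketch of a weighted Simhash, not the authors' implementation. It assumes the address has already been split into tokens and that each token carries a numeric weight; the function names, the use of MD5 as the token hash, the 64-bit width, and the example weights (standing in for the "special weight gain") are all hypothetical choices made for this sketch.

```python
import hashlib


def token_hash(token: str, bits: int = 64) -> int:
    """Hash a token to a `bits`-wide integer (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(weighted_tokens, bits: int = 64) -> int:
    """Fold (token, weight) pairs into a single `bits`-wide fingerprint."""
    totals = [0.0] * bits
    for token, weight in weighted_tokens:
        h = token_hash(token, bits)
        for i in range(bits):
            # A set bit votes +weight for position i, a clear bit votes -weight.
            if (h >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical tokens and weights: the finer-grained elements get a larger weight.
address_a = [("district-1", 1.0), ("road-5", 2.0), ("no-12", 4.0)]
address_b = [("district-1", 1.0), ("road-5", 2.0), ("no-12", 4.0)]
distance = hamming_distance(simhash(address_a), simhash(address_b))  # identical input
```

Two addresses are then treated as candidates for the same entity when the Hamming distance between their fingerprints falls below a chosen threshold, so the comparison works on fixed-width integers rather than on the address text itself.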

Highlights

  • With the deepening of electric power reform, grid enterprises have gradually begun to establish a modern customer-centered service mode in recent years

  • Large grid enterprises, which have set up big data organizations and big data platforms, have taken the initiative to carry out digital transformation (Sun, 2019) and to rapidly promote the integration of grid business and customer electricity behavior information (Teeraratkul et al., 2018; Wang et al., 2019; Li et al., 2021)

  • State Grid Corporation of China (SGCC) has constructed complete file information of customers, metering devices, and power grid equipment, which can be linked through a unique power consumption number


Summary

INTRODUCTION

With the deepening of electric power reform, grid enterprises have gradually begun to establish a modern customer-centered service mode in recent years. To associate customer data with power grid data through address matching, it is necessary to make pairwise comparisons of all entities between the customer information database and the power marketing database. The amount of customer information is massive, and the computational complexity of aligning similar address entities across different databases is high (Shen and Feng, 2018; Kang et al., 2019). The operation cost was used as the similarity function, so the difference between matched texts could be better quantified. This method was not sensitive to locally missing characters. Compared with matching the text entities directly, comparing fingerprints could greatly reduce the complexity of data storage and calculation and provide an effective option for mass data comparison. Through the two-level partition index mechanism, the dimensionality of the comparison is reduced, which can support matching of massive data at a scale of more than 100 million records.
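The two-level partition index can be pictured with the sketch below. It is one plausible reading rather than the paper's actual data structure: the first level is assumed to be an administrative-division key taken from the address, and the second level is assumed to split a 64-bit fingerprint into four 16-bit segments. By the pigeonhole principle, two fingerprints that differ in at most 3 bits must share at least one identical segment, so only records in matching buckets need a full comparison. The class name, segment layout, and threshold are hypothetical.

```python
from collections import defaultdict

SEGMENTS = 4    # assumed number of fingerprint segments
SEG_BITS = 16   # assumed width of each segment (4 x 16 = 64 bits)


def split_segments(fingerprint: int):
    """Return the fingerprint's segments, lowest-order segment first."""
    mask = (1 << SEG_BITS) - 1
    return [(fingerprint >> (i * SEG_BITS)) & mask for i in range(SEGMENTS)]


class TwoLevelIndex:
    """Level 1: address division key. Level 2: (segment position, segment value)."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, region: str, fingerprint: int, record_id: str) -> None:
        for pos, seg in enumerate(split_segments(fingerprint)):
            self._buckets[(region, pos, seg)].add((fingerprint, record_id))

    def query(self, region: str, fingerprint: int, max_distance: int = 3):
        # Collect candidates that share at least one segment within the region.
        candidates = set()
        for pos, seg in enumerate(split_segments(fingerprint)):
            candidates |= self._buckets.get((region, pos, seg), set())
        # Verify candidates with a full Hamming-distance check.
        return sorted(
            record_id
            for candidate_fp, record_id in candidates
            if bin(candidate_fp ^ fingerprint).count("1") <= max_distance
        )


index = TwoLevelIndex()
base = 0x0123456789ABCDEF
index.add("district-1", base, "r1")

# Flip one bit in each of segments 0, 1 and 2; segment 3 stays identical.
near = base ^ 1 ^ (1 << 20) ^ (1 << 40)
matches = index.query("district-1", near)
```

With this layout a query touches at most four buckets inside one region instead of scanning every stored fingerprint, which is the kind of reduction in comparisons that the partition index is meant to provide.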

Association Model Based on Address Data
Standardization of Customer Address
Principle of Simhash Algorithm
Improved Simhash Algorithm Considering Address Characteristics
Segmented Index Method Based on Improved Simhash Algorithm
Sample Selection and Experimental Environment
Experimental Results
MAIN CONCLUSIONS
AUTHOR CONTRIBUTIONS
