Abstract

Generating laboratory test reports is currently slow and occasionally error-prone. To address this, we present a fence-factor-based technique for automatically capturing general original experimental records. First, the files read on a given day are filtered accurately by computing a hash value over each whole file. Next, an improved content-defined chunking (CDC) algorithm splits the files into blocks. The improvements are twofold: the unit of the sliding window is set to the spacing between two lines, and the byte size of the sliding window is restricted to a set range. Once the text has been chunked, a pattern-string-based string matching algorithm performs the matching. It builds a data block index table that maps pattern strings to data blocks, and then uses the table to quickly match a pattern string <italic>P<sub>n</sub></italic> to its corresponding data block. The method was tested on original experimental record files from a customs laboratory, where it used the least memory and achieved the highest chunking throughput among the algorithms compared.
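The three stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the SHA-256 digest, the CRC32-based boundary test, and the default size bounds are all assumptions chosen for illustration, and the abstract does not define the fence factor precisely enough to reproduce it here. The sketch assumes the whole-file hash is used to skip content that has already been processed, that chunk boundaries fall only at line breaks, and that the index table is a plain mapping from each pattern string to the chunks containing it.

```python
import hashlib
import zlib


def file_digest(data: bytes) -> str:
    """Whole-file hash (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def filter_unseen(files: dict[str, bytes], seen: set[str]) -> list[str]:
    """Return names of files whose content digest is not yet in `seen`."""
    fresh = []
    for name, data in files.items():
        digest = file_digest(data)
        if digest not in seen:
            seen.add(digest)
            fresh.append(name)
    return fresh


def chunk_lines(text: str, min_bytes: int = 32, max_bytes: int = 256,
                divisor: int = 4) -> list[str]:
    """Line-aligned content-defined chunking (hypothetical parameters).

    The window advances one line at a time. A chunk is closed when the
    current line's CRC32 is divisible by `divisor` and the chunk holds at
    least `min_bytes`, or when adding the next line would exceed `max_bytes`.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        raw = line.encode("utf-8")
        # Forced cut: the upper size bound would be exceeded.
        if current and size + len(raw) > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(raw)
        # Content-defined cut: depends only on the line's own bytes.
        if size >= min_bytes and zlib.crc32(raw) % divisor == 0:
            chunks.append("".join(current))
            current, size = [], 0
    if current:
        chunks.append("".join(current))
    return chunks


def build_index(chunks: list[str], patterns: list[str]) -> dict[str, list[int]]:
    """Data block index table: pattern string -> indices of matching chunks."""
    index: dict[str, list[int]] = {p: [] for p in patterns}
    for i, chunk in enumerate(chunks):
        for p in patterns:
            if p in chunk:
                index[p].append(i)
    return index


if __name__ == "__main__":
    record = "sample: A1\nresult: 3.2\nsample: B2\nresult: 4.8\n"
    blocks = chunk_lines(record, min_bytes=1, max_bytes=64, divisor=1)
    table = build_index(blocks, ["sample: B2", "result"])
    print(blocks)
    print(table)
```

Because boundaries depend on line content rather than fixed offsets, inserting a line early in a record leaves most later chunks unchanged. After the index table is built, matching a pattern string is a single dictionary lookup rather than a scan of the whole file.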
