Abstract

To extract and classify information from reports and documents while protecting the privacy of the extracted results, we propose a privacy-preserving text classification model, the Word Embedding Combination Privacy-preserving Support Vector Machine (WECPPSVM). We also propose the Privacy-preserving Distribution and Independent Frequent Subsequence Extraction Algorithm (PPDIFSEA), which trains a Deep Belief Network (DBN) to calculate the degree of independence of the training data fed to the classification model and, from that, obtains the Privacy Boundary (PB). The PB is required for both data sampling and privacy noise generation. The model protects privacy by injecting the privacy noise into the classification result, which interferes with privacy attacks based on background knowledge. Our quantitative analysis shows that WECPPSVM approaches the accuracy of mainstream text classification algorithms while preserving privacy and without increasing computational complexity. In addition, the fusion study and privacy threat evaluation verify that PPDIFSEA combined with WECPPSVM achieves an acceptable level of classification accuracy and privacy protection.
