Abstract

Data de-duplication is a data compression technique that eliminates duplicate copies of information, and it has been widely employed in cloud storage to reduce storage requirements and save bandwidth. We present a secure de-duplication system that detects duplicates by meaning and stores the resulting data in the cloud. To protect the privacy of sensitive information while supporting de-duplication, AES encryption and SHA-256 hashing are applied to the data before it is outsourced. Documents are pre-processed, then compared and verified with the help of WordNet. Cosine similarity is used to measure the similarity between two documents, computed over an efficient vector space model (VSM) representation. The WordNet hierarchical corpus is used to examine syntax and semantics so that duplicates can be identified. NLTK, which provides a wide range of libraries and programs for symbolic and statistical natural language processing (NLP) in Python, is used here to handle words that cosine similarity alone does not identify. In previous approaches, cloud storage was consumed heavily because similar files were allowed to be stored. With our system, storage space is reduced by up to 85%. Because AES and SHA-256 are employed, the system provides high security and efficiency.
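
The comparison step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain term-frequency vectors as the VSM representation, a hypothetical similarity threshold of 0.8, and hypothetical function names (`fingerprint`, `term_vector`, `cosine_similarity`, `is_duplicate`). AES encryption and WordNet synonym matching are left out so that the example runs on the Python standard library alone.

```python
import hashlib
import math
import re
from collections import Counter


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the text; equal digests indicate identical content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def term_vector(text: str) -> Counter:
    """Build a simple VSM vector: lowercase the text and count each alphabetic term."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(new_doc: str, stored_doc: str, threshold: float = 0.8) -> bool:
    """Treat documents as duplicates if they are byte-identical or sufficiently similar.

    The threshold value is an assumption chosen for illustration only.
    """
    if fingerprint(new_doc) == fingerprint(stored_doc):
        return True  # exact duplicate, no similarity computation needed
    similarity = cosine_similarity(term_vector(new_doc), term_vector(stored_doc))
    return similarity >= threshold
```

In a sketch like this, the hash comparison catches exact copies cheaply, and the cosine check catches near-duplicates that differ in a few words. A WordNet-based step would additionally map synonyms to a common term before the vectors are built.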
