Abstract
Large volumes of data and information are now exchanged over the Internet frequently, and that exchange must be secure. The proposed Sensitive Information Security Model Based on Term Clustering (SIS-TC) aims to secure large volumes of text documents that contain important and sensitive information, data, or both. Each document is first broken into its constituent parts, called terms, using a knowledge repository, and term clusters are then formed by grouping the similar terms of each category. These clusters represent categories such as Noun, Pronoun, Numeral, and Punctuation. Only one instance of each term is kept in a cluster and becomes the cluster representative. First, the frequency of each distinct term (or word) is calculated, and then all duplicate copies of each term are removed, which transforms the documents into low-dimensional data. The reduced data set drastically decreases the total data size and the storage space required, and increases system performance by 65% to 70%. Next, the reduced data is divided into High Risk Data (HRD) and Low Risk Data (LRD) so that each type receives a different level of security: HRD is encrypted symmetrically, whereas LRD is encrypted non-symmetrically. The paper also reports analytical experimental results on a test data set of 8 text documents of varying sizes.
General Terms
Term clustering; knowledge repository; sensitive information; high dimensional and low dimensional data; symmetric and non-symmetric encryption.
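The reduction steps described above (term extraction, clustering by category, frequency counting, duplicate removal, and the HRD/LRD split) can be illustrated with a minimal Python sketch. It is not the SIS-TC implementation: the tokenizer, the tiny `KNOWLEDGE_REPOSITORY` dictionary, the `HIGH_RISK_CATEGORIES` set, and the function names are all hypothetical stand-ins chosen for illustration, since the abstract does not specify them. The encryption stage is left out.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the knowledge repository: term -> category.
KNOWLEDGE_REPOSITORY = {
    "alice": "Noun",
    "account": "Noun",
    "she": "Pronoun",
    "her": "Pronoun",
}

# Assumption: the abstract does not say which categories are high risk.
HIGH_RISK_CATEGORIES = {"Noun", "Numeral"}

# Words, digit runs, and single punctuation characters become terms.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+|[^\w\s]")


def categorize(term):
    """Assign a term to a category, falling back to 'Other'."""
    if term.isdigit():
        return "Numeral"
    if not term.isalnum():
        return "Punctuation"
    return KNOWLEDGE_REPOSITORY.get(term.lower(), "Other")


def reduce_document(text):
    """Return (term frequencies, clusters of distinct terms by category)."""
    terms = TOKEN_PATTERN.findall(text)
    frequency = Counter(terms)
    clusters = defaultdict(list)
    # Iterating over the counter visits each distinct term exactly once,
    # so duplicate copies are dropped while their counts are retained.
    for term in frequency:
        clusters[categorize(term)].append(term)
    return frequency, dict(clusters)


def split_by_risk(clusters):
    """Split clusters into High Risk Data and Low Risk Data."""
    hrd = {c: t for c, t in clusters.items() if c in HIGH_RISK_CATEGORIES}
    lrd = {c: t for c, t in clusters.items() if c not in HIGH_RISK_CATEGORIES}
    return hrd, lrd


frequency, clusters = reduce_document("Alice paid 500, she paid 500.")
hrd, lrd = split_by_risk(clusters)
```

In this sketch the frequency table keeps the count of every term, so the duplicates can be dropped from the clusters without losing that information. The HRD and LRD dictionaries would then be passed to the symmetric and non-symmetric encryption stages respectively.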