Abstract

Text mining can deal with unstructured information. The proposed work extricates content from a PDF report is changed over to plain content configuration; at that point record is tokenized and serialized. Record grouping and classification is finished by discovering similarities between reports put away in cloud. Comparable archives are distinguished utilizing Singular Value Decomposition (SVD) strategy in Latent Semantic Indexing (LSI). At that point comparative records are assembled together as a group. A similar report is done between LFS (Local File System) and HDFS (HADOOP DISTRIBUTED FILE SYSTEM) as for rate and dimensionality. The System has been assessed on genuine records and the outcomes are classified.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.