Abstract

This work-in-progress paper presents a filtering technique based on user preferences. It uses parallel processing and machine learning to extract user-preferred data from a large raw data set. Although large volumes of data are generated, a user is often interested in only a few select types (classes) of such data. The motivation behind this research is to devise an effective and efficient filtering technique for extracting user-preferred data from large data sets. Storing only the filtered data and discarding the rest can decrease the latency of searching for specific information within a data set, and it can also reduce the storage required. However, a filtering method that relies on data classification techniques can give rise to high processing latencies. To address this, an algorithm and a system that combine parallel processing with machine learning are presented. A proof-of-concept prototype is built on the Apache Spark parallel processing platform. Analysis of the results of preliminary experiments demonstrates the viability of the investigated technique.
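The general idea of classification-based filtering over partitioned data can be illustrated with a minimal sketch. This is not the paper's algorithm or its Spark prototype: the keyword-based `classify` function is a hypothetical stand-in for a trained machine learning model, the `KEYWORDS` table and function names are invented for illustration, and a thread pool stands in for the workers of a parallel processing platform.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword lists standing in for a trained classifier.
KEYWORDS = {
    "sports": {"match", "goal", "team"},
    "finance": {"stock", "market", "bank"},
}


def classify(record):
    """Toy classifier: return the class with the most keyword hits, else 'other'."""
    words = set(record.lower().split())
    best_label, best_hits = "other", 0
    for label, keywords in KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


def filter_partition(partition, preferred):
    """Keep only the records whose predicted class is one the user prefers."""
    return [record for record in partition if classify(record) in preferred]


def parallel_filter(records, preferred, num_partitions=4):
    """Split records into contiguous partitions and filter each one concurrently."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    partitions = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map yields results in partition order, so input order is preserved.
        results = pool.map(lambda part: filter_partition(part, preferred), partitions)
        return [record for part in results for record in part]


if __name__ == "__main__":
    raw = [
        "The team scored a late goal",
        "Stock market closes higher",
        "Rain expected tomorrow",
        "Bank raises rates",
    ]
    print(parallel_filter(raw, preferred={"finance"}))
```

Only the records classified into a preferred class are kept; everything else is discarded before storage. In a real deployment the classification step would dominate the cost, which is why distributing partitions across workers matters.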
