Abstract

In recent years, just-in-time (JIT) predictive models have attracted considerable attention because they can prevent the degradation of prediction accuracy. However, their high computational cost is a practical limitation and a major obstacle to their use for big data quality prediction. The cost arises because JIT modeling methods must update the local regression model during online operation using relevant samples that are found through a linear scan of the database. To solve this issue, the present work proposes a novel hashing-based JIT (HbJIT) modeling method that is suitable for big data quality prediction. In HbJIT, a family of locality-sensitive hash functions is first used to hash the big data into a set of buckets, so that similar samples are grouped together. During online prediction, HbJIT uses an intelligent probing scheme to look up multiple buckets that have a high probability of containing samples similar to the query object, takes the data objects in those buckets as the candidate set, and then filters the candidates using a linear scan. After filtering, the most relevant samples are used to construct the local regression model, which yields the prediction for the query object. By integrating the multi-probe hashing strategy into the JIT learning framework, HbJIT not only handles process nonlinearity and time-varying characteristics but is also applicable to large-scale industrial processes. Experimental results on a real-world dataset demonstrate that the proposed HbJIT is time-efficient in processing large-scale datasets and greatly reduces online prediction time without compromising prediction accuracy.
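The pipeline described above (hash offline, probe several buckets online, filter the candidates with a linear scan, fit a local model) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash family, the probing scheme, or the form of the local regression model, so this sketch assumes random-hyperplane hashing, a probing order that flips the hash bits whose projections lie closest to zero, and an ordinary least-squares local linear model. The class name `HbJITSketch` and all parameter names are hypothetical, and the data are synthetic.

```python
import numpy as np
from collections import defaultdict


class HbJITSketch:
    """Illustrative hashing-based JIT predictor (assumed design, not the paper's code)."""

    def __init__(self, X, y, n_bits=6, n_probes=4, n_neighbors=20, seed=0):
        rng = np.random.default_rng(seed)
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.n_probes = n_probes
        self.n_neighbors = n_neighbors
        # Assumption: sign-of-random-projection hashing on mean-centered data.
        self.center = self.X.mean(axis=0)
        self.planes = rng.standard_normal((n_bits, self.X.shape[1]))
        self.weights = 1 << np.arange(n_bits)  # bit i contributes 2**i to the key
        # Offline step: hash every historical sample into a bucket.
        self.buckets = defaultdict(list)
        keys = (self._project(self.X) > 0) @ self.weights
        for i, key in enumerate(keys):
            self.buckets[int(key)].append(i)

    def _project(self, X):
        return (X - self.center) @ self.planes.T

    def _candidates(self, q):
        proj = self._project(q)
        key = int((proj > 0) @ self.weights)
        cand = list(self.buckets.get(key, []))
        # Multi-probe step: also visit buckets that differ in one bit,
        # starting with the bits whose projection is closest to zero.
        for b in np.argsort(np.abs(proj))[: self.n_probes]:
            cand.extend(self.buckets.get(key ^ (1 << int(b)), []))
        return np.array(cand, dtype=int)

    def predict(self, q):
        q = np.asarray(q, dtype=float)
        cand = self._candidates(q)
        if cand.size < self.n_neighbors:
            # Too few candidates: fall back to scanning the whole database.
            cand = np.arange(len(self.X))
        # Filtering step: linear scan over the candidate set only.
        dist = np.linalg.norm(self.X[cand] - q, axis=1)
        nearest = cand[np.argsort(dist)[: self.n_neighbors]]
        # Local linear model with intercept, fitted by least squares.
        A = np.hstack([self.X[nearest], np.ones((nearest.size, 1))])
        coef, *_ = np.linalg.lstsq(A, self.y[nearest], rcond=None)
        return float(np.append(q, 1.0) @ coef)


# Synthetic usage example: a noise-free linear process.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(2000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
model = HbJITSketch(X, y)
q = np.array([0.1, 0.2, -0.3])
pred = model.predict(q)
print(pred)
```

The saving comes from the filtering step: distances are computed only for the samples in the probed buckets rather than for every row of the database. The fallback to a full scan is a design choice of this sketch so that a sparsely populated bucket never leaves the local model with too few samples.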
