Abstract

Y-rank can perform poorly when dealing with non-linear problems. A methodology is proposed to improve data selection in situations where y-rank is fragile. The proposed alternative, called k-rank, consists of splitting the data set into clusters with the k-means algorithm and then applying y-rank to each of the generated clusters. Models were calibrated and tested with subsets split by y-rank and by k-rank. For the Heating Tank case study, models calibrated with k-rank subsets achieved better results in 59% of the simulations. For the Propylene/Propane Separation Unit case, when only a small number of sample points was available, the y-rank models had test-subset errors almost three times higher than those of the k-rank models, meaning that the fitted models could not properly handle new, unseen data. The proposed methodology was successful in splitting the data, especially in cases with a limited number of samples.
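The abstract describes k-rank only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that y-rank is the common scheme of sorting samples by the response y and sending every `step`-th sample to the test set (with the lowest and highest y always kept in training), and that k-means clusters the input features X only. The function names `y_rank_split`, `kmeans`, and `k_rank_split` and the `step` parameter are hypothetical. A small Lloyd's-algorithm k-means is written out in pure Python to keep the example dependency-free; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import random
from typing import List, Sequence, Tuple


def y_rank_split(indices: Sequence[int], y: Sequence[float],
                 step: int = 4) -> Tuple[List[int], List[int]]:
    """Assumed y-rank: sort samples by y and send every `step`-th one to test.

    The first and last samples in the ordering (lowest and highest y) always
    stay in training, so the test set never lies outside the training y-range.
    """
    order = sorted(indices, key=lambda i: y[i])
    train: List[int] = []
    test: List[int] = []
    for pos, idx in enumerate(order):
        if 0 < pos < len(order) - 1 and pos % step == 0:
            test.append(idx)
        else:
            train.append(idx)
    return train, test


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 100,
           seed: int = 0) -> List[int]:
    """Minimal Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def k_rank_split(X: List[List[float]], y: Sequence[float], k: int,
                 step: int = 4, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Sketch of k-rank: cluster X with k-means, then apply y-rank per cluster."""
    labels = kmeans(X, k, seed=seed)
    train: List[int] = []
    test: List[int] = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        tr, te = y_rank_split(members, y, step)
        train += tr
        test += te
    return train, test


if __name__ == "__main__":
    # Two well-separated groups of inputs; the response here is simply y = x.
    xs = [0, 1, 2, 3, 4, 5, 100, 101, 102, 103, 104, 105]
    X = [[float(v)] for v in xs]
    y = [float(v) for v in xs]
    train, test = k_rank_split(X, y, k=2, step=2)
    print(sorted(test))  # -> [2, 4, 8, 10]
```

Because y-rank runs inside each cluster, every cluster contributes test samples and keeps its own extremes in training, whereas a single global y-rank ordering may leave some regions of the input space without any test coverage.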

Highlights

  • The explosion of data is a reality in all scientific areas

  • Some observations and forecasts can already be made based on the selections of the data

  • Excluding the leftmost point from training forces the model to extrapolate when it is tested on those two data points, which is not recommended for empirical models
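The extrapolation concern in the last highlight can be checked mechanically before a split is accepted. The sketch below is a minimal illustration, assuming numeric feature rows and using a simple per-feature bounding-box test; the helper name `extrapolated_samples` is hypothetical. A box test flags every test row outside the training range of some feature, but in more than one dimension it can miss rows that fall inside every feature range yet outside the region actually covered by the training data.

```python
from typing import List, Sequence


def extrapolated_samples(X_train: Sequence[Sequence[float]],
                         X_test: Sequence[Sequence[float]]) -> List[int]:
    """Indices of test rows outside the per-feature [min, max] of training.

    Such rows force an empirical model to extrapolate rather than interpolate.
    """
    n_features = len(X_train[0])
    lows = [min(row[j] for row in X_train) for j in range(n_features)]
    highs = [max(row[j] for row in X_train) for j in range(n_features)]
    return [i for i, row in enumerate(X_test)
            if any(not lo <= v <= hi for v, lo, hi in zip(row, lows, highs))]


# Leaving the leftmost point (x = 0) out of training makes it an extrapolation.
print(extrapolated_samples([[1.0], [2.0], [3.0]], [[0.0], [2.5]]))  # -> [0]
```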

Introduction

Oil processing plants (Chandra Srivastava, 2012; Baliño, 2014), chemometrics studies (Ranzan et al., 2014), and process control strategies (Storkaas and Skogestad, 2007; Chi et al., 2014; Boullosa et al., 2017) can accumulate so much raw data that it is difficult to analyze it all and extract useful information. Data analysis techniques have many other applications: in bioinformatics, large genome datasets are analyzed to detect diseases and develop new drugs, and in economics, the analysis of large market datasets can help improve planning and decision-making strategies (Kramer, 2016; Massaron and Boschetti, 2016). Learning from data means extracting new knowledge from a large amount of information.
