Abstract

Labelling samples is a procedure that may cause significant delays, particularly when dealing with large datasets and/or when labelling requires prolonged analysis. In such cases, a strategy that allows a reliable classifier to be constructed from a minimally sized training set, by labelling only a minor fraction of the samples, can be advantageous. Support vector machines (SVMs) are ideal for such an approach because the classifier relies on only a small subset of samples, namely the support vectors, while being independent of the remaining ones, which typically form the majority of the dataset. This paper describes a procedure in which an SVM classifier is constructed with support vectors systematically retrieved from the pool of unlabelled samples. The procedure is termed ‘active’ because the algorithm interacts with the samples prior to their labelling rather than waiting passively for the input. The learning behaviour on simulated datasets is analysed, and a practical application to the detection of hydrocarbons in soils using mass spectrometry is described. Results on simulations show that the active learning SVM performs optimally on datasets where the classes display an intermediate level of separation. In the real case study, the classifier correctly assigns the membership of all samples in the original dataset while requiring labels for only around 14% of the data. Its subsequent application to a second dataset of analogous nature also provides perfect classification without further labelling, giving the same outcome as most classical techniques based on the entirely labelled original dataset. Copyright © 2004 John Wiley & Sons, Ltd.
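
The general idea of pool-based active learning with an SVM can be illustrated with a minimal sketch. It is not the procedure of the paper, whose selection rule is not given in the abstract. The sketch assumes a linear SVM trained by Pegasos-style subgradient descent and a query rule that asks for the label of the unlabelled sample closest to the current hyperplane, on the reasoning that such a sample is a likely support vector. The function names (`train_linear_svm`, `active_learning`), the `oracle` callback that stands in for the labelling step, and the simulated two-class data are all hypothetical.

```python
import random

def decision(w, x):
    # Signed score; the constant 1.0 appended to x plays the role of the bias.
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    # Assumption: Pegasos-style subgradient descent on the regularised hinge
    # loss, used here only to keep the sketch dependency-free.
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            violated = y[i] * decision(w, X[i]) < 1.0  # margin check before shrinking
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i] + [1.0])]
    return w

def active_learning(pool, oracle, n_queries, seed=0):
    rng = random.Random(seed)
    unlabelled = list(range(len(pool)))
    rng.shuffle(unlabelled)
    labelled, labels = [], []
    # Seed set: label random samples until both classes have been seen.
    while len(set(labels)) < 2:
        i = unlabelled.pop()
        labelled.append(i)
        labels.append(oracle(i))
    w = train_linear_svm([pool[i] for i in labelled], labels)
    for _ in range(n_queries):
        # Assumed query rule: the unlabelled sample closest to the hyperplane.
        i = min(unlabelled, key=lambda j: abs(decision(w, pool[j])))
        unlabelled.remove(i)
        labelled.append(i)
        labels.append(oracle(i))  # the only point where a label is requested
        w = train_linear_svm([pool[k] for k in labelled], labels)
    return w, labelled

# Hypothetical simulated pool: two Gaussian classes in two dimensions.
rng = random.Random(1)
pool = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (-3.0, 3.0) for _ in range(100)]
truth = [-1] * 100 + [1] * 100

w, labelled = active_learning(pool, lambda i: truth[i], n_queries=10)
predicted = [1 if decision(w, x) >= 0 else -1 for x in pool]
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(pool)
print(f"labelled {len(labelled)} of {len(pool)} samples, accuracy {accuracy:.2f}")
```

In this sketch the true labels are consulted only through `oracle`, so the length of `labelled` is the labelling cost. A kernel SVM or a different stopping criterion could be substituted without changing the overall loop.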
