Abstract
Research on supervised learning algorithms implicitly assumes that training data is labeled by domain experts, or at least by semi-professional labelers accessible through crowdsourcing services such as Amazon Mechanical Turk. With the advent of the Internet, data has become abundant, and a large number of machine-learning-based systems are now trained on user-generated data in which categorical data serves as labels. However, little work has been done on supervised learning with user-defined labels, where users are not necessarily experts: they might be unable to label some data correctly, or their labels might contain significant human bias. In this article, we distinguish two types of classes in user-defined labels, subjective classes and objective classes, and show that objective classes are as reliable as if they were provided by domain experts, whereas subjective classes are subject to error and bias. We call this the subjective class problem and propose a Normalized Feature Indicative Score that can be effective in detecting subjective classes in a dataset without querying an oracle. The score enables early detection of subjective classes, saving time for data mining practitioners working with data that might contain errors and biases.