Abstract

AbstractIn this paper, for document classification task and text mining based on machine learning, I propose a new pool-based active learning method to select unlabeled data that have effective features not found in the training data. Given a small set of training data and a large set of unlabeled data, the active learner selects the most uncertain data that has effective features not found in the training data from the unlabeled data and asks to label it. After capturing these uncertain data from the unlabeled data repeatedly, I apply the existing pool-based active learning to select training data from the unlabeled data efficiently. Therefore, by adding data with effective features from unlabeled data to training data, I consider that it is effective to improve the performance of the pool-based active learning. To evaluate the efficiency of the proposed method, I conduct some experiments and show that my active learning method achieves consistently higher accuracy than the existing algorithm.KeywordsDocument classificationActive learningText mining

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.