Abstract

This paper presents a feature selection approach for named entity recognition using a genetic algorithm. Different aspects of the genetic algorithm, including computational time and the criteria for evaluating an individual (i.e., the size of the feature subset and the classifier's accuracy), are analyzed in order to optimize its learning process. Two machine learning algorithms, k-Nearest Neighbor and Conditional Random Fields, are used to calculate the accuracy of the named entity recognition system. To evaluate the effectiveness of our genetic algorithm, the feature subsets it returns are compared to those returned by a hill climbing algorithm and a backward algorithm. Experimental results show that the feature subsets obtained by our genetic algorithm are much smaller than the original feature set, with no loss of predictive accuracy. Furthermore, these feature subsets yield higher classifier accuracy than those of the hill climbing algorithm and the backward algorithm.
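To make the idea concrete, the following is a minimal sketch of genetic-algorithm feature selection, not the implementation evaluated in the paper. The abstract does not state the fitness formula, the genetic operators, or any parameter values, so everything here is an assumption: the names `make_toy_data`, `knn_accuracy`, `fitness`, and `select_features` are hypothetical, the fitness is assumed to be accuracy minus a `size_weight` penalty on the fraction of features kept, and a leave-one-out 1-nearest-neighbor classifier on synthetic numeric data stands in for the k-Nearest Neighbor and Conditional Random Fields classifiers applied to named entity recognition.

```python
import random


def make_toy_data(n_samples=40, n_features=8, seed=1):
    """Synthetic data (assumption): only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4 * label
        row[1] += 4 * label
        X.append(row)
        y.append(label)
    return X, y


def knn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask is 1."""
    idx = [k for k, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][k] - X[j][k]) ** 2 for k in idx)
            if d < best_d:
                best_d, best_j = d, j
        correct += y[best_j] == y[i]
    return correct / len(X)


def fitness(X, y, mask, size_weight=0.1):
    """Assumed criterion: reward accuracy, penalise the fraction of features kept."""
    return knn_accuracy(X, y, mask) - size_weight * sum(mask) / len(mask)


def select_features(X, y, pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Evolve a 0/1 feature mask; returns the best mask in the final population."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}  # each distinct mask is evaluated once, which limits run time

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    # Start from the full feature set plus random subsets.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two best individuals survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of size 3 for each parent.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


if __name__ == "__main__":
    X, y = make_toy_data()
    best = select_features(X, y)
    print("selected mask:", best)
    print("features kept:", sum(best), "of", len(best))
    print("accuracy:", knn_accuracy(X, y, best))
```

Because the full feature set is placed in the initial population and the best individuals are carried over unchanged, the returned mask never scores worse than the full set under this assumed fitness. The cache reflects the computational-time concern mentioned above: classifier evaluation dominates the cost, so repeated masks are not re-evaluated.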
