App reviews provide a rich source of feature-related information that can support requirement engineering activities. Analyzing them manually to find this information, however, is challenging due to their large quantity and noisy nature. To overcome the problem, automated approaches have been proposed for ‘feature-specific analysis’. Unfortunately, the effectiveness of these approaches has been evaluated using different methods and datasets. Replicating these studies to confirm their results and to provide benchmarks of different approaches is a challenging problem. We address the problem by extending previous evaluations and performing a comparison of these approaches. In this paper, we present two empirical studies. In the first study, we evaluate opinion mining approaches; the approaches extract features discussed in app reviews and identify their associated sentiments. In the second study, we evaluate approaches searching for feature-related reviews. The approaches search for users’ feedback pertinent to a particular feature. The results of both studies show these approaches achieve lower effectiveness than reported originally, and raise an important question about their practical use.