The Amazon Alexa marketplace has grown rapidly in recent years as third-party developers create large volumes of content and publish it directly to the skills store. Despite this growth, there have been numerous reported security and usability concerns, many of which may not be identified during the vetting phase. User reviews, however, can offer valuable insights into the security and privacy, quality, and usability of skills. To better understand the effects of these problematic skills on end users, we introduce ReviewTracker, a tool that discerns and classifies semantically negative user reviews to identify likely malicious, policy-violating, or malfunctioning behavior in Alexa skills. ReviewTracker employs a pre-trained FastText classifier to distinguish different undesired skill behaviors. We collected over 700,000 user reviews spanning six years, more than 200,000 of which carry negative sentiment. ReviewTracker identified 17,820 reviews reporting violations of Alexa policy requirements across 2,813 skills, and 131,855 reviews describing different types of user frustration associated with 9,294 skills. In addition, we developed a dynamic skill-testing framework that uses ChatGPT to conduct two distinct types of tests on Alexa skills: one interacting through software-based simulation to explore skills' actual behaviors, and another issuing actual voice commands to understand the potential factors causing discrepancies between intended skill functionality and user experience. Testing the top problematic skills, ranked by the number of reviews reporting undesired behavior, we detected more than 228 skills violating at least one policy requirement. Our results demonstrate that user reviews can serve as a valuable means of identifying undesired skill behaviors.