Reinforcement learning has long played a pivotal role in artificial intelligence and in many practical applications, focusing on the interaction between an agent and its environment. Within this broad field, the multi-armed bandit (MAB) problem represents a specific subset, characterized by a sequential interaction between a learner and an environment in which the agent's actions do not alter the environment or the reward distributions. MABs are prevalent in recommendation systems and advertising and are increasingly applied in sectors such as agriculture and adaptive clinical trials. The stochastic stationary bandit problem, a fundamental category of MAB, is the primary focus of this article. Here, we delve into the implementation and analytical comparison of several key bandit algorithms, including Explore-then-Commit (ETC), Upper Confidence Bound (UCB), Thompson Sampling (TS), Epsilon-Greedy (ε-Greedy), SoftMax, and Conservative Lower Confidence Bound (CON-LCB), across various datasets. These datasets vary in the number of options (arms), reward distributions, and specific parameters, offering a broad testing ground. Additionally, this work provides an overview of the diverse applications of bandit problems across different fields, highlighting their versatility and broad impact.
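To make the interaction protocol concrete, the following is a minimal sketch of the stochastic stationary bandit loop with an ε-Greedy learner. It is illustrative only, not the implementation compared in this article: the Gaussian reward model, the arm means, the horizon, and the value of ε are placeholder assumptions chosen for the example.

```python
# Minimal sketch (illustrative, not the article's implementation) of the
# stochastic stationary bandit loop: arm distributions are fixed, and the
# learner's choices never alter them.
import numpy as np

rng = np.random.default_rng(0)

def run_epsilon_greedy(means, horizon=1000, epsilon=0.1):
    """Run an illustrative epsilon-greedy learner on Gaussian arms.

    `means`, `horizon`, and `epsilon` are placeholder values for this
    example; the article studies several datasets and algorithms.
    """
    k = len(means)
    counts = np.zeros(k)       # number of pulls per arm
    estimates = np.zeros(k)    # empirical mean reward per arm
    regret = 0.0
    best_mean = max(means)

    for t in range(horizon):
        # Explore with probability epsilon, otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = int(rng.integers(k))
        else:
            arm = int(np.argmax(estimates))

        # Stationary environment: the reward is drawn from a fixed
        # distribution that does not depend on past actions.
        reward = rng.normal(means[arm], 1.0)

        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best_mean - means[arm]

    return regret

print(run_epsilon_greedy([0.2, 0.5, 0.8]))
```

The other algorithms discussed later differ only in how the arm is selected at each round; the environment loop itself is unchanged.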