We consider a dynamic horizontal auto-scaling technique for a cloud system in which virtual machines hosted on a physical node are turned on and off to minimise energy consumption while meeting performance requirements. Finding cloud management policies that adapt the system to the load is not straightforward; here we assume that virtual machines are turned on and off according to thresholds on the queue load. We want to compute the optimal threshold values that minimise consumption costs and penalty costs (the latter incurred when performance requirements are not met). To solve this problem, we propose several optimisation methods based on two different mathematical approaches. The first is based on queueing theory and uses local search heuristics coupled with the stationary distributions of Markov chains. The second tackles the problem using a Markov Decision Process (MDP) in which we assume that the policy is of a special multi-threshold type called hysteresis. We improve the heuristics of the first approach with Markov chain aggregation and queue approximation techniques, and we assess the benefit of threshold-aware algorithms for solving MDPs. We then carry out theoretical analyses of the two approaches. We also compare them numerically and show that all of the presented MDP algorithms strongly outperform the local search heuristics. Finally, we propose a cost model for a real cloud scenario, to which we apply our optimisation algorithms in order to show their practical relevance. The major scientific contribution of the article is a set of fast (near real-time) load-based threshold computation methods that a cloud provider can use to optimise its financial costs.
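To make the notion of a hysteresis threshold policy concrete, the following minimal Python sketch evaluates the long-run mean cost of one fixed threshold vector from the stationary distribution of a Markov chain. It is an illustration under assumptions of ours, not the model or the algorithms of the article: Poisson arrivals at rate `lam`, exponential service at rate `mu` per virtual machine, a finite buffer `B`, instantaneous activation and deactivation, and a hypothetical linear cost with coefficients `c_vm` and `c_hold`. The function names, the threshold vectors `F` (activation) and `R` (deactivation), and the use of power iteration on the uniformised chain are likewise choices made for this example.

```python
def stationary_distribution(lam, mu, K, B, F, R, tol=1e-12, max_iter=200_000):
    """Stationary law of a chain on states (m, k): m jobs in system, k active VMs.

    Assumed (hypothetical) threshold semantics:
      F[k-1]: with k VMs on, an arrival bringing the job count up to F[k-1]
              turns VM k+1 on (k = 1..K-1).
      R[k-2]: with k VMs on, a departure bringing the job count down to R[k-2]
              turns one VM off (k = 2..K).
    """
    states = [(m, k) for m in range(B + 1) for k in range(1, K + 1)]
    rates = {}  # (source state, target state) -> transition rate
    for (m, k) in states:
        if m < B:  # arrival, possibly activating one more VM
            k2 = k + 1 if k < K and m + 1 >= F[k - 1] else k
            rates[(m, k), (m + 1, k2)] = lam
        if m > 0:  # departure, possibly deactivating one VM
            k2 = k - 1 if k > 1 and m - 1 <= R[k - 2] else k
            rates[(m, k), (m - 1, k2)] = mu * min(m, k)

    # Uniformisation: the constant strictly exceeds every exit rate, so the
    # discrete-time chain has self-loops and power iteration can converge.
    unif = 1.1 * (lam + mu * K)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        new = dict(pi)
        for (src, dst), r in rates.items():
            flow = pi[src] * r / unif
            new[src] -= flow
            new[dst] += flow
        diff = max(abs(new[s] - pi[s]) for s in states)
        pi = new
        if diff < tol:
            break
    return pi


def mean_cost(pi, c_vm, c_hold):
    """Hypothetical linear cost: c_vm per active VM plus c_hold per job present."""
    return sum(p * (c_vm * k + c_hold * m) for (m, k), p in pi.items())


if __name__ == "__main__":
    # Compare two hysteresis policies for K = 3 VMs and a buffer of 10 jobs.
    for F, R in [([3, 6], [1, 4]), ([2, 4], [0, 2])]:
        pi = stationary_distribution(lam=2.0, mu=1.0, K=3, B=10, F=F, R=R)
        print(F, R, mean_cost(pi, c_vm=1.0, c_hold=0.5))
```

A local search heuristic in the spirit of the first approach would call such an evaluation repeatedly while perturbing `F` and `R`, which is why the cost of computing the stationary distribution matters for the overall running time.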