Virtualization technology, embodied in Virtual Machines (VMs), is recognized as a key infrastructure of cloud computing. As this technology develops rapidly, cloud data centers face challenges such as energy-efficient Virtual Machine Placement (VMP). VMP is defined as the efficient allocation of VMs to Host Machines (HMs) to achieve objectives such as reducing energy consumption, balancing load, and avoiding Service Level Agreement Violations (SLAV). In this paper, VMP is addressed with a Deep Reinforcement Learning (DRL) based strategy that determines the best mapping between VMs and HMs. We present VMP-A3C, an effective strategy that solves VMP using the Asynchronous Advantage Actor-Critic (A3C) algorithm as a new DRL approach. VMP-A3C aims at balancing load across HMs without SLAV while reducing energy consumption as much as possible. It learns to dynamically consolidate VMs onto a minimum number of HMs using migration techniques. We believe there is scope for improvement in shutting down lightly loaded HMs through VM migration. The effectiveness of the proposed algorithm has been evaluated in terms of the deployment rate, energy consumption, SLAV, the number of shut-down HMs, and the number of migrated VMs. Compared with the best existing state-of-the-art method, the main differences of VMP-A3C in energy consumption and in the number of required HMs are 2.54% and 7.14%, respectively.
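In A3C, each asynchronous worker collects a short rollout and scores its actions with an advantage, that is, an n-step return bootstrapped from the critic minus the critic's value estimate. The following is a minimal, generic sketch of that computation. The abstract does not specify the state, action, or reward design of VMP-A3C, so any VMP reward, for example a negative energy or SLAV penalty per placement step, is only an assumption here. The function name `n_step_advantages` is hypothetical and is not taken from the paper.

```python
from typing import List, Tuple


def n_step_advantages(rewards: List[float], values: List[float],
                      bootstrap_value: float,
                      gamma: float = 0.99) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: n-step returns and advantages for one A3C worker rollout.

    rewards[t] is the reward after step t (in a VMP setting, possibly an assumed
    energy/SLAV-based penalty); values[t] is the critic's estimate V(s_t);
    bootstrap_value is V(s_T) for the state after the rollout (0.0 if terminal).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards: R_t = r_t + gamma * R_{t+1}, seeded with V(s_T)
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage A_t = R_t - V(s_t): weights the actor's policy-gradient term,
    # while R_t serves as the regression target for the critic
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    rets, advs = n_step_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5],
                                   bootstrap_value=0.0, gamma=0.5)
    print(rets, advs)  # [1.25, 0.5, 1.0] [0.75, 0.0, 0.5]
```

Each worker would then push gradients from these advantages to shared actor and critic parameters asynchronously. This asynchrony is what distinguishes A3C from its synchronous variant.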