Fog-native computing is an emerging paradigm for building flexible and scalable Internet of Things (IoT) applications using a microservice architecture at the network edge. Under this paradigm, IoT applications are decomposed into multiple fine-grained microservices that are strategically deployed on various fog nodes to support a wide range of IoT scenarios, such as smart cities and smart farming. Nonetheless, the performance of these applications is limited by how effectively they process offloaded requests originating from multiple IoT devices. Specifically, each requested IoT service is composed of multiple dependent microservice instances, collectively referred to as a service plan (SP). Each SP comprises a series of tasks that must be executed in a predefined order to meet heterogeneous Quality of Service (QoS) requirements (e.g., low service delays). Unlike in the cloud, selecting an appropriate service plan for each IoT request is challenging in dynamic fog environments because microservice instances are interdependent and decentralized, and because network conditions and service requests are unstable (i.e., they change quickly over time). To address this challenge, we study the microservice instance selection problem for IoT applications deployed on fog platforms and propose a learning-based approach that employs Deep Reinforcement Learning (DRL) to compute the optimal service plans. This approach optimizes the delay of application requests while balancing the load among microservice instances. In our selection process, we handle plan dependency so that only valid service plans are selected for every request, using two distinct approaches: an action masking approach and an adaptive action mapping approach. Additionally, we propose an improved experience replay to address delayed action effects and enhance the training efficiency of our model. 
We conducted a series of experiments to assess the performance of our Microservice Instances Selection Policy (MISP) approach. The results show that, compared with the baseline algorithms, our model reduces the average failure rate by up to 65% and improves load balance by up to 45% on average.
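
To make the action masking idea concrete, the following minimal Python sketch shows one common way to restrict an agent's choice to valid actions. It is an illustration under stated assumptions and not the MISP implementation: the function names `masked_argmax` and `masked_softmax` are hypothetical, the scores stand in for whatever values a DRL agent assigns to candidate microservice instances, and the validity mask (which instances are compatible with the plan built so far) is assumed to be given.

```python
import math
from typing import List, Sequence


def _valid_indices(scores: Sequence[float], valid: Sequence[bool]) -> List[int]:
    """Return the indices of valid actions, checking the inputs first."""
    if len(scores) != len(valid):
        raise ValueError("scores and valid must have the same length")
    indices = [i for i, ok in enumerate(valid) if ok]
    if not indices:
        raise ValueError("no valid action available")
    return indices


def masked_argmax(scores: Sequence[float], valid: Sequence[bool]) -> int:
    """Greedy selection: the best-scoring action among the valid ones only."""
    return max(_valid_indices(scores, valid), key=lambda i: scores[i])


def masked_softmax(scores: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Probability distribution that gives invalid actions zero probability."""
    indices = _valid_indices(scores, valid)
    # Subtract the largest valid score for numerical stability.
    peak = max(scores[i] for i in indices)
    weights = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, valid)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three candidate instances for the next task.
# Instance 0 has the highest score but is assumed to violate plan dependency.
scores = [0.9, 0.4, 0.7]
valid = [False, True, True]

chosen = masked_argmax(scores, valid)
probabilities = masked_softmax(scores, valid)
```

In this example the greedy choice falls on instance 2, since instance 0 is masked out despite its higher score. Masking at selection time guarantees that an invalid plan is never chosen, at the cost of having to compute the mask before every decision.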