Combining Deep Reinforcement Learning and meta-heuristic techniques represents a new research direction for enhancing the search capabilities of meta-heuristic methods in the context of production scheduling. Q-learning is a prominent reinforcement learning in which its utilization aims to direct the selection of actions, thus preventing the necessity for a random exploration in the iterative process of the metaheuristics. In this study, we provide Q-learning guided algorithms for the Bi-Criteria No-Wait Flowshop Scheduling Problem (NWFSP). The problem is treated as a bi-criteria combinatorial optimization problem where total flow time and makespan are optimized simultaneously. Firstly, a deterministic mixed-integer linear programming (MILP) model is provided. Then, Q-learning guided algorithms are developed: Bi-Criteria Iterated Greedy Algorithm with Q-Learning (BC-IGQL). Bi-Criteria Block Insertion Heuristic Algorithm with Q-Learning (BC-BIHQL). Moreover, the performance of the proposed Q-learning guided algorithms is compared over a collection of Bi-Criteria Genetic Local Search Algorithms (BC-GLS), Bi-Criteria Iterated Greedy Algorithm (BC-IG), Bi-Criteria Iterated Greedy Algorithm with a Local Search (BC-IGALL) and Bi-Criteria Variable Block Insertion Heuristic Algorithm (BC-VBIH). The complete computational experiment, performed on 480 problem instances of Vallada et al. (2015), which is known as the VRF benchmark set, indicates that the BC-BIHQL and the BC-IGQL algorithms outperform the BC-GLS, BC-IG, BC-IGALL, and BC-VBIH algorithms in comparative performance metrics. More specifically, the proposed BC-BIHQL and BC-IGQL algorithms can yield more non-dominated bi-criteria solutions with the most substantial competitiveness than the remaining algorithms. At the same time, both are competitive with each other on the benchmark problems. Moreover, the BC-IGQL algorithm dominates almost 97% and 99% of the solutions reached by the BC-IG, BC-IGALL, and BC-VBIH algorithms in small and large datasets. Similarly, The BC-BIHQL algorithm dominates almost 98% and 99% of the solutions reached by the BC-IG, BC-IGALL, and BC-VBIH algorithms in small and large datasets, respectively. This means that, among all the features that have been compared, the Q-learning-guided algorithms demonstrate the highest level of competitiveness. The outcomes of this study encourage us to discover many more bi-criteria NWFSPs to reveal the trade-off between other conflicting objectives, such as makespan & the number of early jobs, to overcome various industries' problems.