Filtering diverse, high-severity defect reports out of large test report databases is a challenging task in crowdtesting. Traditional optimization algorithms based on clustering and distance techniques have made progress, but they are sensitive to initial parameter settings and their efficiency drops significantly as the number of reports grows. To address these issues, this paper proposes the Reinforcement Learning-based Genetic Algorithm for Crowdsourced Testing Report Optimization (RLGA), a method that integrates reinforcement learning with genetic algorithms. Its core goal is to identify distinct, high-severity defect reports from a large set. The method uses a genetic algorithm to search for an optimal report selection sequence, and uses reinforcement learning to adjust the crossover probability (Pc) and mutation probability (Pm) dynamically, based on the population’s average fitness, best fitness, and diversity. The reinforcement learning component updates the Q-value table with a hybrid SARSA and Q-Learning strategy, which allows the algorithm to learn quickly in early iterations and to expand the search space later to avoid local optima, thereby improving efficiency. To validate RLGA, this paper uses four public datasets and compares RLGA with six classic methods. The results indicate that RLGA outperforms BDDIV in execution time and is less sensitive to the total number of test reports. In terms of the optimization objectives, the test reports selected by RLGA have higher defect severity and diversity than those selected by random choice, BDDIV, and TSE. In terms of population diversity, RLGA yields more uniform and more diverse individuals than random initialization. In terms of convergence speed, RLGA is superior to the GA, GA-SARSA, and GA-Q methods.
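
As an illustration of the hybrid update described above, the following is a minimal sketch, not the paper's implementation. It shows how one Q-value table can be updated with either the SARSA rule or the Q-Learning rule. The action set of (Pc, Pm) pairs, the number of states, the learning rate `alpha`, the discount factor `gamma`, and all function names are hypothetical. How RLGA derives states and rewards from average fitness, best fitness, and diversity, and which rule it applies in which phase of the run, are not specified here, so the caller passes the rule explicitly.

```python
import random

# Hypothetical discrete action set: each action is a (Pc, Pm) pair.
ACTIONS = [(0.6, 0.01), (0.7, 0.05), (0.8, 0.10), (0.9, 0.20)]
# Hypothetical number of discretised population states.
N_STATES = 5


def make_q_table(n_states=N_STATES, n_actions=len(ACTIONS)):
    """Create a Q-value table initialised to zero."""
    return [[0.0] * n_actions for _ in range(n_states)]


def epsilon_greedy(q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


def update_q(q, s, a, reward, s_next, a_next, rule, alpha=0.1, gamma=0.9):
    """Update q[s][a] in place with the chosen rule and return the new value.

    rule == "sarsa":      on-policy target, uses the action actually taken next.
    rule == "q_learning": off-policy target, uses the best next action.
    """
    if rule == "sarsa":
        target = reward + gamma * q[s_next][a_next]
    elif rule == "q_learning":
        target = reward + gamma * max(q[s_next])
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

In a genetic algorithm loop, the selected action index would be mapped through `ACTIONS` to obtain the Pc and Pm used for the next generation, and the resulting change in population quality would supply `reward`. Both of those steps are assumptions of this sketch.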