Metamorphic testing of chess engines

Manuel Méndez, Alfredo Ibias, Manuel Núñez, Miguel Benito-Parejo

doi:10.1016/j.infsof.2023.107263

Open Access

Abstract

Chess engines are computer programs that analyse chess positions. The goal of this analysis is to decide which player has the advantage and to evaluate how large that advantage is. Using this analysis, chess engines are very strong players that consistently beat the best human players. Even though these programs are excellent players, we cannot be sure that their code is fault-free because they are very difficult to test. In particular, we face the oracle problem: if the chess engine plays better than any potential tester, how can a tester claim that a certain evaluation is wrong or that a suggested move is not the best one?

The main goal of our work is to provide a metamorphic testing tool to evaluate chess engines. In particular, we are interested in looking for inconsistent behaviours in the best publicly available chess engine, Stockfish, but we would also like to consider other chess engines.

We developed a metamorphic testing solution to validate chess engines. First, we defined metamorphic relations (MRs) that might reveal inconsistent behaviours. The underlying idea is that related positions should receive the same evaluation. For example, if we consider a position and rotate all the pieces with respect to the central axis, then both positions should have the same evaluation. One of our main priorities was to have a fully automated tool: source inputs are obtained from available datasets, while follow-up inputs are automatically computed by applying sound transformations to the source inputs according to the corresponding metamorphic relation.

To assess the usefulness of our work, we applied it to a dataset with more than 40,000 positions. The empirical evidence supports the usefulness of our approach for analysing Stockfish: our tool revealed non-negligible deviations from the expected behaviour for all the MRs. Additional experiments showed that our tool can be easily used to analyse other chess engines such as Komodo, Houdini and Gull.

The experiments demonstrate the usefulness of our approach for identifying issues in the latest version of Stockfish (version 15, released in April 2022), which is widely recognised as the best chess engine. Our tool is flexible and can be easily extended with metamorphic relations defined in the future by us or by other users. Since all our metamorphic relations are implemented and the code is freely available, users can use them as a pattern to implement new relations.
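
The idea that related positions should receive the same evaluation can be illustrated with a minimal sketch. It assumes one possible reading of the abstract's example, namely a left-right mirror of the board (files a to h swapped), applied to a position given as a FEN string. The names `mirror_fen` and `violates_relation`, the `evaluate` callable and the `tolerance` parameter are hypothetical and are not taken from the authors' tool. The sketch rejects positions with castling rights, since mirroring would not be a sound transformation for them.

```python
from typing import Callable


def mirror_fen(fen: str) -> str:
    """Return the FEN of the position mirrored left to right (file a <-> file h).

    Assumption: the input is a full six-field FEN string. Positions with
    castling rights are rejected because the mirrored king and rooks would
    no longer stand on squares from which castling is legal.
    """
    placement, side, castling, ep, halfmove, fullmove = fen.split()
    if castling != "-":
        raise ValueError("mirroring is only sound without castling rights")
    # Each rank is a string of piece letters and single-digit empty-square
    # counts, so reversing the string mirrors the rank.
    mirrored = "/".join(rank[::-1] for rank in placement.split("/"))
    if ep != "-":
        # Mirror the file of the en passant target square; the rank is unchanged.
        ep = chr(ord("a") + ord("h") - ord(ep[0])) + ep[1]
    return " ".join([mirrored, side, castling, ep, halfmove, fullmove])


def violates_relation(
    fen: str,
    evaluate: Callable[[str], float],
    tolerance: float = 0.0,
) -> bool:
    """Check the (hypothetical) mirror relation for one source position.

    `evaluate` stands in for a call to a chess engine that maps a FEN string
    to a numeric evaluation. The relation is violated when the source and
    follow-up evaluations differ by more than `tolerance`.
    """
    source_score = evaluate(fen)
    follow_up_score = evaluate(mirror_fen(fen))
    return abs(source_score - follow_up_score) > tolerance
```

In practice `evaluate` would wrap a real engine, and `violates_relation` would be run over every position in a dataset. A non-zero `tolerance` is one way to separate small numeric fluctuations from the non-negligible deviations the abstract refers to; the threshold to use is left open here.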

Full Text