Determining Chess Game State from an Image

Georg Wölflein,Ognjen Arandjelović

doi:10.3390/jimaging7060094

Abstract

Identifying the configuration of chess pieces from an image of a chessboard is a problem in computer vision that has not yet been solved accurately. However, it is important for helping amateur chess players improve their games by facilitating automatic computer analysis without the overhead of manually entering the pieces. Current approaches are limited by the lack of large datasets and are not designed to adapt to unseen chess sets. This paper puts forth a new dataset synthesised from a 3D model that is an order of magnitude larger than existing ones. Trained on this dataset, a novel end-to-end chess recognition system is presented that combines traditional computer vision techniques with deep learning. It localises the chessboard using a RANSAC-based algorithm that computes a projective transformation of the board onto a regular grid. Using two convolutional neural networks, it then predicts an occupancy mask for the squares in the warped image and finally classifies the pieces. The described system achieves an error rate of 0.23% per square on the test set, 28 times better than the current state of the art. Further, a few-shot transfer learning approach is developed that is able to adapt the inference system to a previously unseen chess set using just two photos of the starting position, obtaining a per-square accuracy of 99.83% on images of that new chess set. The code, dataset, and trained models are made available online.

Highlights

The problem of recovering the configuration of chess pieces from an image of a physical chessboard is often referred to as chess recognition
Motivated by the cumbersome process of transferring a chess position from the board to the computer in order to facilitate computer analysis, this paper presents an end-to-end chess recognition system that outperforms all existing approaches
It correctly predicts 93.86% of the chess positions without any mistakes, and when permitting a maximum of one mistake per board, its accuracy lies at 99.71%

Summary

Introduction

The problem of recovering the configuration of chess pieces from an image of a physical chessboard is often referred to as chess recognition. A compelling application arises in amateur chess, where a casual over-the-board game may reach an interesting position that the players may afterwards want to analyse on a computer. A system that is able to map a photo of a chess position to a structured format compatible with chess engines, such as the widely-used Forsyth–Edwards Notation (FEN), could automate this laborious task. To this end, we put forth a new synthesised dataset [1] comprising of rendered chessboard images with different chess positions, camera angles, and lighting setups. For the latter two steps, we employ two convolutional neural networks (CNNs), but make use of traditional computer vision techniques for board localisation

Methods

Results

Conclusion