The Raspberry Pi (Pi) is a versatile general-purpose embedded computing device that can be used for both machine learning (ML) and deep learning (DL) inference applications such as face detection. This study trials the use of a Pi Spark cluster for distributed inference in TensorFlow. Specifically, it investigates the performance difference between a 2-node Pi 4B Spark cluster and other systems, including a single Pi 4B and a mid-end desktop computer. Enhancements to the Pi 4B were also studied and compared against the Spark cluster to identify the more effective method for increasing the Pi 4B’s DL performance. Three DL inference experiments, covering image classification and face detection tasks, were carried out. Results showed no significant performance difference between the cluster and a single Pi 4B, so enhancing the Pi 4B was faster than using the cluster. The performance gap between the mid-end computer and a single Pi 4B ranged from 6 to 15 times across the experiments. For now, enhancing the Pi 4B is the more effective approach for increasing DL performance, and more work is needed before scalable distributed DL inference can eventuate.