The limited bandwidth of white light-emitting diodes (LEDs) limits the achievable data rate in visible light communication (VLC) systems. A number of techniques, including multiple-input multiple-output (MIMO) systems, have been investigated to increase the data rate. High-speed optical MIMO systems suffer from both spatial and temporal crosstalk. Spatial crosstalk is often compensated by the MIMO decoding algorithm, while temporal crosstalk is mitigated using an equalizer. However, LEDs have a non-linear transfer function, which limits the performance of linear equalizers. In this letter, we propose joint spatial and temporal equalization using an artificial neural network (ANN) for a MIMO-VLC system. Using a practical imaging/non-imaging optical MIMO link, we demonstrate that ANN-based joint equalization outperforms joint equalization using a traditional decision feedback equalizer, as the ANN is able to compensate for the non-linear transfer function as well as the crosstalk.
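
As a rough illustration of what joint spatial and temporal equalization can look like, the sketch below stacks a window of consecutive samples from every receiver into one input vector and passes it through a small feed-forward network with a non-linear hidden layer. The abstract does not specify the network architecture, the number of taps, or the training procedure, so everything here is an assumption made for illustration: the names `joint_features` and `mlp_forward`, the window length, the hidden-layer size, and the random (untrained) weights.

```python
import numpy as np

def joint_features(rx, n_taps):
    """Stack n_taps consecutive samples from all receivers into one vector.

    rx: array of shape (n_samples, n_rx), one column per receiver.
    Returns an array of shape (n_samples - n_taps + 1, n_taps * n_rx).
    Each row spans both time (taps) and space (receivers), so a single
    network sees temporal and spatial crosstalk together.
    """
    rx = np.asarray(rx, dtype=float)
    n_samples = rx.shape[0]
    n_windows = n_samples - n_taps + 1
    # Column order per row: [rx[t, :], rx[t + 1, :], ..., rx[t + n_taps - 1, :]]
    return np.hstack([rx[k:k + n_windows] for k in range(n_taps)])

def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer network; tanh supplies the non-linearity."""
    hidden = np.tanh(x @ w1 + b1)
    return hidden @ w2 + b2  # one soft estimate per transmit stream

# Toy 2-receiver capture (placeholder numbers, not measured data).
rx = np.arange(10).reshape(5, 2)
features = joint_features(rx, n_taps=3)

# Assumed sizes: 8 hidden units, 2 transmit streams. Weights are random and
# untrained; in practice they would be fitted on a known training sequence.
rng = np.random.default_rng(0)
n_hidden, n_tx = 8, 2
w1 = rng.normal(size=(features.shape[1], n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=(n_hidden, n_tx))
b2 = np.zeros(n_tx)

estimates = mlp_forward(features, w1, b1, w2, b2)
print(features.shape, estimates.shape)
```

Feeding all receivers and all taps into one network is what makes the equalization joint in this sketch: the hidden layer can model the LED non-linearity while the same weights also combine samples across receivers and across time.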