To detect gamma rays with good spatial, timing, and energy resolution while maintaining high sensitivity, we need accurate and efficient algorithms to estimate the first gamma interaction position from the measured light distribution. Monolithic detectors are investigated as an alternative to pixelated detectors because of their increased sensitivity, resolution, and intrinsic depth-of-interaction (DOI) encoding. Monolithic detectors are challenging, however, because of complicated calibration setups and edge effects. In this work, we evaluate the use of neural networks to estimate the 3D position of the first (Compton or photoelectric) interaction. Using optical simulation data of a 50 × 50 × 16 mm³ LYSO crystal, performance is evaluated as a function of network complexity (two to five hidden layers with 64 to 1024 neurons) and amount of training data (1000–8000 training events per calibration position). We identify and address the potential pitfall of overfitting on the training grid by evaluating on intermediate positions that are not in the training set. Additionally, the neural networks are directly compared with nearest neighbour positioning. Optimal performance was achieved with a network containing three hidden layers of 256 neurons trained on 1000 events per position; for more complex networks, performance degrades at intermediate positions and overfitting starts to occur. A median 3D positioning error of 0.77 mm and a 2D FWHM of 0.46 mm are obtained, a 17% improvement in FWHM over the nearest neighbour algorithm. Evaluating only on events that are not Compton scattered yields a 3D positioning error of 0.40 mm and a 2D FWHM of 0.42 mm, revealing that Compton scatter increases the positioning error considerably, by 93%. This study demonstrates that very good spatial resolution can be achieved with neural networks, superior to nearest neighbour positioning.
However, potential overfitting on the training grid should be carefully evaluated.
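The two positioning approaches compared above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the input size (a hypothetical 64-channel light-sensor readout), the initialisation, and all function names are assumptions; only the architecture (three hidden layers of 256 neurons regressing a 3D position) and the nearest neighbour idea come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SIPM = 64   # assumed 8 x 8 light-sensor readout; illustrative only
HIDDEN = 256  # three hidden layers of 256 neurons (best network in the study)

def init_layer(n_in, n_out):
    # He initialisation, suitable for ReLU layers (untrained weights here)
    return rng.normal(0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

layers = [init_layer(N_SIPM, HIDDEN),
          init_layer(HIDDEN, HIDDEN),
          init_layer(HIDDEN, HIDDEN),
          init_layer(HIDDEN, 3)]  # output: (x, y, z) interaction position in mm

def predict(light_dist):
    """Map a normalised light distribution to a 3D interaction position."""
    h = light_dist
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:   # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h                      # linear output for regression

def nn_position(light_dist, cal_dists, cal_positions):
    """Nearest neighbour baseline: return the calibration position whose
    reference light distribution is closest in Euclidean distance."""
    i = np.argmin(np.linalg.norm(cal_dists - light_dist, axis=1))
    return cal_positions[i]

# Hypothetical event and calibration data, for illustration only
event = rng.random(N_SIPM)
event /= event.sum()                          # normalise total detected light
cal_positions = rng.uniform(0, 50, (5, 3))    # 5 grid points in a 50 mm crystal
cal_dists = rng.random((5, N_SIPM))
cal_dists /= cal_dists.sum(axis=1, keepdims=True)

pos = predict(event)                          # network estimate, shape (3,)
nn_pos = nn_position(event, cal_dists, cal_positions)
```

In practice the network weights would be fitted on the simulated calibration events, and the nearest neighbour search would run over the full set of reference light distributions rather than five points.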