Objective Pedestrian accidents contribute significantly to the high number of annual traffic casualties. It is therefore crucial for pedestrians to use safety measures, such as a crosswalk, and to activate pedestrian signals. However, people often fail to activate the signal or are unable to do so – those who are visually impaired or have occupied hands may be unable to activate the system. Failure to activate the signal can result in an accident. This paper proposes an improvement to crosswalk safety by designing a system that can detect pedestrians and trigger the pedestrian signal automatically when necessary. Methods In this study, a dataset of images was collected in order to train a Convolutional Neural Network (CNN) to distinguish between pedestrians (including bicycle riders) when crossing the street. The resulting system can capture and evaluate images in real-time, and the result can be used to automatically activate a system such as a pedestrian signal. A threshold system is also implemented that triggers the crosswalk only when the positive predictions pass the threshold. This system was tested by deploying it at three real-world environments and comparing the results with a recorded video of the camera’s view. Results The CNN prediction model is able to correctly predict pedestrian and cyclist intentions with an average accuracy of 84.96% and had an absence trigger rate of 0.037%. The prediction accuracy varies based on the location and whether a cyclist or pedestrian is in front of the camera. Pedestrians crossing the street were correctly predicted more accurately than cyclists crossing the street by up to 11.61%, while passing (i.e., non-crossing) cyclists were correctly ignored more than passing pedestrians, by up to 18.75%. Conclusion Based on the testing of the system in real-world environments, the authors conclude that it is feasible as a back-up system that can complement existing pedestrian signal buttons, and thereby improve the overall safety of crossing the street. Further improvements to the accuracy can be achieved with a more comprehensive dataset for a specific location where the system is deployed. Implementing different computer vision techniques optimized for tracking objects should also increase the accuracy.