Effect of Positive-Negative Image Ratio on the Performance of Pedestrian Detection Model

Lai Kok Yee, Wah Yen Tey, Nor Azwadi Che Sidik, Hau Sim Choo, Zun-Liang Chuan, Hooi-Siang Kang, Tan Lit Ken, Lee Kee Quen, Y. S. Gan, Yutaka Asako

doi:10.11113/mjfas.v20n2.3300

Open Access

Abstract

Pedestrian detection is an important task in computer vision, with applications in video surveillance, human-computer interaction, and autonomous vehicles. Nevertheless, little research has addressed the optimal ratio of positive to negative images for training detection models. This study addresses that gap by evaluating several detection models and determining the ideal ratio. Two scenarios are investigated, characterized respectively by an equal total image count and an equal number of positive images, with images sourced from the CVC-14 night/visible, CVC-14 night/FIR, and INRIA databases. The detection models are built on Histogram of Oriented Gradients (HOG) features, using both Support Vector Machines (SVM) and Medium Neural Networks as classifiers. The experiments show that model accuracy remains relatively stable even as the ratio of negative images increases. However, an inverse relationship between sensitivity and specificity emerges as the ratio rises. Guided by the Youden Index, the findings place the optimal training ratio for pedestrian detection models within the range of 1:0.5 to 1:2. In the CVC-14 nighttime database, the Youden Index reached 100% when the model was trained with SVM at a ratio of 1:0.5, using a total of 4500 training images. In the INRIA dataset, the Youden Index peaked at 98.50% for both SVM and the Medium Neural Network when trained at a ratio of 1:2, using a total of 3000 training images. The processing time of the SVM models is shorter than that of the Medium Neural Networks, because medium-sized neural networks have higher computational complexity and are therefore more computationally demanding than SVMs. This study contributes insights into the relationship between image ratios and the performance of pedestrian detection models.
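
The abstract reports accuracy, sensitivity, specificity, and the Youden Index for different positive-to-negative ratios. The Youden Index is defined as J = sensitivity + specificity - 1, so it reaches 1 (100%) only when both rates are perfect. The Python sketch below illustrates that arithmetic and how a 1:r ratio translates into image counts for a given total. It is not the authors' code: the function names are hypothetical, the confusion-matrix counts in the usage lines are invented for illustration, and it assumes that "1:r" means positive:negative and that every training image is either positive or negative.

```python
from fractions import Fraction


def split_counts(total, neg_per_pos):
    """Return (positives, negatives) for `total` images at a 1:neg_per_pos ratio.

    Assumption: the ratio is positive:negative and covers all images.
    """
    ratio = Fraction(str(neg_per_pos))        # exact arithmetic, e.g. "0.5" -> 1/2
    positives = Fraction(total) / (1 + ratio)
    if positives.denominator != 1:
        raise ValueError("total does not divide evenly for this ratio")
    positives = int(positives)
    return positives, total - positives


def detection_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and the Youden Index.

    tp/fn count pedestrian (positive) images, tn/fp count negative images.
    """
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    youden = sensitivity + specificity - 1    # J = sensitivity + specificity - 1
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
    }


# Image counts implied by the totals and ratios quoted in the abstract
# (under the assumption stated above).
print(split_counts(4500, 0.5))   # (3000, 1500)
print(split_counts(3000, 2))     # (1000, 2000)

# Made-up test-set counts, purely to show the metric calculation.
print(detection_metrics(tp=90, fn=10, tn=80, fp=20))
```

Because the Youden Index weights sensitivity and specificity equally, it can keep separating candidate ratios even when accuracy stays nearly flat, which is consistent with its use as the selection criterion in the abstract.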