Counting objects in video has been an active area of computer vision for decades. Precise counting requires detecting objects and following them through consecutive frames. Deep neural networks have brought great improvements in this area. Nonetheless, the task remains a challenge for edge computing, especially when low-power edge AI devices must be used. The present work describes an application in which an edge device runs a YOLO network and a V-IOU tracker to count people and bicycles in real time. A selective frame-downsampling algorithm is used to allow a higher frame rate when necessary while optimizing memory usage and energy consumption. In the experiments, the system detected and counted the objects with 18 counting errors in 525 objects and a mean inference time of 112.82 ms per frame. With the selective downsampling algorithm, it was also capable of recovering and reducing memory usage while maintaining its precision.
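
To make the detect-and-follow idea concrete, the following is a minimal sketch of IoU-based track association for counting. It is not the implementation used in this work: the names `iou` and `update_tracks`, the greedy matching strategy, and the 0.5 overlap threshold are all assumptions chosen for illustration, and the visual-tracking fallback that distinguishes V-IOU from a plain IoU tracker is omitted. Detections are assumed to arrive as `(x1, y1, x2, y2)` boxes from some detector such as YOLO.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_tracks(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    iou_threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> Tuple[Dict[int, Box], int]:
    """Greedily extend each track with its best-overlapping detection.

    Tracks with no sufficiently overlapping detection are dropped, and
    every unmatched detection starts a new track with a fresh ID.
    """
    updated: Dict[int, Box] = {}
    unmatched = list(detections)
    for track_id, box in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda det: iou(box, det))
        if iou(box, best) >= iou_threshold:
            updated[track_id] = best
            unmatched.remove(best)
    for det in unmatched:
        updated[next_id] = det
        next_id += 1
    return updated, next_id


# Two consecutive frames: one object moves slightly, a second one appears.
tracks, next_id = update_tracks({}, [(0, 0, 10, 10)], 0)
tracks, next_id = update_tracks(tracks, [(1, 0, 11, 10), (50, 50, 60, 60)], next_id)
print(len(tracks), next_id)
```

In this sketch `next_id` equals the number of distinct tracks created so far, so it can serve as a running object count. A practical counter would also need to tolerate missed detections and filter out very short tracks.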