Video sensor networks play a vital role in unattended wide-area surveillance. Most computer vision research in this area deals with networks of stationary electro-optical sensors. Recently, however, interest has grown in networks of sensors on mobile platforms such as mobile robots, all-terrain vehicles, and unmanned aerial vehicles (UAVs). Here, we present novel computer vision techniques for automatic object detection and tracking in mobile sensor networks. We use multiple commercial off-the-shelf (COTS) sensors that enable monitoring over large areas. We demonstrate the effectiveness of the object detection and tracking framework using aerial videos from multiple mobile sensors aboard UAVs (a sample UAV is shown in Figure 1).

Figure 1. Shown is a UAV used in our experiments.

The goal of an effective video surveillance system is to detect objects in an area of interest and to find their correspondence across many frames. The problem poses several inherent challenges, including rapidly changing lighting conditions (e.g., due to cloud cover), shadows, occlusion, and objects entering and leaving the scene. Another difficulty is the small size of objects in aerial imagery, which makes them hard to detect and to track through varying appearance and frequent occlusions. To address this, we use object appearance, shape, and motion models to detect and track objects from a single UAV. Once the objects are tracked successfully, we use the geometric similarity between object trajectories across UAVs to obtain consistent labeling (global object correspondence). Conventional tracking methods that rely on appearance or positional similarity are unsuitable here because of significant camera motion and large variation in object position. We have therefore developed the COCOA system, which uses three modules to achieve object detection and tracking in a single