Imaging-based methods of food portion size estimation (FPSE) promise higher accuracy than traditional methods. Many FPSE methods require dimensional cues (fiducial markers, finger references, object references) in the scene of interest and/or manual human input (wireframes, virtual models). This paper proposes a novel passive, standalone, multispectral, motion-activated, structured-light-supplemented stereo camera for food intake monitoring (FOODCAM) and an associated FPSE methodology that, given a fixed setup, does not need a dimensional reference. The proposed device integrates a switchable-band (visible/infrared) stereo camera with a structured light emitter. The volume estimation methodology is based on 3-D reconstruction of food items from the stereo image pairs captured by the device. The FOODCAM device and the methodology were validated using five food models with complex shapes (banana, brownie, chickpeas, French fries, and popcorn). The FOODCAM estimated food portion sizes with an average accuracy of 94.4%, which suggests that it can potentially be used as an instrument in diet and eating behavior studies.
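To illustrate why a fixed stereo setup can remove the need for a dimensional reference, the following minimal sketch converts a disparity map into a volume estimate. It is not the FOODCAM pipeline: it assumes a rectified stereo pair, a pinhole camera model, an optical axis perpendicular to the table, and a known camera-to-table distance. The function names and the parameters `focal_px`, `baseline_mm`, and `plane_depth_mm` are hypothetical and chosen for illustration only.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Standard stereo triangulation, Z = f * B / d (assumes a rectified pair)."""
    if disparity_px <= 0:
        return None  # no valid stereo match for this pixel
    return focal_px * baseline_mm / disparity_px


def volume_above_plane(disparity_map, focal_px, baseline_mm, plane_depth_mm):
    """Sum the volume between the reconstructed surface and the table plane.

    Under the pinhole model, one pixel covers an area of (z / f)^2 at depth z,
    so the column between surface depth z_s and table depth z_p has volume
    (z_p^3 - z_s^3) / (3 * f^2). The table depth is known because the setup
    is fixed (an assumption of this sketch).
    """
    total_mm3 = 0.0
    for row in disparity_map:
        for d in row:
            z = depth_from_disparity(d, focal_px, baseline_mm)
            if z is None or z >= plane_depth_mm:
                continue  # invalid pixel, or nothing above the table here
            total_mm3 += (plane_depth_mm ** 3 - z ** 3) / (3.0 * focal_px ** 2)
    return total_mm3


# Hypothetical example: a 10 x 10 pixel patch whose surface lies 400 mm from
# the camera, above a table 500 mm away (f = 500 px, baseline = 50 mm).
disparity_map = [[62.5] * 10 for _ in range(10)]
volume_mm3 = volume_above_plane(
    disparity_map, focal_px=500.0, baseline_mm=50.0, plane_depth_mm=500.0
)
print(f"{volume_mm3:.1f} mm^3")
```

Because the focal length, baseline, and table distance are all fixed by the rig, every quantity in the sketch is in metric units without any marker in the scene. A real system would additionally need calibration, rectification, stereo matching, and segmentation of the food region, none of which are shown here.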