Object detection in complex indoor scenes aims to locate and classify objects in indoor environments, with applications in smart homes, security surveillance, and home service robots. It also underpins higher-level vision tasks such as visual question answering, video description generation, and instance segmentation. The task remains difficult because of background clutter, overlapping objects, and large variations in object scale. To address these challenges, this study proposes an indoor object detection method based on an improved DINO framework. An Indoor-COCO dataset was built from the COCO object detection dataset to match the requirements of indoor detection. The model uses an improved Res2Net as the backbone feature extraction network, combined with a deformable attention mechanism to better capture fine-grained object features. An improved BiFPN module replaces the conventional feature fusion module, and the SIoU loss is used to speed up convergence. Experimental results show that the improved model reaches an mAP of 62.3%, a 5.2% improvement over the baseline model. These results indicate that the DINO-based indoor object detection model generalizes well and is practical for multi-scale object detection in complex environments.
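As an illustration of how an indoor subset might be derived from COCO-format annotations, the following minimal sketch filters an annotation dictionary down to a chosen set of categories. The category list `INDOOR_CATEGORIES` and the helper `filter_coco` are hypothetical: the abstract does not state which categories were kept or how Indoor-COCO was actually constructed, so this shows only the general idea on a toy example.

```python
from typing import Any

# Hypothetical indoor subset; the source does not list the categories it kept.
INDOOR_CATEGORIES = {"chair", "couch", "bed", "dining table", "tv", "laptop"}


def filter_coco(coco: dict[str, Any], keep_names: set[str]) -> dict[str, Any]:
    """Return a COCO-format dict restricted to the named categories."""
    # Keep only the category records whose name is in the chosen subset.
    categories = [c for c in coco["categories"] if c["name"] in keep_names]
    keep_ids = {c["id"] for c in categories}
    # Keep only annotations that belong to a retained category.
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    # Drop images that no longer contain any retained annotation.
    image_ids = {a["image_id"] for a in annotations}
    images = [img for img in coco["images"] if img["id"] in image_ids]
    return {"images": images, "annotations": annotations, "categories": categories}


# Toy annotation file in COCO layout (three images, one object each).
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 62, "bbox": [10, 20, 50, 80]},
        {"id": 11, "image_id": 2, "category_id": 3, "bbox": [0, 0, 120, 60]},
        {"id": 12, "image_id": 3, "category_id": 72, "bbox": [30, 30, 90, 60]},
    ],
    "categories": [
        {"id": 3, "name": "car"},
        {"id": 62, "name": "chair"},
        {"id": 72, "name": "tv"},
    ],
}

indoor = filter_coco(toy, INDOOR_CATEGORIES)
```

In this sketch, images left without any retained annotation are dropped and the original COCO category IDs are preserved; whether the authors remapped IDs or kept such images as negatives is not stated in the source.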