As one of the fundamental techniques in image editing, image cropping discards irrelevant content and retains the pleasing portions of an image to enhance the overall composition and achieve better visual and aesthetic perception. In this paper, we focus primarily on improving the efficiency of automatic image cropping while maintaining high accuracy on public datasets. To this end, we propose a deep learning based framework that learns object composition from photos of high aesthetic quality. A convolutional neural network (CNN) first detects an object region of interest based on the saliency map. The features of the detected region are then fed into a regression network to obtain the final cropping result. Unlike conventional methods, which propose multiple candidates and evaluate them iteratively, our model produces only a single object region of interest, which is mapped directly to the final output. The proposed approach therefore requires few computational resources. Experimental results on public datasets show that, as a weakly supervised method, the proposed network outperforms other weakly supervised methods on the FLMS and FCD datasets and achieves results comparable to existing methods on the CUHK dataset. Furthermore, the proposed method is more efficient than these methods, with a processing time as low as 20 ms per image.
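The single-pass structure described above (one salient region, mapped directly to one crop, with no candidate loop) can be illustrated with a minimal sketch. It is not the proposed network: simple thresholding stands in for the CNN-based region detector, and a hypothetical affine map stands in for the regression network. The function names `salient_box` and `regress_crop`, the threshold, and the `weights` and `bias` values are all assumptions made for illustration.

```python
import numpy as np


def salient_box(saliency, thresh=0.5):
    """Return (x1, y1, x2, y2) enclosing pixels with saliency >= thresh * max.

    Stand-in for the CNN-based detector: plain thresholding (an assumption).
    """
    mask = saliency >= thresh * saliency.max()
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1], dtype=float)


def regress_crop(box, width, height, weights, bias):
    """Map the salient box to a crop box in one step.

    Stand-in for the regression network: a hypothetical affine map applied
    to box coordinates normalized to [0, 1], then clipped to the image.
    """
    scale = np.array([width, height, width, height], dtype=float)
    crop = (weights @ (box / scale) + bias) * scale
    return np.clip(crop, 0.0, scale)


# Toy 100x100 saliency map with one bright rectangle.
saliency = np.zeros((100, 100))
saliency[30:60, 40:80] = 1.0

box = salient_box(saliency)

# Placeholder parameters: keep the box and pad each side by 10% of the image.
weights = np.eye(4)
bias = np.array([-0.1, -0.1, 0.1, 0.1])

crop = regress_crop(box, width=100, height=100, weights=weights, bias=bias)
```

Because exactly one region is detected and one regression step is applied, the cost per image is fixed, in contrast to pipelines that score many candidate crops.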