During a crisis, people post large numbers of informative and non-informative tweets on Twitter. Informative tweets provide helpful information, such as reports of affected individuals, infrastructure damage, and resource availability and needs. In contrast, non-informative tweets provide no information useful to humanitarian organizations or victims. Identifying informative tweets during a disaster is a challenging task. People often post images along with text on Twitter during a disaster, so image features, in addition to text features, are crucial for identifying informative tweets. However, existing methods rely on text features alone and ignore image features when identifying crisis-related tweets. This paper proposes a novel approach that considers image features along with text features. It comprises a text-based classification model, an image-based classification model, and a late-fusion step. The text-based classification model combines a Convolutional Neural Network (CNN) and an Artificial Neural Network (ANN): the CNN extracts text features from a tweet, and the ANN classifies the tweet based on those extracted features. The image-based classification model uses a fine-tuned VGG-16 architecture to extract image features and classify the image attached to a tweet. The outputs of the text-based and image-based classification models are combined using a late-fusion technique to predict the tweet label. Extensive experiments on Twitter datasets from several crises, such as the Mexico earthquake and the California wildfires, demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art methods on various evaluation metrics for identifying informative tweets during a disaster.
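The late-fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each unimodal model outputs class probabilities over (informative, non-informative), and it uses a simple weighted average as the fusion rule; the weight `w_text` and the averaging rule itself are hypothetical choices for illustration.

```python
import numpy as np

def late_fusion(p_text, p_image, w_text=0.5):
    """Fuse per-class probabilities from a text model and an image model.

    p_text, p_image: probability vectors over the classes
                     (e.g., [informative, non-informative]).
    w_text: hypothetical weight given to the text model; the remainder
            (1 - w_text) goes to the image model.
    Returns the fused probability vector and the predicted class index.
    """
    p_text = np.asarray(p_text, dtype=float)
    p_image = np.asarray(p_image, dtype=float)
    # Weighted average of the two models' probability outputs.
    fused = w_text * p_text + (1.0 - w_text) * p_image
    return fused, int(fused.argmax())

# Example: the text model is confident the tweet is informative,
# the image model mildly agrees; the fused prediction is class 0.
fused, label = late_fusion([0.8, 0.2], [0.6, 0.4])
```

Other fusion rules (e.g., taking the maximum probability, or learning the weights on a validation set) fit the same interface; only the combination line changes.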