H-ELM is an ELM-based hierarchical learning framework for the multilayer perceptron. It is built in a multilayer manner, with self-taught feature extraction followed by supervised feature classification, and the two stages are bridged by randomly initialized hidden weights. H-ELM training is divided into two separate phases: 1) unsupervised hierarchical feature representation and 2) supervised feature classification. A new ELM-based autoencoder is developed to extract multilayer sparse features of the input data, and the original ELM-based regression is performed for final decision making. H-ELM-based feature extraction and detection algorithms have been developed for practical computer vision applications such as object detection, recognition, and tracking. Since the supervised training is implemented by the original ELM, the unsupervised ELM autoencoders serve as the building blocks of the H-ELM architecture. The hidden layers of the framework are trained in a forward manner: once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Owing to the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to better generalization with faster learning speed.

These characteristics, however, limit the framework when both image and text data must be detected and classified. In the proposed system, these issues are overcome by extracting features from both image and text inputs using the Back-Propagation Extreme Learning Machine (BP-ELM) algorithm and then classifying both text and images with the Support Vector Machine (SVM) algorithm. Hence the proposed system detects image and text content with improved accuracy and efficiency.
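The following is a minimal sketch of the two-phase H-ELM training described above, assuming a basic ELM autoencoder building block and ridge-regularized least-squares output weights; the layer sizes, sigmoid activation, toy data, and regularization constant C are illustrative assumptions rather than the exact settings of the framework.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_autoencoder_layer(X, n_hidden, C=1e3):
    """Phase 1 building block: train one ELM autoencoder layer and return its encoding.

    Random input weights map X to a hidden representation H; the output weights
    beta are solved by ridge regression so that H @ beta reconstructs X, and
    beta.T is then used as the fixed (non-fine-tuned) encoding weights.
    """
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))   # random weights, never trained
    b = rng.standard_normal(n_hidden)
    H = sigmoid(X @ W + b)
    # Regularized least squares: beta = (H^T H + I/C)^-1 H^T X
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    encoded = sigmoid(X @ beta.T)                      # layer output, weights now fixed
    return encoded, beta

def elm_regression(H, Y_onehot, C=1e3):
    """Phase 2: original ELM regression on the encoded features for final decision making."""
    n_hidden = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ Y_onehot)

# Toy data: 200 samples, 64-dimensional inputs, 3 classes (illustrative only).
X = rng.standard_normal((200, 64))
y = rng.integers(0, 3, size=200)
Y = np.eye(3)[y]

# Phase 1: hierarchical feature representation, layers trained forward and then fixed.
H1, _ = elm_autoencoder_layer(X, n_hidden=128)
H2, _ = elm_autoencoder_layer(H1, n_hidden=128)

# Random projection of the hierarchically encoded outputs before decision making.
W_proj = rng.standard_normal((H2.shape[1], 256))
H_final = sigmoid(H2 @ W_proj)

# Phase 2: supervised ELM regression; argmax over the outputs gives the class.
beta_out = elm_regression(H_final, Y)
pred = np.argmax(H_final @ beta_out, axis=1)
print("training accuracy:", np.mean(pred == y))

In the proposed system the classification stage is handled by an SVM rather than ELM regression; a minimal sketch of that stage using scikit-learn is shown below, where the features (here reusing H_final) are assumed to come from the BP-ELM extractor described above.

from sklearn.svm import SVC

svm = SVC(kernel="rbf")
svm.fit(H_final, y)
print("SVM training accuracy:", svm.score(H_final, y))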