The rapid increase of available data in different complex contexts needs automatic tasks to manage and process contents. Semantic Web technologies represent the silver bullet in the digital Internet ecosystem to allow human and machine cooperation in achieving these goals. Specific technologies as ontologies are standard conceptual representations of this view. It aims to transform data into an interoperability format providing a common vocabulary for a given domain and defining, with different levels of formality, the meaning of informative objects and their possible relationships. In this work, we focus our attention on Ontology Population in the multimedia realm. An automatic and multi-modality framework for images ontology population is proposed and implemented. It allows the enrichment of a multimedia ontology with new informative content. Our multi-modality approach combines textual and visual information through natural language processing techniques, and convolutional neural network used the features extraction task. It is based on a hierarchical methodology using images descriptors and semantic ontology levels. The results evaluation shows the effectiveness of our proposed approach.