The success of artificial intelligence (AI) applications depends heavily on the quality of the data they rely on. Data curation, which deals with cleaning, organising, and managing data, has therefore become a significant research area. Increasingly, semantic data structures such as ontologies and knowledge graphs empower the new generation of AI systems. In this article, we focus on ontologies as a special type of data. Ontologies are conceptual data structures representing a domain of interest and are often used as the backbone of knowledge-based intelligent systems or as additional input for machine learning algorithms. Low-quality ontologies, containing incorrectly represented information or controversial concepts modelled from a single viewpoint, can lead to invalid application outputs and biased systems. We therefore focus on the curation of ontologies as a crucial factor for ensuring trust in the AI systems they enable. While some ontology quality aspects can be evaluated automatically, others require a human-in-the-loop evaluation. Yet, despite the importance of the field, several ontology quality aspects have not yet been addressed, and there is a lack of guidelines for the optimal design of human computation tasks to perform such evaluations. In this article, we advance the state of the art with two novel contributions. First, we propose a human computation (HC)-based approach for the verification of ontology restrictions, an ontology evaluation aspect that has not yet been addressed with HC techniques. Second, by performing two controlled experiments with a junior expert crowd, we empirically derive task design guidelines for achieving high-quality evaluation results related to (i) the formalism for representing ontology axioms and (ii) crowd qualification testing. We find that the representation format of the ontology does not significantly influence the campaign results, although contributors expressed a preference for working with a graphical ontology representation. Additionally, we show that an objective qualification test is better suited than a subjective self-assessment for gauging contributors' prior knowledge, and that contributors' prior modelling knowledge had a positive effect on their judgements. We make all artefacts designed and used in the experimental campaign publicly available.