One schema to rule them all: How Schema.org models the world of search

Andrew Iliadis, Sezgi Başak Kavakli, Wesley Stevens, Amelia Acker

doi:10.1002/asi.24744

Abstract

Several industry‐specific metadata initiatives have historically facilitated structured data modeling for the web in domains such as commerce, publishing, and social media. The metadata vocabularies produced by these initiatives allow developers to “wrap” information on the web to provide machine‐readable signals for search engines, advertisers, and user‐facing content on apps and websites, thus assisting with surfacing facts about people, places, and products. A universal iteration of such a project called Schema.org started in 2011, resulting from a partnership between Google, Microsoft, Yahoo, and Yandex to collaborate on a single structured data model across domains. Yet, few studies have explored the metadata vocabulary terms in this significant web resource. What terms are included, upon what subject domains do they focus, and how does Schema.org represent knowledge in its conceptual model? This article presents findings from our extraction and analysis of the documented release history and complete hierarchy on Schema.org's developer pages. We provide a semantic network visualization of Schema.org, including an analysis of its modularity and domains, and discuss its global significance concerning fact‐checking and COVID‐19. We end by theorizing Schema.org as a gatekeeper of data on the web that authors vocabulary that everyday web users encounter in their searches.

Full Text