Year Year arrow
arrow-active-down-0
Publisher Publisher arrow
arrow-active-down-1
Journal
1
Journal arrow
arrow-active-down-2
Institution Institution arrow
arrow-active-down-3
Institution Country Institution Country arrow
arrow-active-down-4
Publication Type Publication Type arrow
arrow-active-down-5
Field Of Study Field Of Study arrow
arrow-active-down-6
Topics Topics arrow
arrow-active-down-7
Open Access Open Access arrow
arrow-active-down-8
Language Language arrow
arrow-active-down-9
Filter Icon Filter 1
Year Year arrow
arrow-active-down-0
Publisher Publisher arrow
arrow-active-down-1
Journal
1
Journal arrow
arrow-active-down-2
Institution Institution arrow
arrow-active-down-3
Institution Country Institution Country arrow
arrow-active-down-4
Publication Type Publication Type arrow
arrow-active-down-5
Field Of Study Field Of Study arrow
arrow-active-down-6
Topics Topics arrow
arrow-active-down-7
Open Access Open Access arrow
arrow-active-down-8
Language Language arrow
arrow-active-down-9
Filter Icon Filter 1
Export
Sort by: Relevance
  • Open Access Icon
  • Research Article
  • 10.5281/zenodo.4065271
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
  • Jan 11, 2021
  • Journal of Data Mining and Digital Humanities
  • Sarah Barman + 4 more

Many libraries offer free access to digitised historical newspapers via user interfaces. After an initial period of search and filter options as the only features, the availability of more advanced tools and the desire for more options among users has ushered in a period of interface development. However, this raises a number of open questions and challenges. For example, how can we provide interfaces for different user groups? What tools should be available on interfaces and how can we avoid too much complexity? What tools are helpful and how can we improve usability? This paper will not provide definite answers to these questions, but it gives an insight into the difficulties, challenges and risks of using interfaces to investigate historical newspapers. More importantly, it provides ideas and recommendations for the improvement of user interfaces and digital tools.

  • Research Article
  • 10.17605/osf.io/ht8se
(an:a)-lyzer: An interactive visualization of Google Books Ngrams with R and Shiny: Exploring a(n) historical increase in onset strength in a(n) huge database
  • Jan 31, 2019
  • Journal of Data Mining and Digital Humanities
  • Julia Schlüter + 1 more

Using the re-emergence of the /h/ onset from Early Modern to Present-Day English as a case study, we illustrate the making and the functions of a purpose-built web application named (an:a) lyzer for the interactive visualization of the raw n-gram data provided by Google Books Ngrams (GBN). The database has been compiled from the full text of over 4.5 million books in English, totalling over 468 billion words and covering roughly five centuries. We focus on bigrams consisting of words beginning with graphic preceded by the indefinite article allomorphs a and an, which serve as a diagnostic of the consonantal strength of the initial /h/. The sheer size of this database affords us the possibility to attain a maximal diachronic resolution, to distinguish highly specific groups of -initial lexical items, and even to trace the diffusion of the observed changes across individual lexical units. The functions programmed into the app enable us to explore the data interactively by filtering, selecting and viewing them according to various parameters that were manually annotated into the data frame. We also discuss limitations of the database, of the app and of the explorative data analysis. The app is publicly accessible online at https://osf.io/ht8se/.

  • Research Article
  • 10.5282/ubm/epub.71919
Recurrent Pattern Modelling in a Corpus of Armenian Manuscript Colophons
  • Dec 22, 2017
  • Journal of Data Mining and Digital Humanities
  • Emmanuel Van Elverdinghe

Colophons of Armenian manuscripts are replete with yet untapped riches. Formulae are not the least among them: these recurrent stereotypical patterns conceal many clues as to the schools and networks of production and diffusion of books in Armenian communities. This paper proposes a methodology for exploiting these sources, as elaborated in the framework of a PhD research project about Armenian colophon formulae. Firstly, the reader is briefly introduced to the corpus of Armenian colophons and then, to the purposes of our project. In the third place, we describe our methodology, relying on lemmatization and modelling of patterns into automata. Fourthly and finally, the whole process is illustrated by a basic case study, the occasion of which is taken to outline the kind of results that can be achieved by combining this methodology with a philologico-historical approach to colophons.