Words Algorithm Collection - finding closely related open access books using text mining techniques

Ronald Snijder

doi:10.53377/lq.10938

Abstract

Open access platforms and retail websites both try to present the most relevant offerings to their patrons. Retail websites deploy recommender systems that collect data about their customers. These systems are successful but intrude on privacy. As an alternative, this paper presents an algorithm that uses text mining techniques to find the most important themes of an open access book or chapter. By locating other publications that share one or more of these themes, it is possible to recommend closely related books or chapters. The algorithm splits the full text into trigrams. It removes all trigrams containing words that are commonly used in everyday language and in (open access) book publishing. The most frequently occurring remaining trigrams are distinctive to the publication and indicate the themes of the book. The next step is finding publications that share one or more of the trigrams. The strength of the connection can be measured by counting the number of shared trigrams and ranking the results. The algorithm was used to find connections between 10,997 titles: 67% in English, 29% in German and 6% in Dutch or a combination of languages. The algorithm is able to find connected books across languages. It can serve several use cases, not just recommender systems: creating benchmarks for publishers or creating a collection of connected titles for libraries are other possibilities. Apart from the OAPEN Library, the algorithm can be applied to other collections of open access books or even open access journal articles. Combining the results across multiple collections will enhance its effectiveness.
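The trigram steps described in the abstract can be illustrated with a minimal sketch. The paper's own implementation is written in R; the Python below is only an illustration, and the function names, the tiny `COMMON_WORDS` list and the `top_n` parameter are assumptions made for this example, not part of the published algorithm.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list. A realistic list would cover
# everyday language plus terms common in (open access) book publishing.
COMMON_WORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "on",
                "university", "press", "chapter", "isbn"}


def trigrams(text):
    """Split text into lowercase word trigrams (three consecutive words)."""
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def distinctive_trigrams(text, common_words=COMMON_WORDS, top_n=10):
    """Return the most frequent trigrams that contain no common word."""
    kept = [t for t in trigrams(text)
            if not any(word in common_words for word in t)]
    return Counter(kept).most_common(top_n)
```

Calling `distinctive_trigrams(full_text)` on the text of one book returns a list of `(trigram, count)` pairs. Under the assumptions above, the highest-ranked pairs play the role of the book's themes.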

Highlights

Open access platforms and retail websites have one thing in common: they are trying to present the most relevant offerings possible to their patrons.
Recommender systems based on personal data are successful but are not a viable option for those who want to protect the privacy of their users.
Deploying an ngram-based algorithm is a good alternative for open access books, as it uses the contents of the publications.

Summary

Introduction

Open access platforms and retail websites have one thing in common: they are trying to present the most relevant offerings possible to their patrons. Removing all trigrams that contain commonly used words brings the remaining number back to two. Applying this procedure to the complete text of a book still creates a large set of trigrams, hence the need for additional filtering using terms that are common in open access academic books. A text mining algorithm written in the R programming language uses the full text of the publications, filters out the trigrams and creates an overview of closely related books and chapters. Different users may have different needs: a reader might be interested in finding a few select titles, while a library might want to download a larger collection of books around a certain topic.
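Ranking related titles by the number of trigrams they share can be sketched as follows. This is a Python illustration rather than the author's R code. The `related_titles` function, its `min_shared` and `limit` parameters, and the sample data are all hypothetical; the two parameters are one possible way to serve both a reader who wants a few titles and a library that wants a larger collection.

```python
def related_titles(title, themes, min_shared=1, limit=None):
    """Rank other titles by the number of trigrams shared with `title`.

    `themes` maps a title identifier to the set of its distinctive trigrams.
    """
    scored = [(other, len(themes[title] & tris))
              for other, tris in themes.items() if other != title]
    # Strongest connection first; ties are broken by identifier.
    ranked = sorted((pair for pair in scored if pair[1] >= min_shared),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked if limit is None else ranked[:limit]


# Hypothetical data: four titles with their distinctive trigrams.
themes = {
    "book_a": {("open", "access", "monographs"),
               ("peer", "review", "process"),
               ("digital", "humanities", "research")},
    "book_b": {("open", "access", "monographs"),
               ("peer", "review", "process")},
    "book_c": {("digital", "humanities", "research")},
    "book_d": {("medieval", "trade", "routes")},
}

few_for_reader = related_titles("book_a", themes, limit=1)
collection_for_library = related_titles("book_a", themes, min_shared=1)
```

A reader-facing view would keep `limit` small, while a library building a collection could leave `limit` unset and tune `min_shared` to control how loosely connected the included titles may be.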

Background

Libraries and Privacy

Recommender Systems

Ngrams

Other Experiments

Finding Related Titles by Algorithm

The Algorithm

The Data Set

Finding Connected Titles

Single Book

Groups

Finding Translations

Use Cases

Findings

Conclusion

Journal: LIBER Quarterly: The Journal of the Association of European Research Libraries
Publication Date: Aug 24, 2021
License type: CC BY 4.0
