Text characteristics of English language university Web sites

Mike Thelwall

doi:10.1002/asi.20126

Abstract

The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic‐specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three English‐speaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign language terms or computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications.

Journal: Journal of the American Society for Information Science and Technology
Publication Date: Feb 4, 2005
Citations: 48

Similar Papers

Law & Psychiatry: Legal Concerns for Psychiatrists Who Maintain Web Sites
P R Recupero
Psychiatric Services | VOL. 57
01 Apr 2006

Designing Web Sites for Customer Loyalty Across Business Domains: A Multilevel Analysis
Sunil Mithas ... Claes Fornell
Journal of Management Information Systems | VOL. 23
01 Dec 2006

Internet marketing: web site navigational design issues
M.J Taylor ... D England
Marketing Intelligence & Planning | VOL. 24
01 Jan 2006

Pakistani University Library Web Sites: Features, Contents, and Maintenance Issues
Muhammad Abbas Ganaee ... Muhammad Rafiq
Journal of Web Librarianship | VOL. 10
12 Jul 2016
