Abstract

The World Wide Web has become the favorite destination of information seekers across the globe. With billions of web pages, information on just about any topic is a click away. Analyzing the massive content of the web serves many important purposes, such as information discovery, more efficient search engines, and the study of social and political patterns. Web mining techniques such as text classification and categorization are being used to provide an "under-the-microscope" picture of the web. The Arabic web represents an important portion of the web. With Arabic as the 5th most spoken language in the world and the number of Arabic-speaking Internet users growing at exponential rates, it is becoming important to analyze Arabic web content and study its trends. This paper presents a close look at the content of the Arabic web. It reports the percentage of web content that falls into each of five categories, namely politics, culture, sports, economics and religion. We used two different text classification algorithms and compared their results, including their precision and recall. The classifiers showed that economics and politics account for the largest share of content (65% combined), while culture and religion account for the smallest (about 10% combined).
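As an illustration of the two measurements named in the abstract, the sketch below computes the share of documents assigned to each category and the precision and recall of a classifier for one category. It is a minimal sketch, not the paper's implementation: the function names `category_shares` and `precision_recall` are hypothetical, and the `gold` and `pred` label lists are invented toy data rather than results from the study.

```python
from collections import Counter

# The five categories named in the abstract.
CATEGORIES = ["politics", "culture", "sports", "economics", "religion"]


def category_shares(predicted):
    """Return the percentage of documents assigned to each category."""
    total = len(predicted)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    counts = Counter(predicted)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


def precision_recall(gold, predicted, category):
    """Return (precision, recall) for one category.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    A zero denominator yields 0.0 by convention in this sketch.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (assumption): hand-labelled categories vs. classifier output.
gold = ["politics", "economics", "sports", "politics", "religion", "economics"]
pred = ["politics", "economics", "sports", "economics", "religion", "economics"]

if __name__ == "__main__":
    print(category_shares(pred))
    for cat in CATEGORIES:
        print(cat, precision_recall(gold, pred, cat))
```

Running the same two functions on the output of each classifier would give a side-by-side comparison of the kind the abstract describes, under the assumption that a hand-labelled reference set is available.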
