Abstract

We read with great interest the Perspective “Creating a science of the Web” by T. Berners-Lee et al. (11 Aug, p. 769). We agree that evolving Web technologies enable the creation of novel structures of information, whose properties and dynamics can be fruitfully studied. More generally, we would like to point out that the Web is a specific phenomenon associated with the increasing prevalence of information being digitized and linked together into complicated structures. The complexity of these structures underscores the need for systematic, large-scale data mining both to uncover new patterns in social interactions and to make discoveries in science through connecting disparate findings. For this vision to be realized, we have to develop a new science of practical data mining focusing on questions answerable with the existing digital libraries of information. In particular, today, free-text search (as embodied by Google) is the primary means of mining the Web, but there are many kinds of information requests it cannot handle. Queries combining general, standardized annotation about pages (such as from the semantic Web) with free-text search within them are often not supported—e.g., doing a full-text search of all biophysics blogs emanating just from governmental institutions within 100 miles of Chicago. Furthermore, it would be useful to develop ways of leveraging the small amounts of highly structured information in the semantic Web as “gold-standard training sets” to help bootstrap the querying and clustering of the large bodies of unstructured information on the Web as a whole. Thus, the science of the Web should enumerate the range of information requests that can be fruitfully made and the kinds of information infrastructure and data-mining techniques needed to fulfill them. 
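To make the combined query above concrete, here is a minimal sketch of filtering on standardized annotations and then running a free-text match within the surviving pages. Everything in it is an assumption for illustration: the `Page` fields, the `search` function, the example URLs, and the coordinates are hypothetical, and the letter does not prescribe any particular implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    """A Web page with hypothetical standardized annotations plus its free text."""
    url: str
    topic: str           # assumed annotation, e.g. "biophysics"
    publisher_type: str  # assumed annotation, e.g. "government"
    lat: float           # assumed geographic annotation (degrees)
    lon: float
    text: str


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, using the haversine formula."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


def search(pages, terms, topic, publisher_type, center, radius_miles):
    """Return URLs of pages that pass the annotation filters and contain every term."""
    wanted = [t.lower() for t in terms]
    hits = []
    for page in pages:
        # Structured part of the query: annotations and location.
        if page.topic != topic or page.publisher_type != publisher_type:
            continue
        if miles_between(center[0], center[1], page.lat, page.lon) > radius_miles:
            continue
        # Unstructured part of the query: plain substring matching on the text.
        body = page.text.lower()
        if all(term in body for term in wanted):
            hits.append(page.url)
    return hits


CHICAGO = (41.8781, -87.6298)  # approximate city-center coordinates

# Made-up pages; only the first satisfies every condition.
PAGES = [
    Page("https://example.org/gov-lab-blog", "biophysics", "government",
         41.72, -87.98, "Notes on protein folding kinetics"),
    Page("https://example.org/university-blog", "biophysics", "university",
         41.79, -87.60, "Protein folding simulations"),
    Page("https://example.org/distant-gov-blog", "biophysics", "government",
         39.00, -77.10, "Protein folding in membranes"),
    Page("https://example.org/gov-chem-blog", "chemistry", "government",
         41.72, -87.98, "Protein folding and catalysis"),
]

if __name__ == "__main__":
    print(search(PAGES, ["protein", "folding"], "biophysics", "government", CHICAGO, 100))
```

The sketch applies the annotation filters before the text match, which is the cheap-first ordering one would expect when annotations are indexed; a real system would replace the substring test with a proper full-text index.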
# Response {#article-title-2}

We agree with Smith and Gerstein's view that data mining is among the many important areas of research that are considering the Web as an object of scientific inquiry. They are correct in pointing out the importance of “text mining,” the basis of current Web search, for providing new Web capabilities. However, with the increasing amount of directly machine-readable data that are available on the Web (coming from, for example, database-producing equipment such as modern scientific devices and data-oriented applications), it is also clear that text mining needs to be augmented with new data technologies that work more directly with data and meta-data. Data mining is also an excellent case in point for the main focus of our Perspective in relation to the interdisciplinary nature of the emerging science of the Web. Analytic modeling techniques will be needed to understand where Web data reside and how they can best be accessed and integrated. Engineering and language development are needed if we are to be able to perform data mining without having to pull all the information into centralized data servers of a scale that only the few largest search companies can currently afford. In addition, data mining provides not just opportunities for better search, but also real policy issues with respect to information access and user privacy, especially where multiple data sources are aggregated into searchable forms.
