Japanese Web Research Articles

Purpose: The purpose of this paper is to address the knowledge acquisition bottleneck problem in natural language processing by introducing a new rule‐based approach for the automatic acquisition of linguistic knowledge.
Design/methodology/approach: The author has developed a new machine translation methodology that requires only a bilingual lexicon and a parallel corpus of surface sentences aligned at the sentence level to learn new transfer rules.
Findings: A first prototype of a web‐based Japanese‐English translation system called Japanese‐English translation using corpus‐based acquisition of transfer (JETCAT) has been implemented in SWI‐Prolog, together with a Greasemonkey user script that analyzes Japanese web pages and translates sentences via Ajax. In addition, linguistic information is displayed at the character, word, and sentence level to provide a useful tool for web‐based language learning. An important feature is customization: the user can simply correct translation results, which leads to an incremental update of the knowledge base.
Research limitations/implications: This paper focuses on the technical aspects and user interface issues of JETCAT. The author is planning to use JETCAT in a classroom setting to gather first experiences and will then evaluate a real‐world deployment; work has also started on extending JETCAT to include collaborative features.
Practical implications: The research has a high practical impact on academic language education. It could also have implications for the translation industry by superseding certain translation tasks while adding value and quality to others.
Originality/value: The paper presents an extended version of the paper that received the Emerald Web Information Systems Best Paper Award at iiWAS2010.

Web spamming has emerged as a way to deceive search engines and obtain a higher ranking in search result lists, which brings more traffic and profit to web sites. The link farm is one of the major spamming techniques: it creates a large set of densely interlinked spam pages to deceive link-based ranking algorithms, which regard incoming links to a page as endorsements of it. Link farms need to be eliminated when searching, analyzing, and mining the Web, but they are also interesting social activities in cyberspace. Our purpose is to understand the dynamics of link farms, such as how much they grow or shrink and how their topics change over time. Such information is helpful for developing new spam detection techniques and for tracking spam sites to observe their topics. In particular, we are interested in where emerging spam sites can be found, which is useful for updating spam classifiers. In this paper, we study the overall size and topic distribution of link farms, and their evolution, in large-scale Japanese web archives spanning three years and containing four million hosts and 83 million links. As far as we know, the overall characteristics of link farms in a time series of web snapshots of this scale have never been explored. We propose a method for extracting link farms and investigate their size distribution and topics. We observe the evolution of link farms from the perspective of size growth and change in topic distribution. We recursively decomposed host graphs into link farms and found that 4% to 7% of hosts were members of link farms. This implies that a considerable number of spam hosts can be removed without content analysis. We also found that the two dominant topics, “Adult” and “Travel”, accounted for over 60% of spam hosts in link farms. The size evolution of link farms showed that many link farms were maintained for years, but most of them did not grow. The distribution of topics in link farms did not change significantly, but the hosts and keywords related to each topic changed dynamically. These results suggest that we can observe topic changes in each link farm, but we cannot efficiently find emerging spam sites by monitoring link farms. This implies that monitoring current link farms is not enough to detect newly created spam sites. Detecting sites that generate links to spam sites would be an effective approach.
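The abstract does not spell out how host graphs are "recursively decomposed". The following is a minimal sketch under stated assumptions, not the authors' method: a link farm is approximated as a strongly connected group of hosts in which every host keeps at least `min_degree` incoming and outgoing links inside the group, and weakly attached hosts are pruned before the remainder is decomposed again. The function names, the pruning rule, and both thresholds are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Iterative Kosaraju algorithm: return a list of SCCs (sets of nodes)."""
    nodes = set(nodes)
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        if u in nodes and v in nodes:
            fwd[u].append(v)
            rev[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse completion order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    comp.add(prev)
                    stack.append(prev)
        components.append(comp)
    return components


def extract_link_farms(hosts, links, min_degree=2, min_size=3):
    """Hypothetical decomposition: split into SCCs, prune weak hosts, repeat."""
    links = set(links)  # ignore duplicate links
    farms, pending = [], [set(hosts)]
    while pending:
        group = pending.pop()
        for comp in strongly_connected_components(group, links):
            if len(comp) < min_size:
                continue
            # Count in- and out-links that stay inside this component.
            indeg, outdeg = defaultdict(int), defaultdict(int)
            for u, v in links:
                if u in comp and v in comp and u != v:
                    outdeg[u] += 1
                    indeg[v] += 1
            dense = {h for h in comp
                     if indeg[h] >= min_degree and outdeg[h] >= min_degree}
            if dense == comp:
                farms.append(comp)      # nothing left to prune
            elif len(dense) >= min_size:
                pending.append(dense)   # decompose the pruned core again
    return farms
```

Calling `extract_link_farms(hosts, links)` with a set of host names and a list of `(source, target)` pairs returns a list of host sets. Dividing the number of hosts in those sets by the total host count gives a membership ratio comparable in spirit to the 4% to 7% figure reported above, although the thresholds here are illustrative only.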

Articles published on Japanese Web

Japanese WEB Business Startups

Automatic linguistic knowledge acquisition for the web

A Study of Link Farm Evolution Using a Time-series of Web Snapshots

Automatic Extraction and Evaluation of Human Activity Using Conditional Random Fields and Self-Supervised Learning

Detecting Hijacked Sites by Web Spammer Using Link-Based Algorithms

Finding Related Search Engine Queries by Web Community Based Query Enrichment

Stable isotope-guided analysis of congener-specific PCB concentrations in a Japanese coastal food web

Query structuring and expansion with two-stage term dependence for Japanese web retrieval

A Web Corpus and Word Sketches for Japanese

Collection of Moods Appearing in Japanese Web Pages and a Proposal for an Extended Mood System (日本語ウェブページに出現するムードの収集，および拡充したムード体系の提案)

Automatically generating related queries in Japanese

Collection of Links to Genome Analysis Tools (ゲノム解析ツールリンク集)

Localization of Web design: An empirical comparison of German, Japanese, and United States Web site characteristics

Beyond the Net

Language Policy in South Korea and the Special Case of Japanese

Review of Japanese Web sites for Chinese history

Financial reporting on the Internet by leading Japanese companies

Measuring cultural adaptation on the Web: a content analytic study of U.S. and Japanese Web sites
