Abstract

In this paper, we present a method to provide proactive assistance in text checking, based on usage relationships between words as structured on the Web. For a given sentence, the method builds a connectionist structure of relationships between word n-grams. This structure is then parameterized by means of an unsupervised, language-agnostic optimization process. Finally, the method produces a representation of the sentence that allows the least prominent usage-based relational patterns to emerge, making it easy to spot badly written and unpopular text. The study includes the problem statement and its characterization in the literature, as well as the proposed solution approach and some experimental results.
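The core idea of scoring word n-grams by their Web usage can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `usage_counts` dictionary is a hypothetical stand-in for the hit counts a search engine would return for each n-gram query.

```python
# Minimal sketch: extract contiguous word n-grams from a sentence and flag
# the least "used" ones. Real usage counts would come from Web queries;
# here they are supplied as a plain dictionary for illustration.

def word_ngrams(sentence, n):
    """Return the list of contiguous word n-grams in the sentence."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def least_used(sentence, n, usage_counts):
    """Return the n-grams whose usage count equals the minimum observed."""
    grams = word_ngrams(sentence, n)
    counts = {g: usage_counts.get(g, 0) for g in grams}
    lowest = min(counts.values())
    return [g for g, c in counts.items() if c == lowest]
```

An n-gram absent from the counts defaults to zero usage, so unseen word pairs surface as the least prominent patterns, mirroring the "lowest usage category" described in the paper.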

Highlights

  • In this paper, we present a method to provide proactive assistance in text checking, based on usage relationships between words as structured on the Web

  • In order to test the effectiveness of the system, a collection of 80 sentences has been derived from the British National Corpus (BNC) [40]

  • A correct result is an atypical subsequence discovered in the sentence, whereas a correct absence of result is a good sentence in which no atypical subsequence has been discovered, i.e., the lowest usage category is empty (the lowest usage category contains the zero usage value by default, so from a technical standpoint this condition means that the category contains only the zero usage value); the terms positive and negative refer to the expectation, whereas the terms true and false refer to whether that expectation corresponds to the observation
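The evaluation terminology in the last highlight can be made concrete with a small sketch. The function below is illustrative only, assuming "positive" means an atypical subsequence was expected (the test sentence is known to be bad) and "true" means the system's observation matched that expectation.

```python
# Hypothetical sketch of the evaluation terminology: positive/negative
# refers to the expectation, true/false to whether the observation agrees.

def classify(expected_atypical, observed_atypical):
    """Map expectation vs. observation to a confusion-matrix category."""
    if expected_atypical and observed_atypical:
        return "true positive"    # bad sentence, atypical subsequence found
    if expected_atypical and not observed_atypical:
        return "false negative"   # bad sentence, nothing flagged
    if not expected_atypical and observed_atypical:
        return "false positive"   # good sentence, something wrongly flagged
    return "true negative"        # good sentence, nothing flagged
```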


Summary

Related Work

To the best of our knowledge, no work has been done in the field of text analysis using a connectionist model and the Web. The research field of open-world approaches to text correction is characterized by a variety of specialized NLP sub-tasks. The training process leads to scalability issues when applied to complex problems or to large training sets without guidance. For this reason, web-based NLP models are typically supervised models using annotated training data, or unsupervised models that rely on external resources, such as taxonomies, to strengthen their results. In [14], the authors present a method for correcting real-word spelling errors, i.e., errors that occur when a user mistakenly types a correctly spelled word when another was intended. An unsupervised statistical method for correcting preposition errors is proposed in [19]. In [28], the authors propose a way of using web counts for several lexical disambiguation tasks, such as part-of-speech tagging, spelling correction, and word sense disambiguation. The system is not language-specific and can be used with other languages by adapting the phonetic codes and transformation rules

Problem Formulation
Input Sentence and Operators
Search Engines and Hit Counts
The Connectionist Structure
The Visual Output of the Network
Overall Components of the System
The Determination of the Weights
Experimental Results
Conclusions and Future Works

