<p>Ever since people began to write, spelling mistakes have held a prominent place in linguistic studies and have brought disciplines such as spelling and dictation into school curricula. According to our extensive survey of work on spelling errors in typed Arabic text, very few studies specifically address typographical errors caused by inserting or omitting the blank space in words. Existing spelling correction software also remains ineffective at handling this type of error. Left uncorrected, an inserted or missing blank space between or within words makes the typed text ambiguous or incomprehensible. To remedy this limitation, we propose in this article an ad hoc probabilistic method that combines two approaches. The first handles errors due to a missing blank space between or inside words; the second focuses on correcting spaces wrongly inserted within a word, in addition to the other elementary editing errors (addition, deletion, and permutation of characters). The method combines edit distance with n-gram language models to correct these errors. It reaches an accuracy of 98.14% for missing blank-space errors (MBSE) and 89.5% for inserted blank-space errors (IBSE), which gives an average correction rate of around 95.26%. These results are encouraging and demonstrate the value of the approach.</p>
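<p>To make the two error types concrete, the following is a minimal sketch, not the method evaluated in the article. It assumes a toy English unigram table (<code>COUNTS</code>) in place of an Arabic corpus, a unigram model in place of higher-order n-grams, and plain Levenshtein distance (so a transposition costs two edits). All function names, the unknown-word penalty, and the 20-character word limit are hypothetical choices made for illustration.</p>

```python
import math

# Hypothetical toy unigram counts; a real system would estimate
# n-gram statistics from a large Arabic corpus.
COUNTS = {"the": 50, "cat": 10, "sat": 8, "on": 30, "mat": 5, "them": 6, "at": 20}
TOTAL = sum(COUNTS.values())


def log_prob(word):
    """Unigram log-probability; unknown words get a length-scaled penalty (assumed)."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(1 / TOTAL) - 5.0 * len(word)


def edit_distance(a, b):
    """Levenshtein distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def split_missing_spaces(text, max_len=20):
    """MBSE sketch: most probable segmentation of a space-free string (dynamic programming)."""
    best = [(0.0, [])]  # best[i] = (score, words) for text[:i]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            score, words = best[j]
            piece = text[j:i]
            candidates.append((score + log_prob(piece), words + [piece]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


def merge_inserted_spaces(tokens):
    """IBSE sketch: merge two adjacent tokens when the merged form is a known
    word and at least one of the two parts is not."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            left, right = tokens[i], tokens[i + 1]
            if left + right in COUNTS and (left not in COUNTS or right not in COUNTS):
                out.append(left + right)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out


def correct_word(word, max_dist=1):
    """Pick the most probable known word within max_dist edits, if any."""
    if word in COUNTS:
        return word
    candidates = [w for w in COUNTS if edit_distance(word, w) <= max_dist]
    return max(candidates, key=log_prob) if candidates else word


print(split_missing_spaces("thecatsatonthemat"))
print(merge_inserted_spaces(["the", "c", "at", "sat"]))
print(correct_word("matt"))
```

<p>In this sketch the language model decides between competing segmentations (for example "the mat" versus "them at"), while edit distance only proposes candidates for unknown tokens. A bigram or higher-order model would let the surrounding words influence both decisions.</p>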