Translating Braille Patterns into Arabic Text Using a Convolutional Neural Network
Optical Braille recognition has been investigated in many studies across different natural languages. Arabic Braille, however, has received far less attention than other languages, owing to the lack of Arabic Braille datasets and the shortage of research in this area. This study uses the YOLOv11 model as the detection tool because of its relative effectiveness and accuracy, together with a staged training approach that uses the AdamW optimizer and Automatic Mixed Precision. To translate the detected Arabic patterns into text, post-processing steps are performed: detected cells are clustered vertically into lines, sorted horizontally within each line, split into words using adaptively defined word boundaries, and arranged in the corrected right-to-left reading order. In the experiments, the best result achieved is 0.99 for each of precision, recall, and F1 score. Moreover, the framework-level runtime shows that the total processing time (inference plus post-processing for text extraction) ranges between 29 and 82 ms per image. The proposed framework was evaluated on a primary dataset of 5924 page images of Braille patterns covering 45 classes of Arabic letters and diacritics. The results indicate that the proposed framework is a robust step toward effective, scalable, responsive Arabic optical Braille recognition (OBR) for mobile and wearable assistive technologies. By building the first dedicated corpus of Arabic Braille and providing an end-to-end recognition suite, this study lays the groundwork for future research and applications in the field. This research helps bridge the accessibility gap by allowing sighted individuals to access content encoded in Braille.
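The post-processing steps named in the abstract (vertical clustering into lines, horizontal sorting, adaptive word boundaries, right-to-left ordering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Cell` structure, the tolerance `y_tol_factor`, the median-gap rule with `gap_factor`, and the `right_to_left` flag are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cell:
    x: float    # centre x of a detected Braille cell (hypothetical field)
    y: float    # centre y of the detected cell
    h: float    # height of the detected box
    label: str  # predicted character class


def group_lines(cells, y_tol_factor=0.5):
    """Cluster cells into text lines by vertical position (assumed rule)."""
    lines = []
    for c in sorted(cells, key=lambda c: c.y):
        # Join the current line if the cell is vertically close to its last cell.
        if lines and abs(c.y - lines[-1][-1].y) <= y_tol_factor * c.h:
            lines[-1].append(c)
        else:
            lines.append([c])
    return lines


def split_words(line, gap_factor=1.5, right_to_left=True):
    """Sort one line horizontally and split it where a gap is unusually wide."""
    ordered = sorted(line, key=lambda c: c.x, reverse=right_to_left)
    if len(ordered) < 2:
        return [[c.label for c in ordered]]
    gaps = [abs(b.x - a.x) for a, b in zip(ordered, ordered[1:])]
    # Adaptive threshold: a multiple of the median gap in this line.
    threshold = gap_factor * median(gaps)
    words = [[ordered[0].label]]
    for gap, c in zip(gaps, ordered[1:]):
        if gap > threshold:
            words.append([])
        words[-1].append(c.label)
    return words


def cells_to_text(cells, **kwargs):
    """Join words with spaces and lines with newlines."""
    return "\n".join(
        " ".join("".join(word) for word in split_words(line, **kwargs))
        for line in group_lines(cells)
    )
```

The threshold adapts to each line because it is derived from that line's own median gap, so the same code tolerates pages scanned at different scales. Whether cells should be emitted in descending or ascending x order depends on how the page was imaged, which is why the direction is left as a flag.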
- Discussion
11
- 10.1016/j.plrev.2017.06.019
- Jun 21, 2017
- Physics of Life Reviews
Towards a theory of word order: Comment on “Dependency distance: a new perspective on syntactic patterns in natural language” by Haitao Liu et al.
- Research Article
18
- 10.1093/jamia/ocae065
- Mar 12, 2024
- Journal of the American Medical Informatics Association : JAMIA
Extracting PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities. PICOX first identifies entities by assessing whether a word marks the beginning or conclusion of an entity. Then, it uses a multi-label classifier to assign one or more PICO labels to a span candidate. PICOX was evaluated using 1 of the best-performing baselines, EBM-NLP, and 3 more datasets, ie, PICO-Corpus and randomized controlled trial publications on Alzheimer's Disease (AD) or COVID-19, using entity-level precision, recall, and F1 scores. PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (P ≪.01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX demonstrated comparable F1 scores with higher precision when compared to the baseline. PICOX excels in identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively minimizes false positives and improves precision.
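Entity-level, micro-averaged precision, recall, and F1 can be sketched as below. This is a generic illustration under the assumption that an entity counts as correct only when its span and label both match exactly; the function name `micro_prf` and the `(start, end, label)` tuple layout are hypothetical and not taken from PICOX.

```python
def micro_prf(gold_docs, pred_docs):
    """Micro-averaged entity-level precision, recall and F1.

    Each document is an iterable of (start, end, label) tuples. The same
    span may appear with several labels, which is how overlapping
    entities are represented in this sketch.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # exact span-and-label matches
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Micro-averaging pools the counts over all documents before dividing, so frequent entity types weigh more than rare ones.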
- Conference Article
3
- 10.1109/icsgrc.2018.8657490
- Aug 1, 2018
Braille is a tactile writing system consisting of dots that visually impaired people use for reading. Each letter of the alphabet has its own Braille pattern, and some patterns bear no visual relation to the letter they represent. It is therefore difficult for sighted people to detect and recognize Braille patterns. This paper presents a study of Braille image recognition for beginners. The outcome of this study is expected to translate Braille image patterns into readable alphabetic text. A Bag of Features (BOF) technique is proposed for recognizing the Braille image, and image classification is done using a Support Vector Machine (SVM). Seventy-eight Braille images were tested. The tests showed a recognition accuracy of 97.44%, which indicates that the proposed techniques are applicable to Braille image recognition.
- Research Article
31
- 10.1016/j.chiabu.2021.105059
- May 2, 2021
- Child Abuse & Neglect
Predicting youth at high risk of aging out of foster care using machine learning methods
- Discussion
2
- 10.1016/j.plrev.2017.07.001
- Jul 1, 2017
- Physics of Life Reviews
DDM at Work: Reply to comments on “Dependency distance: A new perspective on syntactic patterns in natural languages”
- Research Article
301
- 10.1016/j.plrev.2017.03.002
- Mar 27, 2017
- Physics of Life Reviews
Dependency distance: A new perspective on syntactic patterns in natural languages
- Conference Article
52
- 10.1109/cise.2009.5363900
- Dec 1, 2009
Intelligent geographical information systems (GISs) have received much attention in recent years, and the ultimate goal is to realize natural language interaction between users and GISs. However, bridging the semantic gap between structured geospatial data in GISs and un-analytical spatial information in natural language remains a significant challenge. The representation and analysis of spatial relations has been one of the generic issues in geographical information science. This paper presents a rule-based approach to spatial relation extraction in natural language text. Based on geographical named entity recognition technology and a spatial relation annotation corpus, syntactical rules of spatial relations are induced and then formalized in JAPE, the pattern language of the natural language processing platform GATE. Geographical named entities and spatial relations in new documents can be detected effectively in GATE. The experimental results indicate that spatial relations are usually described with several syntactical patterns in natural language, especially directional spatial relations, but topological relations are much more complicated. Rule-based extraction approaches can be implemented and integrated with less effort than machine learning algorithms. Directional spatial relations are known to be used more often in natural language than topological spatial relations. Therefore, we conclude that it is practical and effective to extract spatial relations in natural language with rule-based approaches.
- Book Chapter
5
- 10.4018/978-1-878289-77-3.ch013
- Jan 1, 2001
The proposed translation of natural language (NL) patterns to object and process modeling is seen as an alternative to the symbolic notations, textual languages or classical semantic networks, the main representation tools today. Its necessity is motivated by the universality, unifying abilities, natural extensibility, logic and reusability of NL. The translation relies on a formalized, stylized and graphical representation of NL, bridging NL to an integrated view on the object and process modeling. Only the morphological and syntactic knowledge in NL is subject to translation, but the proposed solution anticipates the semantic and logical interpretation of a model. A brief presentation and exemplification of NL patterns in consideration precede the translation.
- Discussion
2
- 10.1016/j.plrev.2017.06.024
- Jun 23, 2017
- Physics of Life Reviews
Dependency distances in natural mixed languages: Comment on “Dependency distance: A new perspective on syntactic patterns in natural languages” by Haitao Liu et al.
- Research Article
8
- 10.1007/s00432-023-05467-7
- Nov 10, 2023
- Journal of Cancer Research and Clinical Oncology
Purpose: Ultrasound imaging is the preferred method for the early diagnosis of endometrial diseases because of its non-invasive nature, low cost, and real-time imaging features. However, the accurate evaluation of ultrasound images relies heavily on the experience of the radiologist. Therefore, a stable and objective computer-aided diagnostic model is crucial to assist radiologists in diagnosing endometrial lesions. Methods: Transvaginal ultrasound images were collected from multiple hospitals in Quzhou city, Zhejiang province. The dataset comprised 1875 images from 734 patients, including cases of endometrial polyps, hyperplasia, and cancer. Here, we proposed a based self-supervised endometrial disease classification model (BSEM) that learns a joint unified task (raw and self-supervised tasks) and applies self-distillation techniques and ensemble strategies to aid doctors in diagnosing endometrial diseases. Results: The performance of BSEM was evaluated using fivefold cross-validation. The experimental results indicated that the BSEM model achieved satisfactory performance across indicators, with scores of 75.1%, 87.3%, 76.5%, 73.4%, and 74.1% for accuracy, area under the curve, precision, recall, and F1 score, respectively. Furthermore, compared to the baseline models ResNet, DenseNet, VGGNet, ConvNeXt, VIT, and CMT, the BSEM model improved accuracy, area under the curve, precision, recall, and F1 score by 3.3–7.9%, 3.2–7.3%, 3.9–8.5%, 3.1–8.5%, and 3.3–9.0%, respectively. Conclusion: The BSEM model is an auxiliary diagnostic tool for the early detection of endometrial diseases revealed by ultrasound and helps radiologists to be accurate and efficient while screening for precancerous endometrial lesions.
- Research Article
94
- 10.1016/j.urology.2023.05.040
- Jul 4, 2023
- Urology
Can ChatGPT, an Artificial Intelligence Language Model, Provide Accurate and High-quality Patient Information on Prostate Cancer?
- Research Article
9
- 10.1016/j.nlp.2024.100090
- Jul 20, 2024
- Natural Language Processing Journal
Natural Language Processing (NLP) systems enable machines to understand, interpret, and generate human-like language, bridging the gap between human communication and computer understanding. Natural Language Interface to Databases (NLIDB) and Natural Language Interface to Visualization (NLIV) systems are designed to enable non-technical users to retrieve and visualize data through natural language queries. However, these systems often face challenges in handling complex correlation and analytical questions, limiting their effectiveness for comprehensive data analysis. Additionally, current Business Intelligence (BI) tools also struggle with understanding the context and semantics of complex questions, further hindering their usability for strategic decision-making. Also, when building these models for generating queries from natural language, the system handles only the semantic parsing issues, because in all existing models each column header is changed manually to its normal name, which is time-consuming, tedious, and subjective. Recent studies reflect the need for attention to context, semantics, and especially ambiguities in dealing with natural language questions. To address this problem, the proposed architecture focuses on understanding the context, correlation-based semantic analysis, and removal of ambiguities using a novel approach. An Enhanced Longest Common Subsequence (ELCS) is suggested where existing LCS is modified with a memorization component for mapping the natural language question tokens with ambiguous table column headers. This can speed up the overall process as human intervention is not required to manually change the column headers. The same is evidenced by carrying out thorough experimentation and comparative study in terms of precision, recall, and F1 score.
By synthesizing the latest advancements and addressing challenges, this paper has proved how NLP can significantly enhance the accuracy and efficiency of information retrieval and visualization, broadening the inclusivity and usability of NLIDB, NLIV, and BI systems.
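To make the idea of matching question tokens to abbreviated column headers concrete, here is a minimal Python sketch of a plain longest common subsequence with cached subproblems. It is not the paper's ELCS: reading the "memorization component" as a cache of subproblem results is an assumption, and the helper `best_header`, its length normalization, and the example header names are hypothetical.

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""

    @lru_cache(maxsize=None)
    def rec(i: int, j: int) -> int:
        # Cached so that each (i, j) subproblem is solved only once.
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + rec(i + 1, j + 1)
        return max(rec(i + 1, j), rec(i, j + 1))

    return rec(0, 0)


def best_header(token: str, headers):
    """Pick the header with the highest length-normalized LCS score."""
    token = token.lower()
    return max(
        headers,
        key=lambda h: lcs_length(token, h.lower()) / max(len(token), len(h)),
    )
```

For example, `best_header("salary", ["emp_sal", "dept_nm", "hire_dt"])` selects `"emp_sal"`, because "sal" is a common subsequence of length 3 while the other headers share at most one character with the token.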
- Research Article
5
- 10.1093/jamiaopen/ooaf092
- Jul 3, 2025
- JAMIA Open
Objectives: Large language models (LLMs) have demonstrated high levels of performance in clinical information extraction compared to rule-based systems and traditional machine-learning approaches, offering scalability, contextualization, and easier deployment. However, most studies rely on proprietary models with privacy concerns and high costs, limiting accessibility. We aim to evaluate 14 publicly available open-source LLMs for extracting clinically relevant findings from free-text echocardiogram reports and examine the feasibility of their implementation in information extraction workflows. Materials and Methods: We used 14 open-source LLM models to extract clinically relevant entities from echocardiogram reports (n = 507). Each report was manually annotated by 2 independent health-care professionals and adjudicated by a third. Lexical variance and length of each echocardiogram report were collected. Precision, recall, and F1 scores were calculated for the 9 extracted entities via multiclass classification. Results: In aggregate, Gemma2:9b-instruct had the highest precision, recall, and F1 scores at 0.973 (0.962-0.983), 0.959 (0.947-0.973), and 0.965 (0.951-0.975), respectively. In comparison, Phi3:3.8b-mini-instruct had the lowest precision score at 0.831 (0.804-0.856), while Gemma:7b-instruct had the lowest recall and F1 scores at 0.382 (0.356-0.408) and 0.392 (0.356-0.428), respectively. Discussion and Conclusion: Using LLMs for entity extraction for echocardiogram reports has the potential to support both clinical research and health-care delivery. Our work demonstrates the feasibility of using open-source models for more efficient computation and extraction.
- Research Article
11
- 10.1142/s0219519422400322
- Sep 26, 2022
- Journal of Mechanics in Medicine and Biology
In this study, we attempted to confirm whether the InceptionV3 model, which shows excellent performance in lung disease classification using chest X-ray images, is suitable for cardiac disease classification. In addition, we proposed a method for improving classification accuracy by improving the structure of the existing InceptionV3 model. The deep learning model used in this study was a modified version of the fully-connected hierarchical structure of InceptionV3. The proposed InceptionV3 model structure was constructed to differentiate between a normal heart and a hypertrophic heart. The model was trained, after data augmentation, on 1026 chest X-ray images of patients diagnosed with a normal heart or cardiac hypertrophy at Kyungpook National University Hospital. The experiment showed a learning classification accuracy of 99.57% and loss of 1.42% for the original InceptionV3 model. The accuracy and loss of the modified InceptionV3 model were 99.81% and 0.92%, respectively. Its classification performance was evaluated based on precision, recall, and F1 score. For a normal heart, precision, recall and F1 score were 78%, 100% and 88%, respectively. For cardiomegaly, classification accuracy, recall and F1 score were 78%, 100% and 88%, respectively. Conversely, the modified model showed 100% precision, 92% recall and 96% F1 score. For cardiomegaly, classification accuracy, recall rate and F1 score were 95%, 100% and 97%, respectively. In conclusion, better classification can be achieved if the chest X-ray images for a normal heart and cardiomegaly are classified using the proposed model, which makes the classification performance more reliable.
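Because F1 is the harmonic mean of precision and recall, the F1 values quoted above can be checked against the quoted precision and recall. The short Python sketch below does this. The function name `f1_score` is illustrative, and treating the "classification accuracy" of 95% as a precision value in the last check is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Original model, normal heart: precision 0.78, recall 1.00
print(round(f1_score(0.78, 1.00), 2))
# Modified model, normal heart: precision 1.00, recall 0.92
print(round(f1_score(1.00, 0.92), 2))
# Modified model, cardiomegaly: 0.95 taken as precision (assumption), recall 1.00
print(round(f1_score(0.95, 1.00), 2))
```

The three rounded values agree with the 88%, 96% and 97% reported in the abstract.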
- Research Article
72
- 10.1007/s00138-020-01094-1
- Jul 16, 2020
- Machine Vision and Applications
In contemporary times, machine learning is being used in almost every field due to its better performance. Here, we consider different machine learning methods such as logistic regression, random forest, support vector classifier (SVC), AdaBoost classifier, bagging classifier, voting classifier, and the Xception model to classify breast cancer tumors and evaluate their performance. We used a standard dataset, i.e., breast histopathology images, that has more than two lakh (200,000) color patches, each of size 50 × 50 and scanned at a resolution of 40×. We use 60% of this dataset for training, 20% for validation, and 20% for testing with all of the above-mentioned classifiers. The logistic regression classifier scores 0.72 on each of precision, recall, and F1. The random forest method scores 0.80 on each of precision, recall, and F1. The bagging and voting classifiers both score 0.81 on each of precision, recall, and F1. Both the SVC and AdaBoost classifiers score 0.82 on each measure, whereas the deep learning method, the Xception model, scores 0.90 on each of precision, recall, and F1 under the same conditions. Thus, the Xception model performs best among all the methods considered on each of the performance measures, i.e., precision, recall, and F1 score, for the classification of breast cancer tumors. The importance of this research work is that tumors can be classified more accurately in less time. It may increase people's awareness of breast cancer and reduce fear of tumors.
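A 60/20/20 train, validation, and test split like the one described can be sketched in a few lines of Python. The abstract does not say how the split was drawn, so the random shuffle, the fixed `seed`, and the function name `split_60_20_20` are assumptions made for illustration.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle items and split them 60% / 20% / 20%."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * 6 // 10
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test
```

Integer arithmetic is used for the split sizes so that no item is lost to floating-point rounding; any leftover items end up in the test partition.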