Discovery Logo
Sign In
Search
Paper
Search Paper
R Discovery for Libraries Pricing Sign In
  • Home iconHome
  • My Feed iconMy Feed
  • Search Papers iconSearch Papers
  • Library iconLibrary
  • Explore iconExplore
  • Ask R Discovery iconAsk R Discovery Star Left icon
  • Literature Review iconLiterature Review NEW
  • Chat PDF iconChat PDF Star Left icon
  • Citation Generator iconCitation Generator
  • Chrome Extension iconChrome Extension
    External link
  • Use on ChatGPT iconUse on ChatGPT
    External link
  • iOS App iconiOS App
    External link
  • Android App iconAndroid App
    External link
  • Contact Us iconContact Us
    External link
  • Paperpal iconPaperpal
    External link
  • Mind the Graph iconMind the Graph
    External link
  • Journal Finder iconJournal Finder
    External link
Discovery Logo menuClose menu
  • Home iconHome
  • My Feed iconMy Feed
  • Search Papers iconSearch Papers
  • Library iconLibrary
  • Explore iconExplore
  • Ask R Discovery iconAsk R Discovery Star Left icon
  • Literature Review iconLiterature Review NEW
  • Chat PDF iconChat PDF Star Left icon
  • Citation Generator iconCitation Generator
  • Chrome Extension iconChrome Extension
    External link
  • Use on ChatGPT iconUse on ChatGPT
    External link
  • iOS App iconiOS App
    External link
  • Android App iconAndroid App
    External link
  • Contact Us iconContact Us
    External link
  • Paperpal iconPaperpal
    External link
  • Mind the Graph iconMind the Graph
    External link
  • Journal Finder iconJournal Finder
    External link
features
  • Audio Papers iconAudio Papers
  • Paper Translation iconPaper Translation
  • Chrome Extension iconChrome Extension
Content Type
  • Journal Articles iconJournal Articles
  • Conference Papers iconConference Papers
  • Preprints iconPreprints
  • Seminars by Cassyni iconSeminars by Cassyni
More
  • R Discovery for Libraries iconR Discovery for Libraries
  • Research Areas iconResearch Areas
  • Topics iconTopics
  • Resources iconResources

Related Topics

  • N-gram Language Models
  • N-gram Language Models
  • Language Model
  • Language Model
  • N-gram Model
  • N-gram Model

Articles published on Character-level Language Model

Authors
Select Authors
Journals
Select Journals
Duration
Select Duration
18 Search results
Sort by
Recency
  • Research Article
  • 10.3389/fcomp.2025.1626899
Exploring the impact of fixed theta values in RoPE on character-level language model performance and efficiency
  • Aug 22, 2025
  • Frontiers in Computer Science
  • Zhigao Huang + 2 more

Rotary Positional Embedding (RoPE) is a widely used technique in Transformers, influenced by the hyperparameter theta (θ). However, the impact of varying *fixed* theta values, especially the trade-off between performance and efficiency on tasks like character-level modeling, remains under-explored. This paper presents a systematic evaluation of RoPE with fixed theta values (ranging from 500 to 50,000) on a character-level GPT model across three datasets: Tiny Shakespeare, Enwik8, and Text8, compared against the standard θ = 10, 000 baseline. However, all non-default theta configurations incur significant computational overhead: inference speed is approximately halved across all datasets, suggesting implementation—specific bottlenecks rather than theta—dependent costs. This study quantifies a critical performance—efficiency trade-off when tuning fixed RoPE theta. Our findings emphasize the practical need to balance generalization gains with computational budgets during model development and deployment, contributing empirical insights into RoPE hyperparameter sensitivity and demonstrating that optimal theta selection is highly dataset-dependent. These insights suggest that future positional encoding designs could benefit from adaptive θ scheduling or dataset-specific θ optimization strategies to maximize both performance and computational efficiency.

  • Research Article
  • 10.3389/frai.2025.1628943
Spectral momentum integration: hybrid optimization of frequency and time domain gradients
  • Aug 7, 2025
  • Frontiers in Artificial Intelligence
  • Zhigao Huang + 2 more

We propose Spectral Momentum Integration (SMI), an optimization enhancement that processes gradients in both frequency and time domains. SMI applies the Fast Fourier Transform to selectively filter gradient frequency components before blending them with original gradients using an adaptive scheduling mechanism. Experiments on a character-level language model demonstrate that SMI can achieve inference acceleration while maintaining model performance. Our approach integrates with existing optimizers without modifying model architecture, though it introduces computational overhead and hyperparameter complexity. While our current validation is limited to small-scale experiments, SMI provides a proof-of-concept for incorporating frequency-domain processing into neural network optimization, suggesting potential for broader applications pending large-scale validation.

  • Research Article
  • Cite Count Icon 2
  • 10.3390/info16060475
Spectral Adaptive Dropout: Frequency-Based Regularization for Improved Generalization
  • Jun 6, 2025
  • Information
  • Zhigao Huang + 2 more

Deep neural networks are often susceptible to overfitting, necessitating effective regularization techniques. This paper introduces Spectral Adaptive Dropout, a novel frequency-based regularization technique that dynamically adjusts dropout rates based on the spectral characteristics of network gradients. The proposed approach addresses the limitations of traditional dropout methods by adaptively targeting high-frequency components that typically contribute to overfitting while preserving essential low-frequency information. Through extensive experimentation on character-level language modeling tasks, the study demonstrates that the method achieves a 1.10% improvement in validation loss while maintaining competitive inference speeds. Thise research explores several implementations including FFT-based analysis, wavelet decomposition, and per-attention-head adaptation, culminating in an optimized approach that balances computational efficiency with regularization effectiveness. Our results highlight the significant potential of incorporating frequency-domain information into regularization strategies for deep neural networks.

  • Research Article
  • Cite Count Icon 2
  • 10.70179/grdjev09i120213
Evolving Neural Network Designs with Genetic Algorithms: Applications in Image Classification, NLP, and Reinforcement Learning
  • Dec 1, 2024
  • Global Research and Development Journals
  • Rajesh Kumar Malviya + 3 more

A method of evolving deep learning architectures using genetic algorithms is presented. The method is a first step towards a low-cost evolutionary search for task-specific neural networks. We evolve task-specific model architectures optimized for fast execution and low error on several standard machine learning tasks: image classification, character-level language modeling, and solving the cart pole problem. We also introduce a simple variation of the method that is capable of evolving neural networks with recurrent connections of varying depth and length and show performance on a word-level language modeling task. The method is implemented in an open-source library. We hope that the ability to run an evolutionary search at this scale will make it possible for a wide audience to develop deep learning architectures that are specialized for a variety of tasks and to develop many interesting novel architectural features. A new method that uses evolutionary search to directly modify existing neural network architectures to perform a specific task is presented. We demonstrate that task-specific specialization of deep learning models can be useful in practice. We modify convolutional neural networks, residual networks, and an LSTM variant to perform various tasks, and show that specialized networks often perform better than models trained from scratch that have many more parameters and much larger training time. For example, on the object recognition task, a specialized model is built by training a base network to predict object position and then applying a series of genetic search operations to squeeze the network and fit new final layer weights to the output. The specialized model is 8 times faster and has 13% lower error, despite being 17 times smaller than a fully trained larger and slower network.

  • Research Article
  • Cite Count Icon 7
  • 10.1162/tacl_a_00651
Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation
  • Apr 16, 2024
  • Transactions of the Association for Computational Linguistics
  • Lukas Edman + 4 more

Abstract Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing tasks. However, there has been little research on their effectiveness for neural machine translation (NMT), particularly within the popular pretrain-then-finetune paradigm. This work performs an extensive comparison across multiple languages and experimental conditions of character- and subword-level pretrained models (ByT5 and mT5, respectively) on NMT. We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited. In our analysis, we show how character models’ gains in translation quality are reflected in better translations of orthographically similar words and rare words. While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5, suggesting an ability to modulate word-level and character-level information during generation. We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality.

  • Open Access Icon
  • Research Article
  • Cite Count Icon 14
  • 10.1145/3483446
Improving Deep Learning based Automatic Speech Recognition for Gujarati
  • Dec 13, 2021
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Deepang Raval + 3 more

We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bi-directional Long Short Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system with the limited size of the dataset, we present a combined language model (Word-level language Model and Character-level language model)-based prefix decoding technique and Bidirectional Encoder Representations from Transformers-based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we used the inferences from the system and proposed different analysis methods. These insights help us in understanding and improving the ASR system as well as provide intuition into the language used for the ASR system. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.87% decrease in Word Error Rate (WER) with respect to base-model WER.

  • Open Access Icon
  • Research Article
  • Cite Count Icon 7
  • 10.1109/tcbb.2021.3109557
GapPredict - A Language Model for Resolving Gaps in Draft Genome Assemblies.
  • Nov 1, 2021
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • Eric Chen + 4 more

Short-read DNA sequencing instruments can yield over 1012 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using short reads due to both repetitive and difficult-to-sequence regions in these genomes. Some of the short read assembly challenges are mitigated by scaffolding assembled sequences using paired-end reads. However, unresolved sequences in these scaffolds appear as “gaps”. Here, we introduce GapPredict – An implementation of a proof of concept that uses a character-level language model to predict unresolved nucleotides in scaffold gaps. We benchmarked GapPredict against the state-of-the-art gap-filling tool Sealer, and observed that the former can fill 65.6% of the sampled gaps that were left unfilled by the latter with high similarity to the reference genome, demonstrating the practical utility of deep learning approaches to the gap-filling problem in genome assembly.

  • Research Article
  • Cite Count Icon 8
  • 10.5281/zenodo.6580098
Bicleaner at WMT 2020: Universitat d'Alacant-Prompsit's submission to the parallel corpus filtering shared task
  • Nov 1, 2020
  • Zenodo (CERN European Organization for Nuclear Research)
  • Miquel Esplà-Gomis + 3 more

This paper describes the joint submission of Universitat d’Alacant and Prompsit Language Engineering to the WMT 2020 shared task on parallel corpus filtering. Our submission, based on the free/open-source tool Bicleaner, enhances it with Extremely Randomised Trees and lexical similarity features that account for the frequency of the words in the parallel sentences to determine if two sentences are parallel. To train this classifier we used the clean corpora provided for the task and synthetic noisy parallel sentences. In addition we re-score the output of Bicleaner using character-level language models and n-gram saturation.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 112
  • 10.1038/s41467-020-18959-8
Learning molecular dynamics with simple language model built upon long short-term memory neural network
  • Oct 9, 2020
  • Nature Communications
  • Sun-Ting Tsai + 2 more

Recurrent neural networks have led to breakthroughs in natural language processing and speech recognition. Here we show that recurrent networks, specifically long short-term memory networks can also capture the temporal evolution of chemical/biophysical trajectories. Our character-level language model learns a probabilistic model of 1-dimensional stochastic trajectories generated from higher-dimensional dynamics. The model captures Boltzmann statistics and also reproduces kinetics across a spectrum of timescales. We demonstrate how training the long short-term memory network is equivalent to learning a path entropy, and that its embedding layer, instead of representing contextual meaning of characters, here exhibits a nontrivial connectivity between different metastable states in the underlying physical system. We demonstrate our model’s reliability through different benchmark systems and a force spectroscopy trajectory for multi-state riboswitch. We anticipate that our work represents a stepping stone in the understanding and use of recurrent neural networks for understanding the dynamics of complex stochastic molecular systems.

  • Research Article
  • Cite Count Icon 1
  • 10.1609/aaai.v34i04.5958
Multi-Zone Unit for Recurrent Neural Networks
  • Apr 3, 2020
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Fandong Meng + 3 more

Recurrent neural networks (RNNs) have been widely used to deal with sequence learning problems. The input-dependent transition function, which folds new observations into hidden states to sequentially construct fixed-length representations of arbitrary-length sequences, plays a critical role in RNNs. Based on single space composition, transition functions in existing RNNs often have difficulty in capturing complicated long-range dependencies. In this paper, we introduce a new Multi-zone Unit (MZU) for RNNs. The key idea is to design a transition function that is capable of modeling multiple space composition. The MZU consists of three components: zone generation, zone composition, and zone aggregation. Experimental results on multiple datasets of the character-level language modeling task and the aspect-based sentiment analysis task demonstrate the superiority of the MZU.

  • Research Article
  • Cite Count Icon 3
  • 10.6919/icje.202001_6(1).0028
Named Entity Recognition Based on Character-level Language Models and Attention Mechanism
  • Jan 1, 2020
  • International Core Journal of Engineering
  • Chaoju Hu + 1 more

As a basic task in the field of natural language processing, named entity recognition plays an important role in text data processing tasks. Extracting features from the original text can be considered as the first step in the identification of named entities, but on this basic issue, traditional research still stays at the coarser granularity of words. Unlike traditional research, this paper focuses on finer granularity-character-level named entity recognition research. In order to fully extract the character-level feature representation from the character-level language model, this paper uses CNN and BiLSTM to perform feature extraction together, and introduces the attention mechanism to achieve more effective combination of character features and word features, then combines with BiLSTM-CRF to construct a complete end-to-end deep learning model (At- BiLSTM-CNNs-CRF). The experimental results show that its recognition ability exceeds most deep learning models.

  • Open Access Icon
  • PDF Download Icon
  • Research Article
  • Cite Count Icon 9
  • 10.1162/tacl_a_00283
TabulaNearlyRasa:Probing the Linguistic Knowledge of Character-level Neural Language Models Trained on Unsegmented Text
  • Nov 1, 2019
  • Transactions of the Association for Computational Linguistics
  • Michael Hahn + 1 more

Recurrent neural networks (RNNs) have reached striking performance in many natural language processing tasks. This has renewed interest in whether these generic sequence processing devices are inducing genuine linguistic knowledge. Nearly all current analytical studies, however, initialize the RNNs with a vocabulary of known words, and feed them tokenized input during training. We present a multi-lingual study of the linguistic knowledge encoded in RNNs trained as character-level language models, on input data with word boundaries removed. These networks face a tougher and more cognitively realistic task, having to discover any useful linguistic unit from scratch based on input statistics. The results show that our “near tabula rasa” RNNs are mostly able to solve morphological, syntactic and semantic tasks that intuitively presuppose word-level knowledge, and indeed they learned, to some extent, to track word boundaries. Our study opens the door to speculations about the necessity of an explicit, rigid word lexicon in language learning and usage.

  • Research Article
  • Cite Count Icon 373
  • 10.1609/aaai.v33i01.33013159
Character-Level Language Modeling with Deeper Self-Attention
  • Jul 17, 2019
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Rami Al-Rfou + 4 more

LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model (Vaswani et al. 2017) with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.

  • Book Chapter
  • Cite Count Icon 2
  • 10.3233/faia190328
Application of Character-Level Language Models in the Domain of Polish Statutory Law
  • Jan 1, 2019
  • Frontiers in artificial intelligence and applications
  • Smywiński-Pohl Aleksander + 3 more

Application of Character-Level Language Models in the Domain of Polish Statutory Law

  • Open Access Icon
  • Research Article
  • Cite Count Icon 46
  • 10.1016/j.patrec.2018.09.006
Dual Rectified Linear Units (DReLUs): A replacement for tanh activation functions in Quasi-Recurrent Neural Networks
  • Sep 5, 2018
  • Pattern Recognition Letters
  • Fréderic Godin + 3 more

Dual Rectified Linear Units (DReLUs): A replacement for tanh activation functions in Quasi-Recurrent Neural Networks

  • Research Article
  • Cite Count Icon 6
  • 10.1016/j.patrec.2018.06.023
Hierarchical recurrent highway networks
  • Jun 28, 2018
  • Pattern Recognition Letters
  • Tehseen Zia

Hierarchical recurrent highway networks

  • Research Article
  • Cite Count Icon 77
  • 10.1016/j.neucom.2018.03.020
Novel architecture for long short-term memory used in question classification
  • Mar 14, 2018
  • Neurocomputing
  • Wei Xia + 5 more

Novel architecture for long short-term memory used in question classification

  • Research Article
  • Cite Count Icon 25
  • 10.1121/1.4768800
Syllable language models for Mandarin speech recognition: Exploiting character language models
  • Jan 1, 2013
  • The Journal of the Acoustical Society of America
  • Xunying Liu + 3 more

Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character level language models (LMs) can be used as a first level approximation to allowed syllable sequences. To test this idea, word and character level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of texts from a wide collection of text sources. Both hypothesis and model based combination techniques were investigated to combine word and character level LMs. Significant character error rate reductions up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history dependent multi-level LM that performs a log-linearly combination of character and word level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.

  • 1
  • 1

Popular topics

  • Latest Artificial Intelligence papers
  • Latest Nursing papers
  • Latest Psychology Research papers
  • Latest Sociology Research papers
  • Latest Business Research papers
  • Latest Marketing Research papers
  • Latest Social Research papers
  • Latest Education Research papers
  • Latest Accounting Research papers
  • Latest Mental Health papers
  • Latest Economics papers
  • Latest Education Research papers
  • Latest Climate Change Research papers
  • Latest Mathematics Research papers

Most cited papers

  • Most cited Artificial Intelligence papers
  • Most cited Nursing papers
  • Most cited Psychology Research papers
  • Most cited Sociology Research papers
  • Most cited Business Research papers
  • Most cited Marketing Research papers
  • Most cited Social Research papers
  • Most cited Education Research papers
  • Most cited Accounting Research papers
  • Most cited Mental Health papers
  • Most cited Economics papers
  • Most cited Education Research papers
  • Most cited Climate Change Research papers
  • Most cited Mathematics Research papers

Latest papers from journals

  • Scientific Reports latest papers
  • PLOS ONE latest papers
  • Journal of Clinical Oncology latest papers
  • Nature Communications latest papers
  • BMC Geriatrics latest papers
  • Science of The Total Environment latest papers
  • Medical Physics latest papers
  • Cureus latest papers
  • Cancer Research latest papers
  • Chemosphere latest papers
  • International Journal of Advanced Research in Science latest papers
  • Communication and Technology latest papers

Latest papers from institutions

  • Latest research from French National Centre for Scientific Research
  • Latest research from Chinese Academy of Sciences
  • Latest research from Harvard University
  • Latest research from University of Toronto
  • Latest research from University of Michigan
  • Latest research from University College London
  • Latest research from Stanford University
  • Latest research from The University of Tokyo
  • Latest research from Johns Hopkins University
  • Latest research from University of Washington
  • Latest research from University of Oxford
  • Latest research from University of Cambridge

Popular Collections

  • Research on Reduced Inequalities
  • Research on No Poverty
  • Research on Gender Equality
  • Research on Peace Justice & Strong Institutions
  • Research on Affordable & Clean Energy
  • Research on Quality Education
  • Research on Clean Water & Sanitation
  • Research on COVID-19
  • Research on Monkeypox
  • Research on Medical Specialties
  • Research on Climate Justice
Discovery logo
FacebookTwitterLinkedinInstagram

Download the FREE App

  • Play store Link
  • App store Link
  • Scan QR code to download FREE App

    Scan to download FREE App

  • Google PlayApp Store
FacebookTwitterTwitterInstagram
  • Universities & Institutions
  • Publishers
  • R Discovery PrimeNew
  • Ask R Discovery
  • Blog
  • Accessibility
  • Topics
  • Journals
  • Open Access Papers
  • Year-wise Publications
  • Recently published papers
  • Pre prints
  • Questions
  • FAQs
  • Contact us
Lead the way for us

Your insights are needed to transform us into a better research content provider for researchers.

Share your feedback here.

FacebookTwitterLinkedinInstagram
Cactus Communications logo

Copyright 2026 Cactus Communications. All rights reserved.

Privacy PolicyCookies PolicyTerms of UseCareers