Optimization of tokenization and memory management for processing large textual corpora in multilingual applications
Optimization of tokenization and memory management in processing large datasets represents a key challenge in the contemporary development of language models. This paper focuses on enhancing the processing of large textual corpora in Serbian using the GPT-2 model, specifically adapted for transfer learning. Tokenization optimization was achieved by adding language-specific tokens for Serbian, while memory management was improved through advanced resource management methods during training. Key findings demonstrate significant memory consumption reduction and training process acceleration, enabling more efficient utilization of available computational resources. This research contributes to the development of language models tailored for the Serbian language and provides a foundation for further studies in the field of natural language processing (NLP). The implications of this work are multifaceted: it facilitates more efficient creation of NLP applications for Serbian-speaking regions, enhances the accuracy and performance of language models, and opens opportunities for applications across various domains, from automated translation to sentiment analysis. This study paves the way for future research focusing on additional optimization of language models, including adaptation for other languages with similar characteristics, as well as exploring new methods for even more efficient memory management during large-scale textual data processing.
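The effect of language-specific tokens can be sketched with a toy greedy tokenizer; the vocabularies, the sample Serbian word, and the matcher below are invented for illustration and are not the paper's actual GPT-2 vocabulary or data.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization against a set of known tokens."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):      # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:                                   # unknown character: emit as-is
            tokens.append(text[i])
            i += 1
    return tokens

# A base vocabulary tuned for English splits Serbian words into fragments.
base_vocab = {"go", "vor", "i", "m", "sr", "p", "s", "k"}
word = "govorim"                                # "I speak" in Serbian

before = tokenize(word, base_vocab)             # many short fragments
after = tokenize(word, base_vocab | {"govor", "im"})  # with Serbian tokens
```

Fewer tokens per word means shorter input sequences, which is one plausible source of the memory and training-time savings the abstract reports.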
- Conference Article
- 10.1117/12.420794
- Mar 29, 2001
MPEG-4 is a multimedia standard that requires Video Object Planes (VOPs). Generating VOPs for arbitrary video sequences is still a challenging problem that largely remains unsolved. Nevertheless, if the problem is treated by imposing certain constraints, solutions for specific application domains can be found. MPEG-4 applications in mobile devices are one such domain, where the opposing goals of low power and high throughput must both be met. Efficient memory management plays a major role in reducing power consumption. Memory management for VOPs is particularly difficult because the lifetimes of these objects vary and may overlap. Varying object lifetimes require dynamic memory management, where memory fragmentation is a key problem that must be addressed. In general, memory management systems address this problem through a combination of strategy, policy, and mechanism. For MPEG-4-based mobile devices that lack instruction processors, a hardware-based memory management solution is necessary. In MPEG-4-based mobile devices that have a RISC processor, using a real-time operating system (RTOS) for this memory management task is not expected to be efficient, because the strategies and policies used by the RTOS are often tuned for memory segments much smaller than VOP object sizes. Hence, a memory management scheme specifically tuned for VOPs is important. In this paper, different strategies, policies, and mechanisms for memory management are considered, and an efficient combination is proposed for VOP memory management, along with a hardware architecture that can handle the proposed combination.
- Research Article
2
- 10.9790/0661-0930105
- Jan 1, 2013
- IOSR Journal of Computer Engineering
In this paper, we present performance improvement techniques for data retrieval from customized data warehouses for efficient querying and Online Analytical Processing (OLAP), in relation to efficient database and memory management. Different database management techniques, e.g. indexing and partitioning, play a vital role in efficient memory management. We compare the data retrieval time for a particular query against a relational database and a data warehouse database, with and without indexing. We show that applying these database management techniques results in faster query execution by reducing data retrieval time. This improved efficiency may increase the efficiency of OLAP operations, which in turn results in better data warehouse performance. Keywords - Data Warehouse, Indexing, OLAP, Partitioning, Querying.
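A minimal sketch of why an index cuts retrieval time: a full scan touches every row, while a hash index maps a key directly to matching row positions. The table, column names, and query below are invented for the example; real warehouse indexes (B-trees, bitmap indexes) are more involved.

```python
# Invented toy table: 1000 rows with a "region" column to query on.
rows = [{"id": i, "region": "north" if i % 2 else "south", "sales": i * 10}
        for i in range(1000)]

def scan(rows, region):
    """Full scan: O(n) comparisons per query."""
    return [r for r in rows if r["region"] == region]

def build_index(rows, column):
    """One-time pass building a hash index on a column."""
    index = {}
    for pos, r in enumerate(rows):
        index.setdefault(r[column], []).append(pos)
    return index

def indexed_lookup(rows, index, region):
    """Indexed lookup: jumps straight to the matching positions."""
    return [rows[p] for p in index.get(region, [])]

index = build_index(rows, "region")
```

Both paths return the same rows, but the indexed lookup does work proportional to the result size, not the table size.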
- Conference Article
11
- 10.1109/sp46214.2022.9833613
- May 1, 2022
Existing tools for the automated detection of memory corruption bugs are not very effective in practice. They typically recognize only standard memory management (MM) APIs (e.g., malloc and free) and assume a naive paired-use model—an allocator is followed by a specific deallocator. However, we observe that programmers very often design their own MM functions and that these functions often manifest two major characteristics: (1) Custom allocator functions perform multi-object or nested allocation, which then requires structure-aware deallocation functions. (2) Custom allocators and deallocators follow an unpaired-use model. A more effective detector thus needs to adapt to those characteristics and capture memory bugs related to non-standard MM behaviors. In this paper, we present an MM-function-aware memory bug detection technique by introducing the concept of a structure-aware and object-centric Memory Operation Synopsis (MOS). A MOS abstractly describes the memory objects of a given MM function, how they are managed by the function, and their structural relations. By utilizing MOS, a bug detector can explore much less code while still handling multi-object or nested allocations, and it does not rely on the paired-use model. In addition, to extensively find MM functions and automatically generate MOS for them, we propose a new identification approach that combines natural language processing (NLP) and data flow analysis, which enables the efficient and comprehensive identification of MM functions, even in very large code bases. We implement a MOS-enhanced memory bug detection system, Goshawk, to discover memory bugs caused by complex and custom MM behaviors. We applied Goshawk to well-tested and widely-used open source projects including OS kernels, server applications, and IoT SDKs.
Goshawk outperforms the state-of-the-art data flow analysis driven bug detection tools by an order of magnitude in analysis speed and the number of accurately identified MM functions, reports the discovered bugs with a developer-friendly, MOS based description, and successfully detects 92 new double-free and use-after-free bugs.
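The name-analysis half of the identification idea can be sketched as follows; the word lists and classifier are invented simplifications, and Goshawk's actual approach additionally uses data flow analysis rather than function names alone.

```python
import re

# Invented vocabularies hinting that a function allocates or frees memory.
ALLOC_WORDS = {"alloc", "new", "create", "init"}
FREE_WORDS = {"free", "release", "destroy", "dealloc"}

def subwords(name):
    """Split snake_case / camelCase identifiers into lowercase subwords."""
    parts = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower().split("_")
    return [p for p in parts if p]

def classify(name):
    """Crude name-based guess at a function's memory management role."""
    words = set(subwords(name))
    if words & ALLOC_WORDS:
        return "allocator"
    if words & FREE_WORDS:
        return "deallocator"
    return "other"
```

A data-flow pass would then confirm, for each candidate, which objects it actually allocates or releases, which is what makes the combined approach precise.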
- Preprint Article
- 10.7287/peerj.preprints.3344v1
- Oct 13, 2017
Nowadays, mobile phones are becoming ever more popular in our daily lives, and mobile technology has a great effect on human life; many daily tasks depend on mobile devices. Memory management (MM), security, and performance play an important role in every handheld device, especially in mobile phones, which depend heavily on their operating system (OS). These embedded operating systems are in the driving seat when we talk about efficient and useful memory management and secure handling. Three popular mobile phone OSs are Android, Windows, and iOS (iPhone OS). Each OS has its own way of managing memory and providing it to a certain number of applications. Android is open-source software that people can modify as per their needs, whereas Windows and iOS are not open source. Researchers have done a large amount of work using different mechanisms and decision-making to develop new ways to manage memory in these OSs. This work presents a comparative analysis of different memory management and security-related techniques in the above three operating systems. In this paper, we analyze memory management and security in mobile phone operating systems with respect to apps, main memory, cache memory, and virtual memory. We also compare the overall performance of these OSs in terms of MM and security concerns. This study will help in finding the better operating system in terms of efficient memory management and security.
- Research Article
12
- 10.36676/jrps.v11.i4.1583
- Dec 31, 2020
- International Journal for Research Publication and Seminar
Optimizing data pipeline performance in modern GPU architectures is critical for achieving high computational throughput and efficient resource utilization in data-intensive applications. With the rise of deep learning, scientific simulations, and real-time analytics, GPUs have become integral in accelerating data processing tasks. However, ensuring optimal performance involves addressing several challenges, such as memory bandwidth limitations, data transfer bottlenecks between CPU and GPU, and efficient parallel execution of workloads. This research explores techniques for improving data pipeline performance by focusing on memory management, load balancing, and task scheduling. One key strategy is optimizing data movement through techniques like memory coalescing, which minimizes access latency, and overlapping data transfers with computation. Furthermore, leveraging the architectural advances in modern GPUs, such as unified memory and NVLink, can significantly reduce data transfer overhead. Task parallelism and efficient workload distribution across multiple GPU cores also play a crucial role in enhancing pipeline throughput. Additionally, the study highlights the importance of tuning GPU kernels and optimizing data preprocessing steps to ensure minimal latency and maximum throughput. By adopting advanced profiling tools and performance monitoring techniques, bottlenecks can be identified, and pipeline optimization strategies can be fine-tuned. The findings presented provide a comprehensive approach for designing and optimizing data pipelines, leading to significant performance improvements in GPU-based systems, ultimately driving the next generation of high-performance computing applications.
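The transfer/compute overlap described above can be sketched with double buffering; the `fetch` and `compute` functions below are invented stand-ins, corresponding on a real GPU to asynchronous host-to-device copies and kernel launches on separate streams.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(chunk_id):
    """Stand-in for a host-to-device transfer of one data chunk."""
    return list(range(chunk_id * 4, chunk_id * 4 + 4))

def compute(chunk):
    """Stand-in for a GPU kernel (sum of squares)."""
    return sum(x * x for x in chunk)

def pipeline(n_chunks):
    """Process chunks while the next transfer runs in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)        # prefetch the first chunk
        for i in range(n_chunks):
            chunk = pending.result()           # wait for the transfer
            if i + 1 < n_chunks:
                pending = pool.submit(fetch, i + 1)  # start next transfer
            results.append(compute(chunk))     # compute overlaps the fetch
    return results
```

With more buffers or streams the same pattern hides most transfer latency, which is the point of the overlap techniques the abstract surveys.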
- Research Article
5
- 10.1145/74851.74867
- Nov 1, 1989
- ACM SIGOPS Operating Systems Review
The watermark-based lazy buddy system for dynamic memory management uses lazy coalescing rules controlled by watermark parameters to achieve low operational costs. The correctness of the watermark-based lazy buddy system is shown by defining a space of legal states called the lazy space and proving that the watermark-based lazy coalescing rules always keep the memory state within that space. In this paper we describe a different lazy coalescing policy, called the DELAY-2 algorithm, that focuses directly on keeping the memory state within the lazy space. The resulting implementation is simpler, and experimental data shows it to be up to 12% faster than the watermark-based buddy system and about 33% faster than the standard buddy system. Inexpensive operations make the DELAY-2 algorithm attractive as a memory manager for an operating system. The watermark-based lazy buddy policy offers fine control over the coalescing policy of the buddy system. However, applications such as the UNIX System kernel memory manager do not need such fine control. For these applications, the DELAY-2 buddy system provides an efficient memory manager with low operational costs and low request blocking probability. In the DELAY-2 buddy system, the worst-case time for a free operation is bounded by two coalescing delays per class, and when all blocks are returned to the system, the system memory is coalesced back to its original state. This ensures that the memory space can be completely shared.
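The lazy-coalescing idea can be sketched with a toy binary buddy allocator; the bookkeeping below is invented for illustration (power-of-two requests only) and is not the paper's DELAY-2 implementation, but it shows frees being deferred until more than two uncoalesced blocks accumulate in a size class.

```python
class BuddyAllocator:
    """Toy binary buddy allocator with DELAY-2-style lazy coalescing."""

    def __init__(self, total=64):
        self.total = total
        self.free = {total: [0]}          # size class -> free block offsets

    def alloc(self, size):
        """Return the offset of a free block of `size` (a power of two)."""
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                        # find a larger block to split
        if s > self.total:
            raise MemoryError("out of memory")
        off = self.free[s].pop()
        while s > size:                   # split down, freeing right halves
            s //= 2
            self.free.setdefault(s, []).append(off + s)
        return off

    def free_block(self, off, size, delay=2):
        """Lazy free: coalesce only when a class holds > `delay` blocks."""
        blocks = self.free.setdefault(size, [])
        blocks.append(off)
        if len(blocks) > delay:           # DELAY-2: tolerate 2 uncoalesced
            self._coalesce(size)

    def _coalesce(self, size):
        blocks = self.free[size]
        i = 0
        while i < len(blocks):
            b, buddy = blocks[i], blocks[i] ^ size  # buddy flips one bit
            if buddy in blocks:
                blocks.remove(b)
                blocks.remove(buddy)
                self.free_block(min(b, buddy), size * 2)  # may cascade up
                i = 0
            else:
                i += 1
```

Deferring the merge keeps the common free path cheap while the invariant (at most two uncoalesced blocks per class) bounds fragmentation, mirroring the trade-off the DELAY-2 policy formalizes.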
- Conference Article
12
- 10.1109/cluster.2014.6968771
- Sep 1, 2014
Large-scale interactive applications and online analytic processing on graphs require fast data access to huge sets of small data objects. DXRAM addresses these challenges by keeping all data always in memory, aggregated across potentially many nodes in a data center. In this paper we focus on efficient memory management and the mapping of global IDs to local memory addresses, which is not trivial as each node may store up to one billion small data objects (16–64 byte) in its local memory. We present an efficient paging-like translation scheme for global IDs and a memory management optimized for many small data objects. The latter includes an efficient incremental defragmentation supporting changing allocation granularities for dynamic data. Our evaluations show that the proposed memory management approach has only a 4–5% overhead, compared to around 20% for state-of-the-art memory allocators, and that the paging-like mapping of global IDs is faster and more efficient than hash-table based approaches. Furthermore, we compare memory overhead and read performance of DXRAM with RAMCloud.
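The paging-like ID translation can be sketched as a small multi-level table; the bit widths, sample IDs, and addresses below are invented for the example.

```python
# A global ID is carved into directory / table / offset bits, so each
# object costs one array slot in a leaf table instead of a hash entry.
DIR_BITS, TABLE_BITS, OFFSET_BITS = 4, 4, 8

def _split(global_id):
    """Carve a global ID into (directory, table, offset) indices."""
    off = global_id & ((1 << OFFSET_BITS) - 1)
    t = (global_id >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    d = (global_id >> (OFFSET_BITS + TABLE_BITS)) & ((1 << DIR_BITS) - 1)
    return d, t, off

class IdTranslator:
    def __init__(self):
        self.directory = {}                    # dir index -> page tables

    def register(self, global_id, local_addr):
        d, t, off = _split(global_id)
        table = self.directory.setdefault(d, {})
        leaf = table.setdefault(t, [None] * (1 << OFFSET_BITS))
        leaf[off] = local_addr                 # one slot per object

    def translate(self, global_id):
        d, t, off = _split(global_id)
        return self.directory[d][t][off]
```

Dense ID ranges then translate with a few array indexings and no hashing, which is the property the abstract claims over hash-table approaches.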
- Research Article
- 10.52187/rdt.v7i1.378
- Feb 27, 2026
- Radiant
Efficient memory management is a critical aspect of operating system performance, as poor management can lead to high page fault rates, increased execution time, and reduced CPU utilization. This study presents a performance comparison of two widely used memory management strategies, First-In-First-Out (FIFO) and Least Recently Used (LRU), using simulations conducted through the CPU-OS Simulator v7.5.50. The objective is to observe differences in page fault rate, execution time, and CPU efficiency across various scenarios. Three experiments were conducted: first, the impact of cache/pipeline configuration; second, the influence of process scheduling on memory management; and third, the effect of memory size and page access patterns. The results show that LRU tends to provide a lower page fault rate under heavy workloads, while FIFO demonstrates advantages when memory is limited and overhead is minimal. This study contributes to understanding how page replacement algorithms affect system performance in an operating system simulation environment.
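The two policies compared in the study can be simulated in a few lines; the reference string below is an invented example (it also happens to exhibit Belady's anomaly for FIFO, where adding a frame increases the fault count).

```python
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    """Count page faults under First-In-First-Out replacement."""
    mem, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in mem:
            faults += 1
            if len(mem) == frames:
                mem.remove(queue.popleft())   # evict the oldest arrival
            mem.add(page)
            queue.append(page)
    return faults

def lru_faults(refs, frames):
    """Count page faults under Least-Recently-Used replacement."""
    mem, faults = OrderedDict(), 0
    for page in refs:
        if page in mem:
            mem.move_to_end(page)             # mark as most recently used
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)       # evict least recently used
            mem[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
```

On this string, FIFO incurs 9 faults with 3 frames but 10 with 4, while LRU drops from 10 to 8 as frames are added, illustrating the workload-dependent trade-offs the study measures.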
- Single Report
6
- 10.2172/992335
- May 1, 2010
The ubiquitous use of raw pointers in higher-level code is the primary cause of memory usage problems and memory leaks in C++ programs. This paper describes what might be considered a radical approach to the problem: encapsulating the use of all raw pointers and all raw calls to new and delete in higher-level C++ code. Instead, a set of cooperating template classes developed in the Trilinos package Teuchos is used to encapsulate every use of raw C++ pointers in every use case where one appears in high-level code. Included in the set of memory management classes is the typical reference-counted smart pointer class similar to boost::shared_ptr (and therefore C++0x std::shared_ptr). However, what is missing in boost and the new standard library are non-reference-counted classes for the remaining use cases where raw C++ pointers would otherwise be needed. These classes have a debug build mode where nearly all programmer errors are caught and gracefully reported at runtime. The default optimized build mode strips all runtime checks and allows the code to perform as efficiently as raw C++ pointers under reasonable usage. Also included is a novel approach for dealing with the circular references problem that imparts little extra overhead and is almost completely invisible to most of the code (unlike the boost and therefore C++0x approach). Rather than being a radical approach, encapsulating all raw C++ pointers is simply the logical progression of a trend in the C++ development and standards community that started with std::auto_ptr and is continued (but not finished) with std::shared_ptr in C++0x. Using the Teuchos reference-counted memory management classes allows one to remove unnecessary constraints on the use of objects by removing arbitrary lifetime ordering constraints, which are a type of unnecessary coupling [23].
The code one writes with these classes will be more likely to be correct on first writing, will be less likely to contain silent (but deadly) memory usage errors, and will be much more robust to later refactoring and maintenance. The level of debug-mode runtime checking provided by the Teuchos memory management classes is stronger in many respects than what is provided by memory checking tools like Valgrind and Purify, while being much less expensive. However, tools like Valgrind and Purify perform a number of types of checks (like usage of uninitialized memory) that make these tools very valuable and therefore complement the Teuchos debug-mode runtime checking. The Teuchos memory management classes and idioms largely address the technical issues in resolving the fragile built-in C++ memory management model (with the exception of circular references, which has no easy solution but can be managed as discussed). All that remains is to teach these classes and idioms and expand their usage in C++ codes. The long-term viability of C++ as a usable and productive language depends on it. Otherwise, if C++ is no safer than C, is the greater complexity of C++ worth what one gets as extra features? Given that C is smaller and easier to learn than C++, and since most programmers don't know object orientation (or templates or X, Y, and Z features of C++) all that well anyway, what are most programmers really getting out of C++ that would outweigh its extra complexity over C? C++ zealots will argue this point, but the reality is that C++ popularity has peaked and is declining, while the popularity of C has remained fairly stable over the last decade. Idioms like those advocated in this paper can help to avert this trend, but it will require wide community buy-in and a change in the way C++ is taught in order to have the greatest impact. To make these programs more secure, compiler vendors or static analysis tools (e.g. Klocwork) could implement a preprocessor-like language similar to OpenMP that would allow the programmer to declare (in comments) that certain blocks of code should be "pointer-free" or allow smaller blocks to be "pointers allowed". This would significantly improve the robustness of code that uses the memory management classes described here.
- Research Article
11
- 10.1145/276305.276346
- Jun 1, 1998
- ACM SIGMOD Record
If replacement selection is used in an external mergesort to generate initial runs, individual records are deleted and inserted in the sort operation's workspace. Variable-length records introduce the need for possibly complex memory management and extra copying of records. As a result, few systems employ replacement selection, even though it produces longer runs than commonly used algorithms. We experimentally compared several algorithms and variants for managing this workspace. We found that the simple best-fit algorithm achieves memory utilization of 90% or better and run lengths over 1.8 times the workspace size, with no extra copying of records and very little other overhead, for widely varying record sizes and for a wide range of memory sizes. Thus, replacement selection is a viable algorithm for commercial database systems, even for variable-length records. Efficient memory management also enables an external sort algorithm that degrades gracefully when its input is only slightly larger than, or a small multiple of, the available memory size. This is not the case with the usual implementations of external sorting, which incur I/O for the entire input even if it is as little as one record larger than memory. Thus, in some cases, our techniques may reduce I/O volume by a factor of 10 compared to traditional database sort algorithms. Moreover, the gradual rather than abrupt growth in I/O volume for increasing input sizes significantly eases the design and implementation of intra-query memory management policies.
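Run generation by replacement selection can be sketched with a heap ordered by (run number, key); the fixed-length integer records below are invented for the example, whereas the paper's contribution concerns workspace management for variable-length records.

```python
import heapq

def replacement_selection(records, workspace):
    """Generate sorted runs from a workspace of `workspace` records."""
    heap = [(0, r) for r in records[:workspace]]   # (run number, key)
    heapq.heapify(heap)
    pending = iter(records[workspace:])
    runs, current, run_no = [], [], 0
    while heap:
        gen, value = heapq.heappop(heap)
        if gen > run_no:                  # workspace holds only next-run keys
            runs.append(current)
            current, run_no = [], gen
        current.append(value)
        nxt = next(pending, None)
        if nxt is not None:
            # a record smaller than the last output must wait for the next run
            heapq.heappush(heap, (run_no if nxt >= value else run_no + 1, nxt))
    runs.append(current)
    return runs
```

Because incoming records keep refilling the workspace mid-run, runs typically grow to about twice the workspace size on random input, which is why the technique is attractive despite its memory management demands.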
- Book Chapter
1
- 10.1007/978-3-031-14627-5_47
- Jan 1, 2022
Cross-domain data fusion is becoming a key driver of the growth of numerous and diverse applications in the IoT era. We have previously proposed a new cross-domain data fusion platform, the Geo-Centric Information Platform (GCIP), which enables IoT data fusion in a geolocation-based edge network. GCIP dynamically produces Spatio-Temporal Content (STC) by combining cross-domain data in each geographic area and then delivers it to users. However, when a large amount of IoT data is required for STC creation, there is a heavy load on the GCIP network and computational resources. This paper introduces a network-wide pre-processing method. When multiple flows with different loads on network and computational resources arrive at an edge network, (1) throughput degradation (efficiency issues) and (2) inequity in resource allocation (fairness issues) may occur. In this paper, we propose a comprehensive resource allocation method for efficient and fair utilization of network and computational resources. Through numerical verification, we demonstrate that the proposed method improves efficiency and fairness by 26% and 38%, respectively.
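One classic formulation of fair resource allocation, max-min fairness, can be sketched as follows; the demands and capacity are invented, and the paper's method is a more comprehensive joint allocation over both network and computational resources.

```python
def max_min_fair(demands, capacity):
    """Max-min fair split of one shared resource among flows."""
    alloc = [0.0] * len(demands)
    remaining = sorted(range(len(demands)), key=lambda i: demands[i])
    cap = float(capacity)
    while remaining:
        share = cap / len(remaining)       # equal split of what is left
        i = remaining[0]
        if demands[i] <= share:            # satisfied flow returns surplus
            alloc[i] = demands[i]
            cap -= demands[i]
            remaining.pop(0)
        else:                              # bottlenecked flows split equally
            for j in remaining:
                alloc[j] = share
            break
    return alloc
```

Small flows get exactly what they ask for and the surplus is redistributed, so no flow can gain without a smaller flow losing, which is the fairness property at stake in (2).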
- Research Article
14
- 10.1162/coli_a_00420
- Dec 7, 2021
- Computational Linguistics
Natural Language Processing and Computational Linguistics
- Conference Article
7
- 10.1109/ijcnn.2006.247379
- Jan 1, 2006
This paper proposes an efficient human-like memory and memory management scheme that utilizes Walsh-based distributed associative memory to reduce the computer storage and processing needed for pattern recognition. As a verification example, a memory storing 26 binary alphabet images takes only the physical space needed to store 8 patterns, yet is capable of perfect recognition. Further, the experimental results show that the proposed memory management strategy can handle data transfer from short-term (working) memory to long-term memory.
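The superposition idea behind a Walsh/Hadamard-keyed associative memory can be sketched as follows; the sizes and patterns are invented, and the paper's scheme for alphabet images is more elaborate than this toy exact-recall construction.

```python
def hadamard(n):
    """Sylvester construction of an n x n +/-1 Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def store(patterns, keys):
    """Superimpose all patterns into one n x n matrix of outer products."""
    n = len(keys[0])
    mem = [[0] * n for _ in range(n)]
    for p, k in zip(patterns, keys):
        for i in range(n):
            for j in range(n):
                mem[i][j] += p[i] * k[j]
    return mem

def recall(mem, key):
    """Correlate with a key; orthogonal keys give exact recall: M k / n = p."""
    n = len(key)
    return [sum(mem[i][j] * key[j] for j in range(n)) // n for i in range(n)]
```

The memory matrix is the same size no matter how many (up to n) patterns are stored, echoing the abstract's point that many patterns share the space of a few.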
- Conference Article
11
- 10.1145/2786805.2804431
- Aug 30, 2015
Recent work has shown that although programming languages enable source code to be rich and complex, most code tends to be repetitive and predictable. The use of natural language processing (NLP) techniques applied to source code, such as n-gram language models, shows great promise in areas such as code completion, aiding impaired developers, and code search. In this paper, we address three questions related to different methods of constructing language models in an industrial context. Specifically, we ask: (1) Do application-specific, but smaller, language models perform better than language models across applications? (2) Are developer-specific language models effective, and do they differ depending on what parts of the codebase a developer is working in? (3) Finally, do language models change over time, i.e., does a language model built early in development differ from one built later in development? The answers to these questions enable techniques that make use of programming language models in development to choose the model training corpus more effectively. We evaluate these questions by building 28 language models across developers, time periods, and applications within Microsoft Office and present the results in this paper. We find that developer- and application-specific language models perform better than models from the entire codebase, but that temporality has little to no effect on language model performance.
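An n-gram model over token streams, the kind the study trains per developer or per application, reduces to counting; the toy "code corpus" below is invented for the example.

```python
from collections import Counter, defaultdict

def train_bigram(token_lists):
    """Count, for each token, the frequency of its successors."""
    counts = defaultdict(Counter)
    for tokens in token_lists:
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def predict(counts, prev):
    """Most likely next token after `prev` (the core of code completion)."""
    if not counts[prev]:
        return None
    return counts[prev].most_common(1)[0][0]

# Invented toy corpus of tokenized code lines.
corpus = [
    ["for", "(", "int", "i", "=", "0", ";"],
    ["for", "(", "int", "j", "=", "0", ";"],
    ["if", "(", "x", "==", "0", ")"],
]
model = train_bigram(corpus)
```

Training such a model on one application's files versus the whole codebase changes only `corpus`, which is exactly the corpus-selection question the paper evaluates.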