A Framework for Distributed Deep Neural Network Training with Heterogeneous Computing Platforms

Abstract

Deep neural network (DNN) training is generally performed by cloud computing platforms. However, cloud-based training has several problems such as network bottleneck, server management cost, and privacy. To overcome these problems, one of the most promising solutions is distributed DNN model training which trains the model with not only high-performance servers but also low-end power-efficient mobile edge or user devices. However, due to the lack of a framework which can provide an optimal cluster configuration (i.e., determining which computing devices participate in DNN training tasks), it is difficult to perform efficient DNN model training considering DNN service providers' preferences such as training time or energy efficiency. In this paper, we introduce a novel framework for distributed DNN training that determines the best training cluster configuration with available heterogeneous computing resources. Our proposed framework utilizes pre-training with a small number of training steps and estimates training time, power, energy, and energy-delay product (EDP) for each possible training cluster configuration. Based on the estimated metrics, our framework performs DNN training for the remaining steps with the chosen best cluster configurations depending on DNN service providers' preferences. Our framework is implemented in TensorFlow and evaluated with three heterogeneous computing platforms and five widely used DNN models. According to our experimental results, in 76.67% of the cases, our framework chooses the best cluster configuration depending on DNN service providers' preferences with only a small training time overhead.
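The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate cluster configuration already has an estimated training time and average power from the short pre-training run; energy is taken as power × time and EDP as energy × time. The `Estimate` class, the `choose_configuration` function, the device names, and all numbers are hypothetical and are not taken from the paper's TensorFlow implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Estimated metrics for one candidate cluster configuration (hypothetical)."""
    devices: tuple   # names of the devices taking part in training
    time_s: float    # estimated total training time in seconds
    power_w: float   # estimated average power draw in watts

    @property
    def energy_j(self) -> float:
        # Energy = average power x time
        return self.power_w * self.time_s

    @property
    def edp(self) -> float:
        # Energy-delay product = energy x time
        return self.energy_j * self.time_s


def choose_configuration(estimates, preference):
    """Return the estimate that minimizes the metric the provider cares about."""
    metric = {
        "time": lambda e: e.time_s,
        "power": lambda e: e.power_w,
        "energy": lambda e: e.energy_j,
        "edp": lambda e: e.edp,
    }[preference]
    return min(estimates, key=metric)


# Made-up estimates for three candidate configurations.
estimates = [
    Estimate(("server",), time_s=100.0, power_w=200.0),
    Estimate(("server", "edge"), time_s=80.0, power_w=260.0),
    Estimate(("edge",), time_s=400.0, power_w=15.0),
]

for pref in ("time", "power", "energy", "edp"):
    print(pref, "->", choose_configuration(estimates, pref).devices)
```

With these made-up numbers, a provider that prefers short training time or low EDP would pick the mixed server and edge configuration, while one that prefers low power or low energy would pick the edge device alone.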

Similar Papers
  • Research Article
  • Cite Count Icon 4
  • 10.1109/access.2020.3038112
SoftMemoryBox II: A Scalable, Shared Memory Buffer Framework for Accelerating Distributed Training of Large-Scale Deep Neural Networks
  • Jan 1, 2020
  • IEEE Access
  • Shinyoung Ahn + 1 more

Distributed processing using high-performance computing resources is essential for developers to train large-scale deep neural networks (DNNs). The major impediment to distributed DNN training is the communication bottleneck during the parameter exchange among the distributed DNN training workers. The communication bottleneck increases training time and decreases the utilization of the computational resources. Our previous study, SoftMemoryBox (SMB1) presented considerably superior performance compared to message passing interface (MPI) in the parameter communication of distributed DNN training. However, SMB1 had disadvantages such as the limited scalability of the distributed DNN training due to the restricted communication bandwidth from a single memory server, inability to provide a synchronization function for the shared memory buffer, and low portability/usability as a consequence of the kernel-level implementation. This paper proposes a scalable, shared memory buffer framework, called SoftMemoryBox II (SMB2), which overcomes the shortcomings of SMB1. With SMB2, distributed training processes can easily share virtually unified shared memory buffers composed of memory segments provided from remote memory servers and can exchange DNN parameters at high speed through the shared memory buffer. The scalable communication bandwidth of the SMB2 framework facilitates the reduction of DNN distributed training times compared to SMB1. According to intensive evaluation results, the communication bandwidth of the proposed SMB2 is 6.3 times greater than that of SMB1 when the SMB2 framework is scaled out to use eight memory servers. Moreover, the training time of SMB2-based asynchronous distributed training of five DNN models is up to 2.4 times faster than SMB1-based training.

  • Research Article
  • Cite Count Icon 10
  • 10.1109/tetci.2022.3220224
DLB: A Dynamic Load Balance Strategy for Distributed Training of Deep Neural Networks
  • Aug 1, 2023
  • IEEE Transactions on Emerging Topics in Computational Intelligence
  • Qing Ye + 4 more

Synchronous strategies with data parallelism are widely utilized in distributed training of Deep Neural Networks (DNNs), largely owing to their easy implementation yet promising performance. In these strategies, the workers with different computational capabilities need to wait for each other because of the essential gradient or weight synchronization. This will inevitably cause the high-performance workers to waste time waiting for the weak computational workers, which in turn results in the inefficiency of the cluster. In this paper, we propose a Dynamic Load Balance (DLB) strategy for the distributed training of DNNs to tackle this issue. Specifically, the performance of each worker is evaluated first based on the performance demonstration during the previous training epochs, and then the batch size and dataset partition are adaptively adjusted in consideration of the current performance of the workers. As a result, the waiting cost among the workers will be eliminated, thereby the utilization of the clusters is highly improved. Furthermore, the essential theoretical analysis has also been provided to justify the convergence of the proposed algorithm. Extensive experiments have been conducted on the CIFAR10 and CIFAR100 benchmark datasets with four state-of-the-art DNN models. The experimental results indicate that the proposed algorithm can significantly improve the utilization of the distributed cluster. In addition, the proposed algorithm can also prevent the load imbalance of the distributed DNN training from being affected by the disturbance and can be employed flexibly in conjunction with the other synchronous distributed DNN training methods.
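The idea of adapting per-worker batch sizes to measured performance can be sketched as follows. This is only an illustration under the assumption that each worker's throughput (samples per second) was measured during earlier epochs and that the global batch is split in proportion to it; the function name `rebalance_batch_sizes` and the numbers are hypothetical, and the paper's actual adjustment rule may differ.

```python
def rebalance_batch_sizes(throughputs, global_batch):
    """Split a global batch across workers in proportion to measured throughput.

    throughputs: samples/second observed for each worker in earlier epochs.
    global_batch: total number of samples processed per synchronous step.
    """
    total = sum(throughputs)
    raw = [global_batch * t / total for t in throughputs]
    sizes = [int(r) for r in raw]  # round down first

    # Give the samples lost to rounding to the workers with the largest remainders.
    leftover = global_batch - sum(sizes)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# One fast worker and two slow ones (made-up throughputs).
sizes = rebalance_batch_sizes([300.0, 100.0, 100.0], global_batch=256)
print(sizes)
```

Because every worker then needs roughly the same wall-clock time per step, the fast worker spends less time idle at the synchronization barrier.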

  • Conference Article
  • Cite Count Icon 3
  • 10.1145/3613424.3623779
ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction
  • Oct 28, 2023
  • Vahid Janfaza + 3 more

Neural network training is inherently sequential: the layers finish forward propagation in succession, followed by the calculation and back-propagation of gradients (based on a loss function) starting from the last layer. These sequential computations significantly slow down neural network training, especially for deeper networks. Prediction has been successfully used in many areas of computer architecture to speed up sequential processing. Therefore, we propose ADA-GP, which uses gradient prediction adaptively to speed up deep neural network (DNN) training while maintaining accuracy. ADA-GP works by incorporating a small neural network to predict gradients for different layers of a DNN model. ADA-GP uses a novel tensor reorganization method to make it feasible to predict a large number of gradients. ADA-GP alternates between DNN training using backpropagated gradients and DNN training using predicted gradients. ADA-GP adaptively adjusts when and for how long gradient prediction is used to strike a balance between accuracy and performance. Last but not least, we provide a detailed hardware extension in a typical DNN accelerator to realize the speedup potential from gradient prediction. Our extensive experiments with fifteen DNN models show that ADA-GP can achieve an average speedup of 1.47× with similar or even higher accuracy than the baseline models. Moreover, it consumes, on average, 34% less energy due to reduced off-chip memory accesses compared to the baseline accelerator.

  • Research Article
  • Cite Count Icon 2
  • 10.14288/1.0380523
Priority-based parameter propagation for distributed deep neural network training
  • Jan 1, 2019
  • Open Collections
  • Anand Jayarajan

Data parallel training is commonly used for scaling distributed Deep Neural Network (DNN) training. However, the performance benefits are often limited by the communication-heavy parameter synchronization step. In this work, we take advantage of the domain specific knowledge of DNN training and overlap parameter synchronization with computation in order to improve the training performance. We make two key observations: (1) the optimal data representation granularity for the communication may differ from that used by the underlying DNN model implementation and (2) different parameters can afford different synchronization delays. Based on these observations, we propose a new synchronization mechanism called Priority-based Parameter Propagation (P3). P3 synchronizes parameters at a finer granularity and schedules data transmission in such a way that the training process incurs minimal communication delay. We show that P3 can improve the training throughput of ResNet-50, Sockeye and VGG-19 by as much as 25%, 38% and 66% respectively on clusters with realistic network bandwidth.
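The two observations can be illustrated with a toy scheduler. The sketch below makes assumptions that the abstract does not state: gradients become available from the last layer to the first during the backward pass, layers closer to the input get higher priority because the next forward pass needs them first, and the network sends exactly one slice while the next layer's gradient is being computed. The function `simulate_priority_sync` and its parameters are hypothetical and are not P3's actual interface.

```python
import heapq


def simulate_priority_sync(layer_sizes, slice_size):
    """Return the order in which gradient slices are transmitted.

    Each slice is a tuple (layer_index, offset, length). A lower layer index
    means higher priority (assumption: early layers are needed first by the
    next forward pass).
    """
    queue = []  # min-heap ordered by (layer_index, offset)
    sent = []
    # The backward pass produces gradients from the last layer to the first.
    for layer in reversed(range(len(layer_sizes))):
        size = layer_sizes[layer]
        # Finer granularity: split the layer's gradient into fixed-size slices.
        for offset in range(0, size, slice_size):
            length = min(slice_size, size - offset)
            heapq.heappush(queue, (layer, offset, length))
        # Assumption: one slice is sent while the next layer is computed.
        sent.append(heapq.heappop(queue))
    # After the backward pass finishes, drain the queue in priority order.
    while queue:
        sent.append(heapq.heappop(queue))
    return sent


order = simulate_priority_sync(layer_sizes=[4, 10], slice_size=4)
print(order)
```

In this toy run the single slice of layer 0 is sent before the remaining slices of layer 1, which would not be possible if layer 1 were transmitted as one indivisible tensor.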

  • Research Article
  • Cite Count Icon 172
  • 10.1109/tai.2021.3067574
Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges
  • May 4, 2021
  • IEEE Transactions on Artificial Intelligence
  • Edgar Galván + 1 more

A variety of methods have been applied to the architectural configuration and learning or training of artificial deep neural networks (DNN). These methods play a crucial role in the success or failure of the DNN for most problems and applications. Evolutionary algorithms (EAs) are gaining momentum as a computationally feasible method for the automated optimization of DNNs. Neuroevolution is the term that describes these processes of automated configuration and training of DNNs using EAs. While many works exist in the literature, no comprehensive surveys currently exist focusing exclusively on the strengths and limitations of using neuroevolution approaches in DNNs. The absence of such surveys can lead to a disjointed and fragmented field, preventing DNN researchers from adopting neuroevolutionary methods in their own research and resulting in lost opportunities for wider application within real-world deep learning problems. This article presents a comprehensive survey, discussion, and evaluation of the state-of-the-art in using EAs for architectural configuration and training of DNNs. This article highlights the most pertinent current issues and challenges in neuroevolution and identifies multiple promising future research directions.

Impact Statement: The concept of deep learning originated from the study of artificial neural networks (ANNs). ANNs have achieved extraordinary results in a variety of diverse application areas. Numerous methods have been applied to the architectural configuration and learning or training of artificial DNN and these methods play a crucial role in the success or failure of the DNN for most problems and applications. Recently, EAs have been gaining momentum as a computationally feasible method (called neuroevolution) for the automated configuration and learning or training of DNNs. This article reviews over 170 recent scientific papers describing how major EA paradigms are being applied by researchers to the configuration and optimization of multiple DNNs. By articulating a clear understanding of the context, state-of-the-art, and feasibility of neuroevolution, researchers in AI, EAs, and DNN will benefit from this article. The impact of this article comes from contributing toward enhancing research capacity, knowledge, and skills for researchers currently working in neuroevolution and actively engaging those considering becoming involved in this area.

  • Conference Article
  • Cite Count Icon 11
  • 10.1145/3572848.3577391
Efficient All-Reduce for Distributed DNN Training in Optical Interconnect Systems
  • Feb 21, 2023
  • Fei Dai + 4 more

Communication efficiency plays an important role in accelerating the distributed training of Deep Neural Networks (DNN). All-reduce is the crucial communication primitive to reduce model parameters in distributed DNN training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements for distributed training of large DNNs due to the low data bandwidth of the electrical interconnect systems. One of the promising alternatives for electrical interconnect is optical interconnect, which can provide high bandwidth, low transmission delay, and low power cost. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing all-reduce operation in optical interconnect systems. WRHT can take advantage of WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. We further derive the required number of wavelengths, the minimum number of communication steps, and the communication time for the all-reduce operation on optical interconnect. The constraint of insertion loss is also considered in our analysis. Simulation results of an optical interconnect system show that WRHT reduces the communication time of all-reduce by 80.81%, 64.36%, and 82.12%, respectively, compared with three traditional all-reduce algorithms. Our results also show that WRHT can reduce the communication time of all-reduce operation by 92.42% and 91.31% compared to two existing all-reduce algorithms running in the electrical interconnect system.

  • Research Article
  • Cite Count Icon 3
  • 10.1109/access.2022.3184692
Scale-Train: A Scalable DNN Training Framework for a Heterogeneous GPU Cloud
  • Jan 1, 2022
  • IEEE Access
  • Kyeonglok Kim + 3 more

In order to cope with the growing scale of deep neural network (DNN) models and training data, the use of cloud computing for distributed DNN training is becoming increasingly popular. The amount of available resources in a cloud continuously changes according to users' demands. Although distributed DNN training has a long execution time ranging from several hours to several days, existing frameworks cannot provide a dynamic scale function or have high scale in/out overhead. Therefore, it is difficult to achieve higher performance by adding graphics processing unit (GPU) nodes to a running training cluster, even when surplus GPU resources become available. In addition, the inability to dynamically reconfigure the training cluster prohibits the reform of the cluster topology when it was sub-optimally created. This paper proposes a dynamic scaling technique with which the dynamic addition and removal of new workers can be performed without suspending the ongoing training job. In addition, we propose a heterogeneity-aware straggler-proof technique so that, even when the performance of the GPUs in the cloud is uneven, a performance benefit can be guaranteed through the addition of the surplus resources. The proposed scheme improved throughput by up to a factor of 17.52 during scaling out the existing cluster of five workers to ten compared to the existing checkpoint-based scheme. Furthermore, training was continued at 95.52% of the maximum performance while being stopped for 841 seconds in Elastic Horovod, which supports dynamic scaling. Finally, even when GPUs of different performances were mixed, the error between the determined batch size and the optimal batch size was 3.37% on average.

  • Research Article
  • Cite Count Icon 6
  • 10.1109/tifs.2023.3273169
A Guessing Entropy-Based Framework for Deep Learning-Assisted Side-Channel Analysis
  • Jan 1, 2023
  • IEEE Transactions on Information Forensics and Security
  • Ziyue Zhang + 2 more

Recently deep-learning (DL) techniques have been widely adopted in side-channel power analysis. A DL-assisted SCA generally consists of two phases: a deep neural network (DNN) training phase and a follow-on attack phase using the trained DNN. However, currently the two phases are not well aligned, as there is no conclusion on what metric used in the training can result in the most effective attack in the second phase. When traditional loss functions such as negative log-likelihood (NLL) are used in training a DNN, the trained model does not yield optimal follow-on attack. Recently some information theoretical SCA leakage metrics are proposed, either as the validation metric to stop the DNN training with traditional loss functions, or as both the validation metric and the training loss function. None of those proposed metrics, however, directly measures the SCA effectiveness. We propose to conduct DNN training directly with a common SCA effectiveness metric, Guessing Entropy (GE). We overcome the prior practical difficulty of using GE in DNN training by utilizing the GEEA estimation algorithm introduced in CHES 2020. We show that using GEEA as either the validation metric or the loss function produces DNN models that lead to much more effective follow-on attacks. Our work consolidates the DL-assisted SCA framework with a consistent metric, which shows great potential to be adopted as the universal SCA-oriented DNN training framework.

  • Conference Article
  • Cite Count Icon 49
  • 10.1145/3337821.3337873
Cynthia
  • Aug 5, 2019
  • Haoyue Zheng + 4 more

It becomes an increasingly popular trend for deep neural networks with large-scale datasets to be trained in a distributed manner in the cloud. However, widely known as resource-intensive and time-consuming, distributed deep neural network (DDNN) training suffers from unpredictable performance in the cloud, due to the intricate factors of resource bottleneck, heterogeneity and the imbalance of computation and communication which eventually cause severe resource under-utilization. In this paper, we propose Cynthia, a cost-efficient cloud resource provisioning framework to provide predictable DDNN training performance and reduce the training budget. To explicitly explore the resource bottleneck and heterogeneity, Cynthia predicts the DDNN training time by leveraging a lightweight analytical performance model based on the resource consumption of workers and parameter servers. With an accurate performance prediction, Cynthia is able to optimally provision the cost-efficient cloud instances to jointly guarantee the training performance and minimize the training budget. We implement Cynthia on top of Kubernetes by launching a 56-docker cluster to train four representative DNN models. Extensive prototype experiments on Amazon EC2 demonstrate that Cynthia can provide predictable training performance while reducing the monetary cost for DDNN workloads by up to 50.6%, in comparison to state-of-the-art resource provisioning strategies, yet with acceptable runtime overhead.

  • Conference Article
  • Cite Count Icon 3
  • 10.1109/hipc56025.2022.00017
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters
  • Dec 1, 2022
  • Nawras Alnaasan + 4 more

Deep Learning (DL) has become a prominent machine learning technique due to the availability of efficient computational resources in the form of Graphics Processing Units (GPUs), large-scale datasets and a variety of models. The newer generation of GPUs are being designed with special emphasis on optimizing performance for DL applications. Also, the availability of easy-to-use DL frameworks—like PyTorch and TensorFlow— has enhanced productivity of domain experts to work on their custom DL applications from diverse domains. However, existing Deep Neural Network (DNN) training approaches may not fully utilize the newly emerging powerful GPUs like the NVIDIA A100—this is the primary issue that we address in this paper. Our motivating analyses show that the GPU utilization on NVIDIA A100 can be as low as 43% using traditional DNN training approaches for small-to-medium DL models and input data size. This paper proposes AccDP—a data-parallel distributed DNN training approach—to accelerate GPU-based DL applications. AccDP exploits the Message Passing Interface (MPI) communication library coupled with the NVIDIA’s Multi-Process Service (MPS) to increase the amount of work assigned to parallel GPUs resulting in higher utilization of compute resources. We evaluate our proposed design on different small-to-medium DL models and input sizes on the state-of-the-art HPC clusters. By injecting more parallelism into DNN training using our approach, the evaluation shows up to 58% improvement in training performance on a single GPU and up to 62% on 16 GPUs compared to regular DNN training. Furthermore, we conduct an in-depth characterization to determine the impact of several DNN training factors and best practices—including the batch size and the number of data loading workers— to optimally utilize GPU devices. To the best of our knowledge, this is the first work that explores the use of MPS and MPI to maximize the utilization of GPUs in distributed DNN training.

  • Conference Article
  • Cite Count Icon 36
  • 10.1145/3477132.3483553
Gradient Compression Supercharged High-Performance Data Parallel DNN Training
  • Oct 26, 2021
  • Youhui Bai + 7 more

Gradient compression is a promising approach to alleviating the communication bottleneck in data parallel deep neural network (DNN) training by significantly reducing the data volume of gradients for synchronization. While gradient compression is being actively adopted by the industry (e.g., Facebook and AWS), our study reveals that there are two critical but often overlooked challenges: 1) inefficient coordination between compression and communication during gradient synchronization incurs substantial overheads, and 2) developing, optimizing, and integrating gradient compression algorithms into DNN systems imposes heavy burdens on DNN practitioners, and ad-hoc compression implementations often yield surprisingly poor system performance. In this paper, we first propose a compression-aware gradient synchronization architecture, CaSync, which relies on a flexible composition of basic computing and communication primitives. It is general and compatible with any gradient compression algorithms and gradient synchronization strategies, and enables high-performance computation-communication pipelining. We further introduce a gradient compression toolkit, CompLL, to enable efficient development and automated integration of on-GPU compression algorithms into DNN systems with little programming burden. Lastly, we build a compression-aware DNN training framework HiPress with CaSync and CompLL. HiPress is open-sourced and runs on mainstream DNN systems such as MXNet, TensorFlow, and PyTorch. Evaluation via a 16-node cluster with 128 NVIDIA V100 GPUs and 100Gbps network shows that HiPress improves the training speed over current compression-enabled systems (e.g., BytePS-onebit and Ring-DGC) by 17.2%-69.5% across six popular DNN models.
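For readers unfamiliar with gradient compression, the sketch below shows one generic technique, top-k sparsification, in which only the largest-magnitude gradient entries are transmitted as (index, value) pairs. It is a plain-Python illustration under that assumption; it is not the CaSync, CompLL, or HiPress API, and the function names are hypothetical.

```python
def topk_compress(grad, k):
    """Keep the k largest-magnitude entries of a gradient as (index, value) pairs."""
    largest = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [(i, grad[i]) for i in sorted(largest)]


def topk_decompress(pairs, length):
    """Rebuild a dense gradient, filling the dropped entries with zeros."""
    dense = [0.0] * length
    for index, value in pairs:
        dense[index] = value
    return dense


grad = [0.1, -2.0, 0.05, 1.5, -0.3]
compressed = topk_compress(grad, k=2)          # 2 pairs instead of 5 values
restored = topk_decompress(compressed, len(grad))
print(compressed)
print(restored)
```

The compression and decompression steps add computation on each worker, which is why the coordination between compression and communication that the abstract highlights matters for end-to-end training speed.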

  • Research Article
  • Cite Count Icon 2
  • 10.1109/tpds.2023.3266246
A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training
  • Aug 1, 2025
  • IEEE Transactions on Parallel and Distributed Systems
  • Hao Wu + 8 more

Gradient compression is a promising approach to alleviating the communication bottleneck in data parallel deep neural network (DNN) training by significantly reducing the data volume of gradients for synchronization. While gradient compression is being actively adopted by the industry (e.g., Facebook and AWS), our study reveals that there are two critical but often overlooked challenges: 1) inefficient coordination between compression and communication during gradient synchronization incurs substantial overheads, and 2) developing, optimizing, and integrating gradient compression algorithms into DNN systems imposes heavy burdens on DNN practitioners, and ad-hoc compression implementations often yield surprisingly poor system performance. In this paper, we propose a compression-aware gradient synchronization architecture, CaSync, which relies on flexible composition of basic computing and communication primitives. It is general and compatible with any gradient compression algorithms and gradient synchronization strategies and enables high-performance computation-communication pipelining. We further introduce a gradient compression toolkit, CompLL, to enable efficient development and automated integration of on-GPU compression algorithms into DNN systems with little programming burden. Lastly, we build a compression-aware DNN training framework HiPress with CaSync and CompLL. HiPress is open-sourced and runs on mainstream DNN systems such as MXNet, TensorFlow, and PyTorch. Evaluation via a 16-node cluster with 128 NVIDIA V100 GPUs and a 100 Gbps network shows that HiPress improves the training speed over current compression-enabled systems (e.g., BytePS-onebit, Ring-DGC and PyTorch-PowerSGD) by 9.8%-69.5% across six popular DNN models.

  • Research Article
  • 10.3390/electronics14152979
SVRG-AALR: Stochastic Variance-Reduced Gradient Method with Adaptive Alternating Learning Rate for Training Deep Neural Networks
  • Jul 25, 2025
  • Electronics
  • Shiyun Zou + 3 more

The stochastic variance-reduced gradient (SVRG) theory is particularly well-suited for addressing gradient variance in deep neural network (DNN) training; however, its direct application to DNN training is hindered by adaptation challenges. To tackle this issue, the present paper proposes a series of strategies focused on adaptive alternating learning rates to effectively adapt SVRG for DNN training. Firstly, within the outer loop of SVRG, both the full gradient and the learning rate specific to DNN training are computed. For two distinct formulas used for calculating the learning rate, an alternating strategy is introduced that employs them alternately across iterations. This approach allows for simultaneous provision of diverse guidance information regarding parameter change rates and gradient change rates during DNN weight updates. Additionally, a threshold method is utilized to correct the learning rate into an appropriate range, thereby accelerating convergence. Secondly, in the inner loop of SVRG, DNN weights are updated using mini-batch average gradient along with the proposed learning rate. Concurrently, mini-batch average gradients from each iteration within the inner loop are refined and aggregated into a single gradient exhibiting reduced variance through an inertia strategy. This refined gradient is then relayed back to the outer loop to recalculate the new learning rate. The efficacy of the proposed algorithm has been validated on models including LeNet, VGG11, ResNet34, and DenseNet121 while being compared against several classic and advanced optimizers. Experimental results demonstrate that the proposed algorithm exhibits remarkable training robustness across DNN models with diverse characteristics. In terms of training convergence, the proposed algorithm demonstrates competitiveness with state-of-the-art algorithms, such as Lion, developed by the Google Brain team.
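The outer-loop and inner-loop structure mentioned above follows the standard SVRG scheme, which can be sketched on a toy problem. The example below is plain SVRG with a fixed learning rate on a one-dimensional least-squares fit; it does not implement the adaptive alternating learning rate, the threshold correction, or the inertia strategy proposed in the paper, and the data, step size, and loop counts are illustrative assumptions.

```python
import random


def svrg_least_squares(xs, ys, lr=0.1, outer_loops=30, inner_steps=20, seed=0):
    """Fit y ~ w * x with plain SVRG (fixed learning rate, single sample per step)."""
    rng = random.Random(seed)
    n = len(xs)

    def grad_i(w, i):
        # Gradient of 0.5 * (x_i * w - y_i)^2 with respect to w
        return (xs[i] * w - ys[i]) * xs[i]

    w = 0.0
    for _ in range(outer_loops):
        # Outer loop: take a snapshot and compute the full gradient there.
        snapshot = w
        full_grad = sum(grad_i(snapshot, i) for i in range(n)) / n
        for _ in range(inner_steps):
            # Inner loop: stochastic gradient corrected by the snapshot terms,
            # which keeps the estimate unbiased while reducing its variance.
            i = rng.randrange(n)
            w -= lr * (grad_i(w, i) - grad_i(snapshot, i) + full_grad)
    return w


# Toy data generated from y = 2 * x, so the minimizer is w = 2.
w_hat = svrg_least_squares([1.0, 1.5, 2.0], [2.0, 3.0, 4.0])
print(w_hat)
```

On this toy problem the estimate approaches the true slope of 2. In the paper's method, the fixed `lr` would instead be recomputed in the outer loop.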

  • Research Article
  • Cite Count Icon 20
  • 10.1016/j.neucom.2023.126661
PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters
  • Aug 4, 2023
  • Neurocomputing
  • Jinghui Zhang + 6 more

  • Research Article
  • Cite Count Icon 3
  • 10.1109/access.2022.3213734
AccelAT: A Framework for Accelerating the Adversarial Training of Deep Neural Networks Through Accuracy Gradient
  • Jan 1, 2022
  • IEEE Access
  • Farzad Nikfam + 3 more

Adversarial training is used to develop Deep Neural Network (DNN) models that are robust against maliciously altered data. Such attacks may have catastrophic effects on DNN models but are indistinguishable to a human being. For example, an external attack can modify an image by adding noise that is invisible to the human eye, yet causes a DNN model to misclassify the image. A key objective in developing robust DNN models is to use a learning algorithm that is fast but also yields a model that is robust against different types of adversarial attacks. Adversarial training in particular needs enormously long training times to obtain high accuracy under many different types of adversarial samples generated using different adversarial attack techniques. This paper aims at accelerating adversarial training to enable fast development of DNN models that are robust against adversarial attacks. The general method for improving training performance is hyperparameter fine-tuning, where the learning rate is one of the most crucial hyperparameters. By modifying its shape (the value over time) and value during training, we can obtain a model robust to adversarial attacks faster than with standard training. First, we conduct experiments on two different datasets (CIFAR10, CIFAR100), exploring various techniques. Then, this analysis is leveraged to develop a novel fast training methodology, AccelAT, which automatically adjusts the learning rate for different epochs based on the accuracy gradient. The experiments show results comparable with related works, and in several experiments, the adversarial training of DNNs using our AccelAT framework is up to 2 times faster than the existing techniques. Thus, our findings boost the speed of adversarial training in an era in which security and performance are fundamental optimization objectives in DNN-based applications.
