Automated training data generation for multiple DNNs in ECG analysis
Electrocardiograms (ECGs) exhibit diverse waveforms depending on the type of disease, patient age, and electrode positions, making automated analysis challenging. Although deep neural networks have been applied to ECG interpretation, their performance is highly dependent on extensive, high‐quality training data, which are difficult to obtain for rare or novel patterns. This study proposes an automatic retraining algorithm that enhances the accuracy of ECG waveform boundary detection by utilizing recognition scores (RS) and quantifies the consistency of predictions across six independently trained models within a multiple deep neural network (mDNN) framework. The initial mDNN was trained using ECG data from five healthy individuals. Using ECG data from 30 elderly patients, the algorithm identifies low‐RS cases, corrects errors, and generates refined training data for subsequent mDNN retraining. Evaluation with ECG data from 20 previously unseen patients showed that RS values for P and T waves nearly doubled. Furthermore, the similarity between mDNN‐derived ECG parameters and expert annotations increased by 29%–98%, depending on the specific parameter. The retrained mDNN also exhibited more consistent results than human experts when analyzing single‐patient data.
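The abstract above describes a recognition score (RS) that quantifies how consistently six independently trained models place a waveform boundary, but it does not give the formula. The sketch below assumes one plausible agreement measure: the fraction of model pairs whose predicted boundary times fall within a tolerance. The names `recognition_score` and `tolerance_ms` are hypothetical, not from the paper:

```python
import numpy as np

def recognition_score(boundaries, tolerance_ms=10.0):
    """Toy agreement score for one waveform boundary (e.g. a P-wave onset):
    the fraction of model pairs whose predictions fall within a tolerance.
    `boundaries` holds one predicted boundary time (ms) per model."""
    b = np.asarray(boundaries, dtype=float)
    n = len(b)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    agree = sum(abs(b[i] - b[j]) <= tolerance_ms for i, j in pairs)
    return agree / len(pairs)

# Six models in close agreement give a score near 1; one outlier lowers it.
print(recognition_score([102, 104, 103, 101, 105, 103]))  # 1.0
print(recognition_score([102, 104, 103, 101, 105, 160]))
```

Low-RS beats would then be flagged for correction and folded back into the training set, which is the retraining loop the abstract describes.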
- Research Article
52
- 10.14778/2733004.2733082
- Aug 1, 2014
- Proceedings of the VLDB Endowment
Deep learning has gained substantial attention in recent years and is increasingly important for mining value from big data. However, to make deep learning practical for a wide range of applications in Tencent Inc., three requirements must be considered: 1) substantial computational power is required to train a practical model with tens of millions of parameters and billions of samples for products such as automatic speech recognition (ASR), and the number of parameters and the amount of training data are still growing; 2) the capability to train larger models is necessary for better model quality; 3) easy-to-use frameworks are valuable for running many experiments for model selection, such as finding an appropriate optimization algorithm and tuning optimal hyper-parameters. To accelerate training, support large models, and make experiments easier, we built Mariana, the Tencent deep learning platform, which utilizes GPU and CPU clusters to train models in parallel with three frameworks: 1) a multi-GPU data parallelism framework for deep neural networks (DNNs); 2) a multi-GPU model parallelism and data parallelism framework for deep convolutional neural networks (CNNs); 3) a CPU cluster framework for large-scale DNNs. Mariana also provides built-in algorithms and features to facilitate experiments. Mariana has been in production use for more than one year, achieves state-of-the-art acceleration performance, and plays a key role in training models and improving quality for automatic speech recognition and image recognition in Tencent WeChat, a mobile social platform, and for ad click-through rate prediction (pCTR) in Tencent QQ, an instant messaging platform, and Tencent Qzone, a social networking service.
- Conference Article
- 10.1109/ijcnn55064.2022.9892528
- Jul 18, 2022
Deep neural networks (DNNs) often rely on massive labelled data for training, which is inaccessible in many applications. Data augmentation (DA) tackles data scarcity by creating new labelled data from available data. Different DA methods have different mechanisms, so using their generated labelled data for DNN training may help improve the DNN's generalisation to different degrees. Combining multiple DA methods, namely multi-DA, for DNN training provides a way to further boost generalisation. Among existing multi-DA based DNN training methods, those relying on knowledge distillation (KD) have received great attention. They leverage knowledge transfer to utilise the labelled data sets created by multiple DA methods instead of directly combining them for training DNNs. However, existing KD-based methods can only utilise certain types of DA methods and are incapable of making full use of the advantages of arbitrary DA methods. In this work, we propose a general multi-DA based DNN training framework capable of using arbitrary DA methods. To train a DNN, our framework replicates a certain portion of the latter part of the DNN into multiple copies, leading to multiple DNNs with shared blocks in their former parts and independent blocks in their latter parts. Each of these DNNs is associated with a unique DA method and a newly devised loss that allows comprehensively learning from the data generated by all DA methods and the outputs from all DNNs in an online and adaptive way. The overall loss, i.e., the sum of each DNN's loss, is used for training. Eventually, the DNN with the best validation performance is chosen for inference. We implement the proposed framework using three distinct DA methods and apply it to train representative DNNs. Experimental results on popular image classification benchmarks demonstrate the superiority of our method over several existing single-DA and multi-DA based training methods.
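The shared-trunk/replicated-head design described above can be sketched with plain NumPy. This is a toy illustration under assumed shapes, not the paper's implementation: one shared "former part", one independent "latter part" per DA method, and an overall loss equal to the sum of the per-head cross-entropy losses:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_trunk(x, W):
    """Former part of the network, shared by every copy (here one ReLU layer)."""
    return np.maximum(0.0, x @ W)

def head(h, V):
    """Latter part, replicated once per DA method (here one softmax layer)."""
    z = h @ V
    e = np.exp(z - z.max())
    return e / e.sum()

# One hypothetical DA method per head: identity, additive noise, scaling.
augmentations = [
    lambda x: x,
    lambda x: x + rng.normal(0.0, 0.01, x.shape),
    lambda x: x * 1.05,
]

W = rng.normal(size=(8, 16))                               # shared parameters
heads = [rng.normal(size=(16, 4)) for _ in augmentations]  # independent copies

x, y = rng.normal(size=8), 2                               # one sample with label 2
losses = [-np.log(head(shared_trunk(aug(x), W), V)[y] + 1e-12)
          for aug, V in zip(augmentations, heads)]
total_loss = sum(losses)   # overall loss = sum of the per-DNN losses
print(total_loss)
```

At inference time, only the single head with the best validation performance would be kept, so the deployed model is no larger than the original DNN.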
- Research Article
8
- 10.1016/j.sysarc.2023.102888
- Apr 26, 2023
- Journal of Systems Architecture
Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs
- Research Article
27
- 10.1109/taslp.2015.2392944
- Apr 1, 2015
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
The hybrid deep neural network (DNN) and hidden Markov model (HMM) has recently achieved dramatic performance gains in automatic speech recognition (ASR). The DNN-based acoustic model is very powerful but its learning process is extremely time-consuming. In this paper, we propose a novel DNN-based acoustic modeling framework for speech recognition, where the posterior probabilities of HMM states are computed from multiple DNNs (mDNN), instead of a single large DNN, for the purpose of parallel training towards faster turnaround. In the proposed mDNN method all tied HMM states are first grouped into several disjoint clusters based on data-driven methods. Next, several hierarchically structured DNNs are trained separately in parallel for these clusters using multiple computing units (e.g. GPUs). In decoding, the posterior probabilities of HMM states can be calculated by combining outputs from multiple DNNs. In this work, we have shown that the training procedure of the mDNN under popular criteria, including both frame-level cross-entropy and sequence-level discriminative training, can be parallelized efficiently to yield significant speedup. The training speedup is mainly attributed to the fact that multiple DNNs are parallelized over multiple GPUs and each DNN is smaller in size and trained by only a subset of training data. We have evaluated the proposed mDNN method on a 64-hour Mandarin transcription task and the 320-hour Switchboard task. Compared to the conventional DNN, a 4-cluster mDNN model with similar size can yield comparable recognition performance in Switchboard (only about 2% performance degradation) with a greater than 7 times speed improvement in CE training and a 2.9 times improvement in sequence training, when 4 GPUs are used.
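The abstract leaves out how the per-cluster outputs are recombined into posteriors over all tied states. One standard way, sketched below with toy linear "models" standing in for the real DNNs, is the chain rule: a top-level classifier estimates P(cluster | x) and each cluster model estimates P(state | cluster, x):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical setup: 12 tied HMM states split into 3 disjoint clusters.
clusters = [range(0, 4), range(4, 8), range(8, 12)]

rng = np.random.default_rng(1)
x = rng.normal(size=20)                               # one acoustic feature vector
W_top = rng.normal(size=(20, 3))                      # top-level cluster classifier
W_sub = [rng.normal(size=(20, 4)) for _ in clusters]  # one small model per cluster

p_cluster = softmax(x @ W_top)                        # P(cluster | x)
posterior = np.empty(12)
for c, (states, W) in enumerate(zip(clusters, W_sub)):
    p_state = softmax(x @ W)                          # P(state | cluster c, x)
    posterior[list(states)] = p_cluster[c] * p_state  # chain-rule combination

print(posterior.sum())
```

Because the clusters are disjoint, the combined vector is a proper distribution over all tied states, and each cluster DNN only ever sees (and is trained on) its own subset of the data, which is what enables the parallel speedup.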
- Conference Article
1
- 10.1109/slt.2014.7078562
- Dec 1, 2014
Recent advancement in deep neural networks (DNNs) has surpassed the conventional hidden Markov model-Gaussian mixture model (HMM-GMM) framework due to its efficient training procedure. Providing better phonetic context information in the input improves DNN performance. The state projection vectors (state specific vectors) in the subspace Gaussian mixture model (SGMM) capture phonetic information in a low dimensional vector space. In this paper, we propose to use the state specific vectors of SGMM as features, thereby providing additional phonetic information to the DNN framework. To each observation vector in the training data, the corresponding state specific vectors of SGMM are aligned to form the state specific vector feature set. A linear discriminant analysis (LDA) feature set is formed by applying LDA to the training data. Since bottleneck features are efficient in extracting useful discriminative information for the phonemes, the LDA feature set and the state specific vector feature set are converted to bottleneck features. The bottleneck features of both feature sets act as input features to train a single DNN framework. A relative improvement of 8.8% for the TIMIT database (core test set) and 9.7% for the WSJ corpus is obtained by using the state specific vector bottleneck feature set when compared to the DNN trained only with the LDA bottleneck feature set. Also, training a deep belief network DNN (DBN-DNN) using the proposed feature set attains a WER of 20.46% on the TIMIT core test set, proving the effectiveness of our method. The state specific vectors, while acting as features, provide additional useful information related to phoneme variation. Thus, combining them with LDA bottleneck features yields improved performance in the DNN framework.
- Conference Article
- 10.1109/icsda.2016.7918983
- Oct 1, 2016
Recent advancements and efficient training procedures in deep neural networks (DNNs) have significantly outperformed the hidden Markov model-Gaussian mixture model (HMM-GMM). The performance of DNNs can be further improved should they be given better phonetic context information. Such information is captured by the state specific vectors (SSVs) of the subspace Gaussian mixture model (SGMM). In this paper, we use the state specific vectors of SGMM as features to provide additional phonetic context information to the DNN framework. The state specific vectors are aligned with each observation vector of the training data to form the state specific vector (SSV) feature set. The combination of linear discriminant analysis (LDA) feature sets and state specific feature sets is then used as input features to train the DNN framework. A relative improvement of up to 4.13% is obtained on a Hindi database using a DNN trained with a combination of state specific feature sets and LDA feature sets, compared to the DNN trained only with LDA feature sets. Since state specific vectors provide extra information about the phonetic context, they show improved results when combined with the DNN framework. In this paper, we also investigate the performance of speech recognition under different training data selection strategies. The idea is to implement an approach that maximizes the information content of the training corpus. The experiments in this paper are carried out on the training data set with maximum information content.
- Conference Article
212
- 10.1145/3341301.3359630
- Oct 27, 2019
Existing deep neural network (DNN) frameworks optimize the computation graph of a DNN by applying graph transformations manually designed by human experts. This approach misses possible graph optimizations and is difficult to scale, as new DNN operators are introduced on a regular basis.
- Conference Article
17
- 10.1109/icassp.2014.6854680
- May 1, 2014
Recently, sequence-level discriminative training methods have been proposed to fine-tune deep neural networks (DNNs) after frame-level cross entropy (CE) training to further improve recognition performance. In our previous work, we proposed a new cluster-based multiple-DNN structure and its parallel training algorithm based on the frame-level cross entropy criterion, which can significantly expedite CE training with multiple GPUs. In this paper, we extend the multiple-DNN structure to full sequence training for better performance, and we also consider a partially parallel implementation of sequence training using multiple GPUs for faster training. We show that sequence training can be easily extended to multiple DNNs by slightly modifying the error signals in the output layer. Many implementation steps in sequence training of multiple DNNs can still be parallelized across multiple GPUs for better efficiency. Experiments on the Switchboard task have shown that both frame-level CE training and sequence training of multiple DNNs can lead to massive training speedup with little degradation in recognition performance. Compared with the state-of-the-art DNN, a 4-cluster multiple-DNN model of similar size can achieve more than 7 times faster CE training and about 1.5 times faster sequence training when using 4 GPUs.
- Conference Article
8
- 10.1109/cse/euc.2019.00013
- Aug 1, 2019
Current testing for Deep Neural Networks (DNNs) focuses on the quantity of test cases but ignores their diversity. To the best of our knowledge, DeepXplore is the first white-box framework for deep learning testing that triggers differential behaviors between multiple DNNs and increases neuron coverage to improve diversity. Because it is based on multiple DNNs, DeepXplore faces two problems: (1) the framework is not applicable to a single DNN, and (2) if all DNNs make the same incorrect prediction simultaneously, DeepXplore cannot generate test cases. This paper presents Test4Deep, a white-box testing framework based on a single DNN. Test4Deep avoids the pitfalls of multiple DNNs by inducing inconsistencies between the predicted labels of original inputs and those of generated test inputs. Meanwhile, Test4Deep improves neuron coverage to capture more diversity by attempting to activate more inactivated neurons. The proposed method was evaluated on three popular datasets with nine DNNs. Compared to DeepXplore, Test4Deep produced on average 4.59% (maximum 10.49%) more test cases, all of which exposed errors and faults of DNNs. These test cases achieved a 19.57% greater diversity increase and a 25.88% increase in neuron coverage. Test4Deep can further be used to improve the accuracy of DNNs by an average of up to 5.72% (maximum 7.0%).
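Test4Deep's single-DNN criterion, as described above, keeps a generated input when the model's label for it disagrees with its label for the original input. A minimal sketch of that check (the linear `predict` stand-in and the function names are placeholders, not the paper's code):

```python
import numpy as np

def predict(model_w, x):
    """Toy stand-in for a DNN: one linear layer followed by argmax."""
    return int(np.argmax(x @ model_w))

def is_error_revealing(model_w, x_original, x_generated):
    """Keep a generated test input when the single model labels it
    differently from the original input it was derived from."""
    return predict(model_w, x_original) != predict(model_w, x_generated)

rng = np.random.default_rng(2)
w = rng.normal(size=(5, 3))
x = rng.normal(size=5)
print(is_error_revealing(w, x, x + 0.001))  # a tiny perturbation rarely flips the label
print(is_error_revealing(w, x, -x))         # a large change typically does
```

Unlike DeepXplore's differential check, this needs no second model to disagree with, which is why it still works when every available DNN makes the same mistake.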
- Conference Article
5
- 10.1109/iccd53106.2021.00088
- Oct 1, 2021
Deep neural networks (DNNs) have achieved remarkable success in many fields. Large-scale DNNs also bring storage challenges when storing snapshots to guard against clusters' frequent failures, and generate massive internet traffic when dispatching or updating DNNs on resource-constrained devices (e.g., IoT devices, mobile phones). Several approaches aim to compress DNNs. The recent work Delta-DNN observes the high similarity between DNN versions and thus calculates the differences between them to improve the compression ratio. However, we observe that Delta-DNN, which applies a traditional global lossy quantization technique when calculating the differences between two neighboring versions of a DNN, cannot fully exploit the data similarity between them for delta compression. This is because the parameters' value ranges (and hence the delta data in Delta-DNN) vary among layers in DNNs, which inspires us to propose a local-sensitive quantization scheme: the quantizers are adaptive to the parameters' local value ranges in each layer. Moreover, instead of quantizing the differences of DNNs as in Delta-DNN, our approach quantizes the DNNs before calculating the differences, making the differences more compressible. Besides, we also propose an error feedback mechanism to reduce the accuracy loss caused by the lossy quantization. We therefore design a novel quantization-based delta compressor called QD-Compressor, which calculates lossy differences between epochs of DNNs to save the storage cost of backing up DNN snapshots and the internet traffic of dispatching DNNs to resource-constrained devices. Experiments on several popular DNNs and datasets show that QD-Compressor obtains a compression ratio 2.4×–31.5× higher than state-of-the-art approaches while well maintaining the model's test accuracy.
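The local-sensitive quantization idea above — one scale per layer, derived from that layer's own value range, applied before the delta is taken — can be sketched as follows. The 8-bit affine quantizer and the toy layer shapes are assumptions for illustration, not QD-Compressor's actual scheme:

```python
import numpy as np

def quantize_layer(w, bits=8):
    """Local-sensitive quantization (simplified): each layer gets its own
    affine scale from its own value range, not one global scale."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0   # avoid a zero scale
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, lo, scale

# Two snapshots of a toy 2-layer model; the layers have very different value
# ranges, which is exactly the situation that defeats a single global quantizer.
rng = np.random.default_rng(3)
snap_a = [rng.normal(0.0, 1.0, 100), rng.normal(0.0, 0.01, 100)]
snap_b = [w + rng.normal(0.0, 1e-4 * w.std(), w.shape) for w in snap_a]

# Quantize each snapshot first, then take the delta of the quantized codes:
# near-identical layers yield deltas that are mostly zeros and compress well.
fractions = []
for wa, wb in zip(snap_a, snap_b):
    qa, *_ = quantize_layer(wa)
    qb, *_ = quantize_layer(wb)
    delta = qb.astype(np.int16) - qa.astype(np.int16)
    fractions.append(float(np.mean(delta == 0)))
print(fractions)  # fraction of zero entries per layer, close to 1.0 for both
```

A global quantizer would use one scale for both layers, so the small-valued layer's changes would either be lost entirely or dominate the delta; the per-layer scale keeps both deltas sparse.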
- Research Article
2
- 10.1155/2024/7926619
- Jan 1, 2024
- Shock and Vibration
This study proposes an uncertainty quantification method based on deep neural networks and Catmull–Clark subdivision surfaces for vibroacoustic problems. The deep neural networks are utilized as a surrogate model to efficiently generate samples for stochastic analysis. The training data are obtained from numerical simulation by coupling the isogeometric finite element method and the isogeometric boundary element method. In the simulation, the geometric models are constructed with Catmull–Clark subdivision surfaces, and the physical fields are discretized with the same spline functions as used in geometric modelling. Multiple deep neural networks are trained to predict the sound pressure response for various parameters with different numbers and dimensions in vibroacoustic problems. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
- Research Article
13
- 10.1007/s11042-015-3038-y
- Nov 3, 2015
- Multimedia Tools and Applications
Constructing a mapping between articulatory movements and the corresponding speech could significantly facilitate speech training and the development of speech aids for voice disorder patients. In this paper, we propose a novel deep learning framework for creating a bidirectional mapping between articulatory information and synchronized speech recorded using an ultrasound system. We created a dataset comprising six Chinese vowels and employed the Bimodal Deep Autoencoder algorithm based on the Restricted Boltzmann Machine (RBM) to learn the correlation between speech and ultrasound images of the tongue and to obtain the weight matrices of the data representations. Speech and ultrasound images were then reconstructed from the extracted features. The reconstruction error of the ultrasound images created with our method was found to be less than that of an approach based on Principal Components Analysis (PCA). Further, the reconstructed speech approximated the original, as the mean formant error (MFE) was small. Following acquisition of their shared representations using the RBM-based deep autoencoder, we carried out mapping between ultrasound images of the tongue and the corresponding acoustic signals with a Deep Neural Network (DNN) framework using revised Deep Denoising Autoencoders. The results obtained indicate that the performance of our proposed method is better than that of a Gaussian Mixture Model (GMM)-based method to which it was compared.
- Conference Article
19
- 10.1109/icassp.2013.6638948
- May 1, 2013
Recently, a pre-trained context-dependent hybrid deep neural network (DNN) and HMM method has achieved significant performance gains in many large-scale automatic speech recognition (ASR) tasks. However, the error back-propagation (BP) algorithm for training neural networks is sequential in nature and hard to parallelize into multiple computing threads. Therefore, training a deep neural network is extremely time-consuming even with a modern GPU board. In this paper we propose a new acoustic modelling framework that uses multiple DNNs instead of a single DNN to compute the posterior probabilities of tied HMM states. In our method, all tied states of context-dependent HMMs are first grouped into several disjoint clusters based on the training data associated with these HMM states. Then, several hierarchically structured DNNs are trained separately for these disjoint clusters of data using multiple GPUs. In decoding, the final posterior probability of each tied HMM state can be calculated from the output posteriors of the multiple DNNs. We have evaluated the proposed method on a 64-hour Mandarin transcription task and the 309-hour Switchboard Hub5 task. Experimental results have shown that the new method using cluster-based multiple DNNs can achieve over 5 times reduction in total training time with only negligible performance degradation (about 1–2% on average) when using 3 or 4 GPUs respectively.
- Conference Article
28
- 10.1109/icassp.2018.8462649
- Apr 1, 2018
In this work, we present a variant of multiple deep neural network (DNN) based speech enhancement. We directly estimate the clean speech spectrum as a weighted average of the outputs from multiple DNNs, where the weights are provided by a gating network; the multiple DNNs and the gating network are trained jointly. The objective function is the mean square logarithmic error between the target clean spectrum and the estimated spectrum. We conduct experiments using two and four DNNs on the TIMIT corpus with nine noise types (four seen and five unseen noises) taken from the AURORA database at four different signal-to-noise ratios (SNRs). We also compare the proposed method with a single-DNN based speech enhancement scheme and existing multiple-DNN schemes using segmental SNR, perceptual evaluation of speech quality (PESQ), and short-term objective intelligibility (STOI) as evaluation metrics. These comparisons show the superiority of the proposed method over the baseline schemes in both seen and unseen noises. Specifically, we observe an absolute improvement of 0.07 and 0.04 in the PESQ measure compared to a single DNN, averaged over all noises and SNRs, for the seen and unseen noise cases respectively.
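The combination rule in the abstract — a gating network producing one weight per DNN, and the clean spectrum estimated as the weighted average of the DNN outputs — is easy to sketch. The stand-in "DNN outputs" and gate logits below are fabricated placeholders; only the weighted combination and the mean square logarithmic error objective follow the abstract:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def msle(target, estimate):
    """Mean square logarithmic error, the objective named in the abstract."""
    return np.mean((np.log1p(target) - np.log1p(estimate)) ** 2)

rng = np.random.default_rng(4)
noisy = rng.uniform(0.1, 2.0, size=64)    # one noisy magnitude spectrum (64 bins)

# Fabricated stand-ins for two trained enhancement DNNs and the gating network.
dnn_outputs = [noisy * 0.8, noisy * 0.5]  # each DNN's clean-spectrum estimate
gate_logits = np.array([0.3, 1.1])        # gating network's raw scores
weights = softmax(gate_logits)            # one weight per DNN, summing to 1

estimate = sum(w * out for w, out in zip(weights, dnn_outputs))
clean = noisy * 0.6                       # pretend clean target
print(msle(clean, estimate))
```

Because the gate and the DNNs are trained jointly against this single loss, the gate can learn to favor whichever DNN handles the current noise condition best.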
- Research Article
16
- 10.2144/fsoa-2022-0010
- Mar 8, 2022
- Future science OA
Artificial intelligence in interdisciplinary life science and drug discovery research.