Cognitive diagnosis method via neural networks with transfer learning and Q-matrix constraints
This study introduces Bi-QNN, a neural network-based cognitive diagnosis method with Q-matrix constraints and transfer learning, capable of adaptively modeling attribute effects. It outperforms GDINA and DINA in prediction accuracy, maintains robustness with increasing attribute numbers, and effectively handles limited labeled data, demonstrating broad applicability across simulated and real datasets.
Abstract: Neural networks, as one of the most important machine learning methods, have been widely used in cognitive diagnosis, but there is still no simple, general-purpose neural-network cognitive diagnosis method. This study therefore proposes a Q-matrix-constrained neural network cognitive diagnosis method (Bi-QNN) trained via transfer learning. The advantages of the new model are: (1) users need not design the network architecture by hand, since the model adapts to any dataset via the Q-matrix and an interaction Q-matrix; (2) the design of the network architecture is derived from the GDINA model, so it can represent both the main effects and the interaction effects of attributes; (3) the transfer-learning-based training scheme effectively alleviates the scarcity of labeled data, improving the model's usability and range of application. Experimental results show that Bi-QNN's prediction error on simulated datasets is overall lower than that of the parametric methods GDINA and DINA; that within a certain range the model is relatively insensitive to the number of attributes and maintains good classification accuracy as the number of attributes grows; and that Bi-QNN trained with transfer learning adapts better to datasets of different sample sizes, leading other models under a variety of conditions on both simulated and empirical data. Further performance gains are limited by the simulated data being generated from parametric models, and the model remains somewhat sensitive to item quality.
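To make the Q-matrix constraint concrete, here is a minimal sketch of the core idea as we read it from the abstract; the class name and layer shapes are our own illustration, not the authors' Bi-QNN code. The Q-matrix acts as a fixed 0/1 mask on the connection weights, so each item's response probability can depend only on the attributes that item measures.

```python
# Hypothetical sketch of a Q-matrix-constrained layer (not the authors' code).
import torch
import torch.nn as nn

class QConstrainedLayer(nn.Module):
    def __init__(self, q_matrix: torch.Tensor):
        super().__init__()
        items, attrs = q_matrix.shape
        self.linear = nn.Linear(attrs, items)
        # Non-trainable mask; zeros permanently disconnect item-attribute pairs.
        self.register_buffer("mask", q_matrix.float())

    def forward(self, attr_profile: torch.Tensor) -> torch.Tensor:
        # Re-apply the mask at every forward pass so masked weights never contribute.
        masked_weight = self.linear.weight * self.mask
        logits = nn.functional.linear(attr_profile, masked_weight, self.linear.bias)
        return torch.sigmoid(logits)  # P(correct response) per item

# Toy usage: 3 items, 2 attributes; item 0 measures only attribute 0, etc.
Q = torch.tensor([[1, 0], [0, 1], [1, 1]])
layer = QConstrainedLayer(Q)
print(layer(torch.tensor([[1.0, 0.0]])))  # response probabilities for one examinee
```

The same masking idea would extend to an interaction Q-matrix by adding columns for attribute products.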
- Single Book
- 10.61909/amkedtb022409
- Feb 27, 2024
“NEURAL NETWORKS AND DEEP LEARNING: THEORETICAL INSIGHTS AND FRAMEWORKS” is a comprehensive guide that dives deep into the world of neural networks and their applications in modern technology. From foundational theories to cutting-edge advancements, this book gives readers a thorough understanding of deep learning and its potential impact on various fields.

Chapter 1: Introduction to Neural Networks and Deep Learning introduces the theoretical underpinnings of deep learning and its real-world applications. The chapter explores key concepts, navigates through neural network architectures, and discusses the current landscape of deep learning research. It also addresses ethical considerations and social implications, highlighting the intersection of deep learning with other disciplines.

Chapter 2: Mathematical Foundations of Neural Networks lays the groundwork by covering essential mathematical concepts relevant to deep learning. From linear algebra to calculus, probability, and statistics, readers gain insight into the mathematical rigor behind neural network operations. The chapter also delves into optimization techniques and advanced mathematical concepts crucial for understanding deep learning models.

Chapter 3: Single-Layer Perceptrons and Feedforward Networks explores the building blocks of neural networks, including perceptrons and activation functions. It discusses universal approximation theorems, backpropagation algorithms, and weight initialization techniques. Additionally, the chapter addresses challenges such as vanishing and exploding gradients, along with evolutionary algorithms and self-organizing maps.

Chapter 4: Convolutional Neural Networks (CNNs) focuses on specialized architectures designed for image processing tasks. Readers learn about convolutional layers, pooling operations, and hierarchical feature learning. The chapter also covers object localization, transfer learning, and interpretability of CNNs, along with advanced architectures like capsule networks.

Chapter 5: Recurrent Neural Networks (RNNs) delves into sequential data processing, temporal dependencies, and architectures like Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks. It addresses training challenges and explores real-world applications of recurrent networks.

Chapter 6: Generative Adversarial Networks (GANs) introduces the innovative concept of GANs and their applications in image generation. The chapter discusses training dynamics, challenges, and ethical considerations surrounding GANs, and explores future developments in creativity and adversarial robustness.

Chapter 7: Autoencoders and Variational Autoencoders (VAEs) explores unsupervised learning techniques for representation learning and anomaly detection. Readers learn about various types of autoencoders, including adversarial autoencoders and quantum autoencoders.

Chapter 8: Reinforcement Learning and Deep Q Networks (DQNs) provides insights into reinforcement learning fundamentals, Markov decision processes, and deep Q networks. It discusses policy gradient methods and their applications in real-world scenarios.

Chapter 9: Transfer Learning in Deep Neural Networks explores transfer learning paradigms, domain adaptation techniques, and the role of transfer learning in achieving explainable AI. Readers gain insight into evaluating performance and generalization in transfer learning, along with applications in various domains.

“NEURAL NETWORKS AND DEEP LEARNING: THEORETICAL INSIGHTS AND FRAMEWORKS” is an invaluable resource for researchers, practitioners, and enthusiasts looking to deepen their understanding of neural networks and harness the power of deep learning in diverse applications.
- Research Article
- 10.3390/sym13081344
- Jul 26, 2021
- Symmetry
Plastic modifications in synaptic connectivity arise primarily from changes triggered by neuromodulated dopamine signals. These activities are controlled by neuromodulation, which is itself under the control of the brain; the brain's self-modifying abilities play an essential role in learning and adaptation. Artificial neural networks with neuromodulated plasticity have been used to implement transfer learning in the image classification domain, with applications in image detection, image segmentation, and the transfer of learning parameters, yielding significant results. This paper proposes a novel approach to enhance transfer learning accuracy across heterogeneous source and target domains using neuromodulation of the Hebbian learning principle, called NDHTL (Neuromodulated Dopamine Hebbian Transfer Learning). Neuromodulation of plasticity offers a powerful new technique for training neural networks, implementing asymmetric backpropagation using Hebbian principles in transfer-learning-motivated convolutional neural networks (CNNs). In biologically motivated concomitant learning, where connected brain cells activate together, the synaptic connection strength between network neurons is enhanced. Using the NDHTL algorithm, the percentage change of plasticity between the neurons of a CNN layer is directly managed by the value of the dopamine signal. The discriminative nature of transfer learning fits well with this technique: in transfer learning, the learned model's connection weights must adapt to unseen target datasets with the least cost and effort. Using distinctive learning principles such as dopamine Hebbian learning for asymmetric gradient weight updates in transfer learning is a novel approach. The paper presents the NDHTL algorithmic technique as synaptic plasticity controlled by dopamine signals in transfer learning to classify images using source-target datasets; standard transfer learning using gradient backpropagation is, by contrast, a symmetric framework. Experimental results on the CIFAR-10 and CIFAR-100 datasets show that the proposed NDHTL algorithm can enhance transfer learning efficiency compared with existing methods.
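As a rough illustration of the dopamine-gated Hebbian idea (this is a generic textbook Hebbian rule gated by a scalar dopamine signal, under our own assumptions; the exact NDHTL update is not reproduced here):

```python
# Sketch: weights strengthen when pre- and post-synaptic activity coincide,
# scaled by a scalar dopamine signal (hypothetical names and shapes).
import numpy as np

def dopamine_hebbian_update(w, pre, post, dopamine, lr=0.01):
    """w: (n_post, n_pre) weights; pre/post: activity vectors; dopamine: scalar gate."""
    dw = lr * dopamine * np.outer(post, pre)  # classic Hebbian outer product, gated
    return w + dw

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 3))
pre, post = rng.random(3), rng.random(4)
w = dopamine_hebbian_update(w, pre, post, dopamine=0.8)
```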
- Abstract
- 10.1016/j.jaac.2022.09.193
- Oct 1, 2022
- Journal of the American Academy of Child & Adolescent Psychiatry
2.49 Transfer Learning of Scanner-Generalization Neural Networks for Predicting General Psychopathology Factor (p factor) in Adolescents Based on Resting-State Functional Connectivity
- Research Article
- 10.1109/tase.2020.3003124
- Jun 30, 2020
- IEEE Transactions on Automation Science and Engineering
Deep neural networks (DNNs), e.g., convolutional neural networks (CNNs), are able to learn effective features from wafer maps for dimensionality reduction and feature extraction. However, very large amounts of image data are needed to train DNNs to obtain high generalization performance, which remains difficult due to the lack of sufficient labeled images covering the various defects. This article proposes a semisupervised deep transfer learning algorithm called the joint feature and label adversarial network (JFLAN). JFLAN uses CNNs to extract transferable features of wafer maps and then introduces a multilayer domain adaptation and pseudolabel learning block based on the generative adversarial network (GAN). This effectively reduces the distribution discrepancy and the among-class distance of the transferable features. Finally, JFLAN transfers knowledge from wafer image source data collected offline, significantly improving the accuracy of wafer defect recognition and enabling online adaptive defect recognition. Note to Practitioners — Defect recognition on wafer maps plays a key role in identifying fault sources in semiconductor manufacturing processes. Transfer learning is able to use existing labeled data to assist in the classification of unlabeled data and is very effective for the problem of small samples and nonstationary generalization errors. In particular, the infusion of adversarial learning into transfer learning provides a new idea for deep feature learning. This article provides a novel transfer-learning-based method for wafer map defect recognition (WMDR) that quickly identifies defect root causes for yield enhancement, and thus a novel way to control the quality of semiconductor manufacturing processes based on transfer and adversarial learning.
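The pseudolabel learning step mentioned above can be illustrated with standard confidence thresholding; this is a generic sketch under our own assumptions, not JFLAN's actual block:

```python
# Sketch: unlabeled target wafer maps whose predicted class probability clears
# a threshold are kept, with their predicted labels, for further training.
import torch

def select_pseudo_labels(model, target_batch, threshold=0.95):
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(target_batch), dim=1)
        conf, labels = probs.max(dim=1)
    keep = conf >= threshold  # keep only high-confidence predictions
    return target_batch[keep], labels[keep]
```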
- Research Article
- 10.2514/1.j065406
- Jul 1, 2025
- AIAA Journal
Accurate and efficient prediction of propeller acoustic performance is of paramount importance to the analysis and design of various urban air mobility concepts that have emerged over the last decade. Leveraging modern machine learning techniques of transfer learning (TL) and active learning (AL), we present a multifidelity data-driven framework that uses a combination of simulation and experiment data to train a neural network (NN) model for predicting the tonal noise of two-bladed propellers at multiple far-field observer locations given the airfoil profile, pitch-to-diameter ratio, forward speed, and rotational speed of the propeller. The NN model is first trained using a large number of inexpensive low-fidelity simulations and then enhanced by a small number of high-fidelity aeroacoustic wind tunnel measurements using TL. Two additional rounds of wind-tunnel experiments with new propeller designs are then suggested by an AL algorithm designed to effectively reduce the predictive error of the NN model. Results from held-out validation indicate that the NN trained with both low- and high-fidelity data based on TL delivers better performance than those trained solely with low- or high-fidelity data, and the inclusion of high-fidelity data selected by AL further improves the predictive accuracy.
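The two-stage recipe described here, pretraining on plentiful low-fidelity simulations and then fine-tuning the same weights on scarce high-fidelity measurements at a reduced learning rate, can be sketched generically as follows (hypothetical model and data loaders, not the authors' code):

```python
# Generic two-stage transfer learning sketch (all names are placeholders).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, loader, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# train(model, low_fidelity_loader, lr=1e-3, epochs=100)   # stage 1: cheap simulations
# train(model, high_fidelity_loader, lr=1e-4, epochs=20)   # stage 2: wind-tunnel data
```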
- Supplementary Content
- 10.25394/pgs.12221597.v1
- Apr 30, 2020
- Figshare
Recent progress in machine learning has been mainly due to the availability of large amounts of annotated data used for training complex models with deep architectures. Annotating this training data is burdensome and creates a major bottleneck in maintaining machine-learning databases. Moreover, these trained models fail to generalize to new categories or new varieties of the same categories, because new categories or varieties have a data distribution different from the training distribution. To tackle these problems, this thesis proposes a family of transfer-learning techniques that can deal with different training (source) and testing (target) distributions, under the assumption that annotated data is limited in the testing domain. This is done by using an auxiliary, data-abundant source domain from which useful knowledge is transferred and applied to the data-scarce target domain. This transferable knowledge serves as a prior that biases target-domain predictions and prevents the target-domain model from overfitting. Specifically, we explore structural priors that encode relational knowledge between different data entities, which provide a more informative bias than traditional priors. The choice of the structural prior depends on the information availability and the similarity between the two domains. Depending on the domain similarity and the information availability, we divide the transfer learning problem into four major categories and propose different structural priors to solve each of these sub-problems. This thesis first focuses on the unsupervised-domain-adaptation problem, where we propose to minimize domain discrepancy by transforming labeled source-domain data to be close to unlabeled target-domain data. For this problem, the categories remain the same across the two domains, and hence we assume that the structural relationship between the source-domain samples carries over to the target domain. Thus, a graph or hyper-graph is constructed as the structural prior from both domains, and a graph/hyper-graph matching formulation is used to transform samples in the source domain to be closer to samples in the target domain. An efficient optimization scheme is then proposed to tackle the time and memory inefficiencies associated with the matching problem. The few-shot learning problem is studied next, where we propose to transfer knowledge from source-domain categories containing abundantly labeled data to novel categories in the target domain that contain only a few labeled samples. The knowledge transfer biases the novel-category predictions and prevents the model from overfitting. The knowledge is encoded using a neural-network-based prior that transforms a data sample to its corresponding class prototype. This neural network is trained on the source-domain data and applied to the target-domain data, where it transforms the few-shot samples into the novel-class prototypes for better recognition performance. The few-shot learning problem is then extended to the situation where we do not have access to the source-domain data but only to the source-domain class prototypes. In this limited-information setting, parametric neural-network-based priors would overfit to the source-class prototypes, and hence we seek a non-parametric prior based on manifolds. A piecewise linear manifold is used as a structural prior to fit the source-domain class prototypes.
This structure is extended to the target domain, where the novel-class prototypes are found by projecting the few-shot samples onto the manifold. Finally, the zero-shot learning problem is addressed; this is an extreme case of the few-shot learning problem in which we do not have any labeled data in the target domain. However, we have high-level information for both the source- and target-domain categories in the form of semantic descriptors. We learn the relation between the sample space and the semantic space using a regularized neural network, so that classification of the novel categories can be carried out in a common representation space. This same neural network is then used in the target domain to relate the two spaces. If we want to generate data for the novel categories in the target domain, we can use a constrained generative adversarial network instead of a traditional neural network. Thus, we use structural priors like graphs, neural networks, and manifolds to relate various data entities like samples, prototypes, and semantics across these different transfer learning sub-problems. We explore additional post-processing steps like pseudo-labeling, domain adaptation, and calibration, and enforce algorithmic and architectural constraints to further improve recognition performance. Experiments on standard transfer-learning image-recognition datasets produced competitive results with respect to previous work. Further experimentation and analysis of these methods provided a better understanding of machine learning as well.
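The prototype-based few-shot transfer described above can be made concrete with a short, ProtoNet-style illustration under our own assumptions (not the thesis code): class prototypes are the means of the few-shot support embeddings, and a query is assigned to the nearest prototype.

```python
# Hypothetical nearest-prototype classifier for few-shot recognition.
import numpy as np

def nearest_prototype(query_emb, support_embs, support_labels):
    classes = np.unique(support_labels)
    # Prototype = mean embedding of each class's few labeled samples.
    protos = np.stack([support_embs[support_labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(query_emb[None, :] - protos, axis=1)
    return classes[np.argmin(dists)]
```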
- Research Article
- 10.1002/mrm.28822
- May 18, 2021
- Magnetic Resonance in Medicine
Artificial neural networks show promising performance in automatic segmentation of cardiac MRI. However, training requires large amounts of annotated data, and generalization to different vendors, field strengths, sequence parameters, and pathologies is limited. Transfer learning addresses this challenge, but specific recommendations regarding the type and amount of data required are lacking. In this study, we assess data requirements for transfer learning to experimental cardiac MRI at 7T, where the segmentation task can be challenging. In addition, we provide guidelines, tools, and annotated data to enable transfer learning approaches by other researchers and clinicians. A publicly available segmentation model was used to annotate a publicly available data set. This labeled data set was subsequently used to train a neural network for segmentation of the left ventricle and myocardium in cardiac cine MRI. The network was used as the starting point for transfer learning to 7T cine data of healthy volunteers (n = 22; 7873 images) by updating the pre-trained weights. Structured and random data subsets of different sizes were used to systematically assess data requirements for successful transfer learning. Inconsistencies in the publicly available data set were corrected, labels created, and a neural network trained. On 7T cardiac cine images, the model pre-trained on public imaging data acquired at 1.5T and 3T achieved DICE_LV = 0.835 and DICE_MY = 0.670. Transfer learning using 7T cine data and ImageNet weight initialization improved model performance to DICE_LV = 0.900 and DICE_MY = 0.791. Using only end-systolic and end-diastolic images reduced the training data by 90%, with no negative impact on segmentation performance (DICE_LV = 0.908, DICE_MY = 0.805). This work demonstrates and quantifies the benefits of transfer learning for cardiac cine image segmentation. We provide practical guidelines for researchers planning transfer learning projects in cardiac MRI and make data, models, and code publicly available.
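The DICE values quoted above are Dice overlap coefficients between predicted and ground-truth masks; for reference, a minimal sketch of the standard definition for binary masks (assuming the conventional formula):

```python
# Dice coefficient: 2|A ∩ B| / (|A| + |B|), for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum() + eps)
```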
- Research Article
- 10.1016/j.neucom.2024.128936
- Feb 1, 2025
- Neurocomputing
Physics embedded neural network: Novel data-free approach towards scientific computing and applications in transfer learning
- Book Chapter
- 10.1007/978-3-030-86271-8_43
- Jan 1, 2021
A new transfer learning strategy for classification is proposed in this work, based on fully connected neural networks. The transfer learning process consists of a training phase of the neural network on a source dataset, after which the last two layers are retrained using a different, small target dataset. Clustering techniques are also applied in order to determine the most suitable data to use as the target. A preliminary study was conducted to train and test the transfer learning proposal on the classification problem of phenology forecasting, using up to sixteen different parcels located in Spain. The results achieved are quite promising and encourage further research in this field, having led to a 7.65% improvement with respect to three other strategies using both transfer and non-transfer learning models.
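The strategy of retraining only the last two layers on the small target dataset can be sketched with a generic freeze-and-retrain pattern (our own assumptions, not the chapter's implementation):

```python
# Hypothetical fully connected network; only the final two Linear layers
# are retrained on the small target dataset.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 8),
)
for p in model.parameters():
    p.requires_grad = False          # freeze everything learned on the source data
for layer in [model[2], model[4]]:   # unfreeze the last two trainable layers
    for p in layer.parameters():
        p.requires_grad = True
```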
- Research Article
- 10.1021/acsomega.1c06805
- Mar 11, 2022
- ACS Omega
Recent advances in molecular machine learning, especially deep neural networks such as graph neural networks (GNNs), for predicting structure–activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks is limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to the popular parameter-based transfer learning such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other compared to their inactive compounds. Overall, TAc achieves the best performance with an average ROC-AUC of 0.801; it significantly improves the ROC-AUC of 83% of target tasks with an average task-wise performance improvement of 7.102%, compared to the best baseline dmpna. Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods. In summary, TAc-fc is also found to be a strong model with competitive or even better performance than TAc on a notable number of target tasks.
- Research Article
- 10.1007/s42600-021-00132-9
- Apr 2, 2021
- Research on Biomedical Engineering
Purpose: We present image classifiers based on Dense Convolutional Networks and transfer learning to classify chest X-ray images according to three labels: COVID-19, pneumonia, and normal. Methods: We fine-tuned neural networks pretrained on ImageNet and applied a twice-transfer-learning approach, using the NIH ChestX-ray14 dataset as an intermediate step. We also suggested a novelty called output neuron keeping, which changes the twice-transfer-learning technique. In order to clarify the modus operandi of the models, we used Layer-wise Relevance Propagation (LRP) to generate heatmaps. Results: We were able to reach a test accuracy of 100% on our test dataset. Twice transfer learning and output neuron keeping showed promising results, improving performance mainly at the beginning of the training process. Although LRP revealed that words on the X-rays can influence the networks' predictions, we discovered this had only a very small effect on accuracy. Conclusion: Although clinical studies and larger datasets are still needed to further ensure good generalization, the state-of-the-art performances we achieved show that, with the help of artificial intelligence, chest X-rays can become a cheap and accurate auxiliary method for COVID-19 diagnosis. Heatmaps generated by LRP improve the interpretability of deep neural networks and indicate an analytical path for future research on diagnosis. Twice transfer learning with output neuron keeping improved DNN performance.
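Our reading of "output neuron keeping" is that, when moving from an intermediate label set (e.g., ChestX-ray14) to the target labels, output neurons for overlapping classes keep their trained weights instead of being reinitialized with the new head. The sketch below is that interpretation only; the index mapping is hypothetical:

```python
# Hypothetical head swap that preserves trained output neurons for shared classes.
import torch
import torch.nn as nn

old_head = nn.Linear(512, 14)   # e.g., intermediate ChestX-ray14 head
new_head = nn.Linear(512, 3)    # COVID-19 / pneumonia / normal
shared = {1: 5}                 # new index -> old index for an overlapping class (illustrative)
with torch.no_grad():
    for new_i, old_i in shared.items():
        new_head.weight[new_i] = old_head.weight[old_i]  # keep trained weights
        new_head.bias[new_i] = old_head.bias[old_i]
```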
- Research Article
- 10.1007/s11356-019-06156-0
- Aug 13, 2019
- Environmental Science and Pollution Research
Neural network models have been used to predict chlorophyll-a concentration dynamics. However, as model generalization ability decreases, (i) the performance of the models gradually degrades over time, and (ii) the accuracy and performance of the models need to be improved. In this study, transfer learning (TL) is employed to optimize neural network models (including feedforward neural networks (FNN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks) and overcome these problems. Models using TL are able to reduce the influence of mutable data distributions and enhance generalization ability; thus, TL can improve prediction accuracy and maintain high performance in long-term applications. TL is also compared with parameter norm penalties (PNP) and dropout, two other methods used to improve model generalization ability. In general, TL has a better prediction effect than PNP and dropout. All the models, including FNNs with different architectures, RNN, and LSTM, as well as models optimized by PNP, dropout, and TL, are applied to an estuary reservoir in eastern China to predict chlorophyll-a dynamics at 5-min intervals. According to the results of this study, (i) models with TL produce the best prediction results, and (ii) the original models and the models with PNP and dropout lose their ability to predict within 3 months, while TL models retain high prediction accuracy.
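The long-term-application benefit of TL suggests a simple deployment pattern: rather than retraining from scratch as the data distribution drifts, periodically fine-tune the existing model on a recent window of observations. A generic sketch under our own assumptions (not the study's code; `stream` is assumed to yield feature/target tensor pairs):

```python
# Hypothetical rolling fine-tuning loop for a time-series predictor.
import torch
import torch.nn as nn

def finetune(model, batch_pairs, lr=1e-4, steps=1):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    xs = torch.stack([x for x, _ in batch_pairs])
    ys = torch.stack([y for _, y in batch_pairs])
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(xs), ys).backward()
        opt.step()

def rolling_finetune(model, stream, window=2000, every=500):
    buffer = []
    for t, (x, y) in enumerate(stream):
        buffer.append((x, y))
        buffer = buffer[-window:]        # keep only the latest observations
        if t > 0 and t % every == 0:
            finetune(model, buffer)      # brief update, not full retraining
```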
- Research Article
- 10.1615/int.j.uncertaintyquantification.2020033267
- Jan 1, 2020
- International Journal for Uncertainty Quantification
Due to their high degree of expressiveness, neural networks have recently been used as surrogate models for mapping inputs of an engineering system to outputs of interest. Once trained, neural networks are computationally inexpensive to evaluate and remove the need for repeated evaluations of computationally expensive models in uncertainty quantification applications. However, given the highly parameterized construction of neural networks, especially deep neural networks, accurate training often requires large amounts of simulation data that may not be available in the case of computationally expensive systems. In this paper, to alleviate this issue for uncertainty propagation, we explore the application of transfer learning techniques using training data generated from both high- and low-fidelity models. We explore two strategies for coupling these two datasets during the training procedure, namely, the standard transfer learning and the bi-fidelity-weighted learning. In the former approach, a neural network model mapping the inputs to the outputs of interest is trained based on the low-fidelity data. The high-fidelity data are then used to adapt the parameters of the upper layer(s) of the low-fidelity network, or train a simpler neural network to map the output of the low-fidelity network to that of the high-fidelity model. In the latter approach, the entire low-fidelity network parameters are updated using data generated via a Gaussian process model trained with a small high-fidelity dataset. The parameter updates are performed via a variant of stochastic gradient descent with learning rates given by the Gaussian process model. Using three numerical examples, we illustrate the utility of these bi-fidelity transfer learning methods where we focus on accuracy improvement achieved by transfer learning over standard training approaches.
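The second variant of standard transfer learning described above, in which a simpler network maps the low-fidelity network's output toward the high-fidelity model's output, can be sketched as follows (hypothetical architecture, not the paper's code):

```python
# Freeze the low-fidelity surrogate; train only a small corrector on HF data.
import torch
import torch.nn as nn

lf_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
corrector = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
for p in lf_net.parameters():
    p.requires_grad = False  # low-fidelity surrogate stays fixed

def hf_prediction(x):
    return corrector(lf_net(x))  # only the corrector is fit to high-fidelity data
```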
- Conference Article
- 10.2118/204563-ms
- Dec 15, 2021
Transfer learning is a machine learning concept whereby the knowledge gained (e.g., a model developed) in one task can be transferred (applied) to solve a different but related task. In the context of unconventional reservoirs, the concept can be used to transfer a machine learning model that is learned from data in one field (or shale play) to another, thereby significantly reducing the data needs and efforts to build a new model from scratch. In this work, we study the feasibility of developing deep learning models that can capture and transfer common features in a rich dataset pertaining to a mature unconventional play to enable production prediction in a new unconventional play with limited available data. The focus in this work is on method development using simulated data that correspond to the Bakken and Eagle Ford Shale Plays as two different unconventional plays in the US. We use formation and completion parameter ranges that correspond to the Bakken play with their simulated production responses to explore different approaches for training neural network models that enable transfer learning to predict production responses of input parameters corresponding to the Eagle Ford play (previously unseen input parameters). We explore different schemes by accessing the internal components of the model to extrapolate and categorize salient features that are represented in the trained neural network. Ultimately, our goal is to use these new mechanisms to enable effective sharing and reuse of discovered features from one unconventional well to another. To extract salient trends from formation and completion input parameters and their corresponding simulated production responses, we use deep learning architectures that consist of convolutional encoder-decoder networks. The architecture is then trained with rich simulated data from one field to generate a robust mapping between the input and the output feature spaces. The "learned" parameters from this network can then be "transferred" to develop a different predictive model for another field that may lack sufficient historical data. The results show that using standard training approaches, a neural network model that is trained with sufficiently large data samples from Bakken could produce reliable prediction models for typical wells that may be found in that field. The same neural network, however, could not produce reliable predictions for a typical Eagle Ford well. Furthermore, we observe that a neural network trained with insufficient data samples from Eagle Ford produces a poor prediction model for typical wells that may be found in Eagle Ford. However, when extrapolated feature components of the Bakken neural network were integrated into the training process of the Eagle Ford neural network, the resulting predictions for typical Eagle Ford wells improved significantly. Moreover, we observe that the ability to transfer learning can improve when specialized training strategies are adopted to enable transfer learning. Using several numerical experiments, the paper presents and assesses various transfer learning strategies to predict the production performance of unconventional wells in a new area with limited information by integrating knowledge from more mature plays.
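The core transfer mechanism, reusing the salient features captured by the Bakken-trained network when building the Eagle Ford model, can be illustrated by a partial weight transfer. The model class and names below are our own illustration, not the paper's implementation:

```python
# Hypothetical encoder-decoder surrogate; copy the trained encoder weights
# from the Bakken model into the Eagle Ford model, then train the rest.
import torch
import torch.nn as nn

class Surrogate(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(32, 1))
    def forward(self, x):
        return self.decoder(self.encoder(x))

bakken_model, eagleford_model = Surrogate(), Surrogate()
# ... train bakken_model on the rich Bakken dataset ...
encoder_state = {k: v for k, v in bakken_model.state_dict().items()
                 if k.startswith("encoder.")}
eagleford_model.load_state_dict(encoder_state, strict=False)  # transfer encoder only
# ... fine-tune eagleford_model on the limited Eagle Ford data ...
```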
- Preprint Article
- 10.7490/f1000research.1117257.1
- Aug 1, 2019
- F1000Research
Antimicrobial resistance (AMR) is an important global health concern. Being able to predict AMR from genetic information would allow for more effective treatment of infections and reduce AMR accumulation in bacterial populations. Deep neural networks are a promising technique for AMR prediction; however, the curse of dimensionality and the lack of large datasets make training deep neural networks and achieving accurate results difficult. We show that transfer learning can be utilized to improve the effectiveness of deep neural networks for AMR prediction in Neisseria gonorrhoeae. In the best case, a neural network first trained to predict azithromycin resistance and then retrained to predict cefixime resistance was approximately 12 times more accurate, and took one-quarter of the time to train, compared with a similar neural network trained only to predict cefixime resistance. We also show that transfer learning can be used to improve the effectiveness of neural networks when very little training data is available. This work reduces the barrier to entry for deep-learning-based phenotype prediction studies because large datasets aren't required for neural network training; instead, an existing network can be downloaded to seed new neural networks and make more effective use of available data. Finally, we show that this technique is not only effective for bacteria but can be used for plant datasets as well.
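The retraining scheme described here amounts to seeding the new task's network with the weights of a network trained on a related task. A minimal sketch under our own assumptions (hypothetical layer sizes, not the study's code):

```python
# Seed the cefixime-resistance model with weights learned for azithromycin.
import copy
import torch.nn as nn

azi_model = nn.Sequential(nn.Linear(1000, 128), nn.ReLU(), nn.Linear(128, 1))
# ... train azi_model to predict azithromycin resistance ...
cfx_model = copy.deepcopy(azi_model)   # start from the learned weights, not random ones
# ... continue training cfx_model on the (smaller) cefixime dataset ...
```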