Abstract
Background and objective: One of the main problems with biomedical signals is the limited amount of patient-specific data and the significant time required to record enough samples for diagnostic and treatment purposes. In this study, we present a framework to simultaneously generate and classify biomedical time series based on a modified Adversarial Autoencoder (AAE) algorithm and one-dimensional convolutions. Our work is based on breathing time series, with the specific motivation of capturing breathing motion during radiotherapy lung cancer treatments.
Methods: First, we explore the potential of the Variational Autoencoder (VAE) and AAE algorithms to model breathing signals from individual patients. We then extend the AAE algorithm to allow joint semi-supervised classification and generation of different types of signals within a single framework. To simplify the modeling task, we introduce a pre-processing and post-processing compression algorithm that transforms the multi-dimensional time series into vectors containing time and position values, which are transformed back into time series through an additional neural network.
Results: The resulting models are able to generate realistic and varied samples of breathing. By incorporating 4% and 12% of the labeled samples during training, our model outperforms other purely discriminative networks in classifying breathing baseline shift irregularities from a dataset completely different from the training set, achieving an average macro F1-score of 94.91% and 96.54%, respectively.
Conclusion: To our knowledge, the presented framework is the first approach that unifies generation and classification within a single model for this type of biomedical data, enabling both computer-aided diagnosis and augmentation of labeled samples within a single framework.
Highlights
Biomedical data is the driving force behind most modern advances in medicine
Since the three spatial components are correlated, the 3D signals are further compressed into a 1D signal by using Principal Component Analysis (PCA) and projecting them onto the main axis of movement, which is the eigenvector with the highest eigenvalue
We investigate how the number of labeled examples used during the supervised phase of training affects the classification accuracy of the Semi-supervised AAE (SAAE) by comparing its macro F1-score (mF1) to that of pure classifier networks
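The two operations named in the highlights, projecting a 3D signal onto its main axis of movement and scoring a classifier by macro F1, can be sketched as follows. This is a minimal illustration in NumPy and not the authors' implementation: the function names `project_onto_main_axis` and `macro_f1`, the array layout (one row per time sample), and the synthetic trajectory are all assumptions made for the example.

```python
import numpy as np


def project_onto_main_axis(signal_3d):
    """Project an (n_samples, 3) trajectory onto its first principal axis.

    Hypothetical helper: returns the 1D projected signal and the unit
    eigenvector of the covariance matrix with the largest eigenvalue.
    """
    signal_3d = np.asarray(signal_3d, dtype=float)
    centered = signal_3d - signal_3d.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    main_axis = eigvecs[:, -1]                # eigenvector of largest eigenvalue
    return centered @ main_axis, main_axis


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic breathing-like motion along one direction (illustrative only)
    t = np.linspace(0.0, 20.0, 200)
    direction = np.array([1.0, 2.0, 2.0]) / 3.0
    trajectory = np.sin(t)[:, None] * direction
    projected, axis = project_onto_main_axis(trajectory)
    print(projected.shape, axis)
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

The sign of the returned axis is arbitrary, as with any eigenvector, so the projected signal may come out mirrored. Macro averaging weights every class equally, which matters when irregular breathing patterns are rare compared with regular ones.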
Summary
Biomedical data is the driving force behind most modern advances in medicine. The use of biomedical records is associated with a series of problems, such as the lack of reliable models capable of simulating data with clinical precision, the absence of personalized models for diagnosis, and the lack of labeled samples, since labels either contain personal features that compromise privacy or are not recorded [1]. Given a dataset $D = \{x^{(i)}\}_{i=1}^{N_D}$ with $N_D$ independent and identically distributed (i.i.d.) data points, the goal is to model a probability distribution $p_\theta(x)$ that approximates the unknown true probability distribution generating the data, using a probabilistic graphical model with parameters $\theta$. Let this probabilistic model be a latent variable model, which conditions the observed variable $x$ on the unobserved random variable $z \in \mathbb{R}^N$ over the latent space $\mathcal{Z}$. Point estimates of the parameters $\theta$ of the latent variable model can be obtained via maximum likelihood estimation, i.e., by maximizing the (log-)marginal likelihood of the observed data: $\theta^* = \arg\max_\theta \log p_\theta(x)$.
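For reference, the maximum likelihood objective above can be written out explicitly. The block below is a sketch under two assumptions not stated in the summary: that the latent variable has a prior density $p(z)$, and that the i.i.d. property is used to turn the dataset likelihood into a sum of per-sample terms.

```latex
\begin{align}
  p_\theta(x) &= \int_{\mathcal{Z}} p_\theta(x \mid z)\, p(z)\, \mathrm{d}z
  && \text{(marginalizing over the latent variable } z\text{)} \\
  \theta^* &= \arg\max_\theta \sum_{i=1}^{N_D} \log p_\theta\bigl(x^{(i)}\bigr)
  && \text{(i.i.d. data points in } D\text{)}
\end{align}
```

The integral over $z$ is generally intractable for neural-network likelihoods, which is the usual motivation for approximate approaches such as the VAE and AAE named in the abstract.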