Sensor-Fusion for Smartphone Location Tracking Using Hybrid Multimodal Deep Neural Networks

Abstract

Many engineered approaches have been proposed over the years for the hard problem of indoor localization using smartphone sensors. However, specialising these solutions for difficult edge cases remains challenging. Here we propose MM-Loc, an end-to-end hybrid multimodal deep neural network localization system that relies on no hand-engineered features and instead learns automatically from data. Modality-specific neural networks extract preliminary features from each sensing modality, and cross-modality neural structures then combine them. We show that our chosen modality-specific neural architectures can each estimate the location independently. For better accuracy, however, a multimodal neural network that fuses the early modality-specific representations is the better option. MM-Loc is tested on cross-modality samples characterised by different sampling rates and data representations (inertial sensors, magnetic and WiFi signals), and it outperforms traditional approaches for location estimation. Unlike conventional indoor positioning systems, which rely on human intuition, MM-Loc trains directly from data.
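
The hybrid structure described in the abstract (one encoder per modality, followed by a cross-modality fusion stage) can be illustrated with a minimal forward-pass sketch. This is not the MM-Loc implementation: the abstract does not specify layer types or sizes, so the plain dense layers, the class name `HybridLocalizer`, the hidden width, the input dimensions, and the two-value (x, y) output are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for an n_in -> n_out dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, layer, activate=True):
    """Apply a dense layer; tanh activation unless activate is False."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [math.tanh(v) for v in outputs] if activate else outputs


class HybridLocalizer:
    """Hypothetical stand-in for a hybrid multimodal localization network."""

    MODALITIES = ("inertial", "magnetic", "wifi")

    def __init__(self, input_sizes, n_hidden=8, seed=0):
        rng = random.Random(seed)
        # One modality-specific encoder per sensing modality.
        self.encoders = {m: init_layer(input_sizes[m], n_hidden, rng)
                         for m in self.MODALITIES}
        # Cross-modality layer that fuses the concatenated encoder outputs.
        self.fusion = init_layer(n_hidden * len(self.MODALITIES), n_hidden, rng)
        # Regression head producing an (x, y) position estimate (assumed output).
        self.head = init_layer(n_hidden, 2, rng)

    def predict(self, sample):
        features = []
        for m in self.MODALITIES:
            features.extend(dense(sample[m], self.encoders[m]))
        fused = dense(features, self.fusion)
        return dense(fused, self.head, activate=False)


model = HybridLocalizer({"inertial": 6, "magnetic": 3, "wifi": 4})
sample = {
    "inertial": [0.1, -0.2, 9.8, 0.0, 0.01, -0.01],  # accelerometer + gyroscope
    "magnetic": [22.0, -5.0, 40.0],                   # magnetometer axes
    "wifi": [-48.0, -70.0, -63.0, -90.0],             # RSSI per access point
}
position = model.predict(sample)
print(position)  # untrained weights, so the numbers carry no meaning
```

Keeping the encoders separate means each modality can be preprocessed at its own sampling rate before fusion; a real system would replace the dense encoders with architectures suited to each signal and train all parts jointly.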

Similar Papers
  • Research Article
  • Citations: 9
  • 10.1016/j.xops.2025.100703
Utilization of Image-Based Deep Learning in Multimodal Glaucoma Detection Neural Network from a Primary Patient Cohort.
  • May 1, 2025
  • Ophthalmology science
  • Elizabeth E Hwang + 4 more

  • Book Chapter
  • Citations: 3
  • 10.1007/978-3-319-43506-0_52
Multimodal Recurrent Neural Network (MRNN) Based Self Balancing System: Applied into Two-Wheeled Robot
  • Jan 1, 2016
  • Azhar Aulia Saputra + 2 more

Biologically inspired control systems are increasingly needed. This paper proposes a new multimodal neural network design, inspired by the human learning system, that takes different actions under different conditions. The multimodal neural network consists of several recurrent neural networks (RNNs), each assigned to a different condition. A selector system chooses which RNN to use depending on the current condition of the robot. We implemented this system on a pendulum mobile robot as the basic object of study. Several RNNs are assigned to different tilt conditions of the robot, and they operate alternately depending on the robot's condition. To demonstrate the effectiveness of the proposed model, we simulated it in the Open Dynamics Engine (ODE) and compared it with an ordinary RNN. The proposed neural model successfully stabilizes the two-wheeled robot. The model is being developed for use in a humanoid balance-learning system as the final object of study.

  • Research Article
  • 10.1200/jco.2022.40.16_suppl.e13572
Drug response prediction in patient-derived xenografts with data augmentation and multimodal deep learning.
  • Jun 1, 2022
  • Journal of Clinical Oncology
  • Alexander Partin + 11 more

e13572 Background: Prediction of drug response is a critical research area in precision oncology and has been previously explored with large drug screening studies of cancer cell lines (CCLs). Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies because the in vivo environment of PDXs helps preserve tumor heterogeneity and usually better mimics drug response of patients with cancer compared to CCLs. Methods: We investigate a multimodal neural network (NN) and data augmentation for drug response prediction in PDXs. The multimodal NN learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs), where the multi-modality refers to tumor features only. The NN uses late integration, where separate subnetworks encode the input feature types before concatenation and prediction layers. Median tumor volume per treatment group is assessed relative to the control group to create a binary variable representing response. The data include twelve single-drug and 36 drug-pair treatments, resulting in 2,556 single-drug and 2,203 drug-pair response values. Pathology and omics data from 487 PDXs from NCI's Patient Derived Models Repository are used as tumor feature model inputs. We explore whether the integration of WSIs with GE improves predictions as compared with models that use GE alone. We use two methods to address the limited number of response values in the dataset: 1) homogenizing drug representations, which allows us to combine single-drug and drug-pair treatments into a single dataset, and 2) augmenting drug-pair samples by switching the order of drug features, which doubles the sample size of all drug-pair samples. These methods enable us to combine single-drug and drug-pair treatments, which results in 6,962 responses, allowing us to train multimodal and unimodal NNs without changing architectures or the dataset. 
Results: Prediction performance of three unimodal NNs which use GE (um1, um2, and um3) is compared to assess the contribution of data augmentation methods. NN um1, which uses the full dataset including the original and the augmented drug-pair treatments as well as single-drug treatments, significantly outperforms NNs (p-values < 0.01) that ignore either the augmented drug-pairs (um2) or the single-drug treatments (um3). In assessing the contribution of multimodal learning, results show that the multimodal NN (mm) outperforms both unimodal NNs that ignore either the GE (um4) or the WSIs (um1). However, the improvement of mm over um1 is not statistically significant (p-value < 0.26). Conclusions: Our results show that data augmentation and integration of histology images and GE can help improve prediction performance of drug response in PDXs. [Table: see text]
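
The order-switching augmentation and the resulting sample count in this abstract can be checked with a short sketch. The record layout (`drug_a`, `drug_b`, `response`) and the function name `swap_augment` are hypothetical, and a missing second drug is used here only as a simple marker for single-drug treatments; the abstract does not say how drug representations were actually homogenized.

```python
def swap_augment(samples):
    """Return the samples plus one order-swapped copy of every drug-pair sample."""
    augmented = list(samples)
    for s in samples:
        if s["drug_b"] is not None:  # only drug pairs have an order to swap
            augmented.append({**s, "drug_a": s["drug_b"], "drug_b": s["drug_a"]})
    return augmented


# Synthetic placeholders matching the counts reported in the abstract.
singles = [{"drug_a": f"d{i}", "drug_b": None, "response": i % 2} for i in range(2556)]
pairs = [{"drug_a": f"p{i}", "drug_b": f"q{i}", "response": i % 2} for i in range(2203)]

combined = swap_augment(singles + pairs)
print(len(combined))  # 2556 + 2 * 2203 = 6962
```

The total matches the 6,962 responses stated in the abstract: the single-drug values are kept once and each drug-pair value appears in both drug orders.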

  • Research Article
  • Citations: 2
  • 10.1155/2022/5156532
Automatic Image Processing Algorithm for Light Environment Optimization Based on Multimodal Neural Network Model
  • Jun 3, 2022
  • Computational Intelligence and Neuroscience
  • Mujun Chen

In this paper, we conduct an in-depth study and analysis of the automatic image processing algorithm based on a multimodal Recurrent Neural Network (m-RNN) for light environment optimization. By analyzing the structure of m-RNN and combining the current research frontiers of image processing and natural language processing, we identify the problem of the ineffectiveness of m-RNN for some image generation descriptions, starting from both the image feature extraction part and text sequence data processing. Unlike traditional image automatic processing algorithms, this algorithm does not need complex rules to be added manually; instead, it evaluates and filters through the training image collection and finally generates image automatic processing models by m-RNN. An image semantic segmentation algorithm is proposed based on multimodal attention and adaptive feature fusion. The main idea of the algorithm is to combine adaptive and feature fusion and then introduce data enhancement for small-scale multimodal light environment datasets by extracting the importance between images through multimodal attention. The model proposed in this paper can span the semantic differences of different modalities and construct feature relationships between different modalities to achieve an inferable, interpretable, and scalable feature representation of multimodal data. The automatic processing of light environment images using multimodal neural networks based on traditional algorithms eliminates manual processing and greatly reduces the time and effort of image processing.

  • Research Article
  • Citations: 1
  • 10.1155/2021/3801675
Classification of Electrocardiogram of Congenital Heart Disease Patients by Neural Network Algorithms
  • Aug 31, 2021
  • Scientific Programming
  • Yongjie Yuan + 3 more

The study intended to explore the effect of different neural network algorithms in the electrocardiogram (ECG) classification of patients with congenital heart disease (CHD). Based on the single convolutional neural network (CNN) ECG algorithm and the recurrent neural network (RNN) ECG algorithm, a multimodal neural network (MNN) ECG algorithm was constructed utilizing the MIT-BIH database as training set and test set. Furthermore, the MNN ECG algorithm was optimized to establish an improved MNN (IMNN) algorithm, which was applied to the diagnosis of CHD patients. The CHD patients admitted between August 2016 and August 2019 were selected for analysis to compare the classification effect and accuracy rate of IMNN, MNN, CNN ECG, and RNN ECG algorithms. It was found that the RNN ECG algorithm had higher classification sensitivity and true positive rate in terms of normal or bundle (NB) branch block beat, supraventricular abnormal (SA) rhythm, abnormal ventricular (AV) beat, and fusion beat (FB) than the CNN ECG algorithm (P < 0.05), and the classification sensitivity and true positive rate of IMNN algorithm in the four aspects were significantly higher than those of MNN algorithm (P < 0.05). The classification accuracy of CNN ECG algorithm and RNN ECG algorithm was above 98%, while that of MNN algorithm and IMNN algorithm was better than that of CNN ECG algorithm and RNN ECG algorithm, and the accuracy rate can reach 98.5% or more. Moreover, the accuracy rate of the IMNN algorithm can reach more than 98%. In conclusion, IMNN not only has a good classification ability in the simulated environment but also performs well in the actual environment, which is worthy of clinical promotion.

  • Research Article
  • Citations: 3
  • 10.21638/spbu10.2024.208
Multimodal ensemble neural network system for skin cancer detection on heterogeneous dermatological data
  • Jan 1, 2024
  • Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes
  • Ulyana A Lyakhova + 1 more

Today, skin cancer is one of the leading causes of death in the world. Diagnosing skin cancer early is critical to increasing potential survival. Therefore, it is relevant to develop high-precision intelligent auxiliary diagnostic systems for detecting skin cancer in the early stages. Ensemble learning is one of the current and promising methods for increasing the accuracy of intelligent classification systems by reducing the dispersion and variability of predictions of individual components of the overall system. The work proposes an ensemble intelligent system for analyzing heterogeneous dermatological data based on multimodal neural networks. The accuracy of the developed ensemble system was 85.92%, which is 1.85 percentage points higher than the average accuracy of individual multimodal architectures for classifying heterogeneous dermatological data. The developed system can be used as a high-precision auxiliary diagnostic tool to help make a medical decision, which will increase the chance of early detection of pigmented oncological pathologies.

  • Research Article
  • Citations: 20
  • 10.3390/cancers14071819
System for the Recognizing of Pigmented Skin Lesions with Fusion and Analysis of Heterogeneous Data Based on a Multimodal Neural Network
  • Apr 3, 2022
  • Cancers
  • Pavel Alekseevich Lyakhov + 2 more

Simple Summary: Skin cancer is one of the most common cancers in humans. This study aims to create a system for recognizing pigmented skin lesions by analyzing heterogeneous data based on a multimodal neural network. Fusing patient statistics and multidimensional visual data allows for finding additional links between dermoscopic images and medical diagnostic results, significantly improving neural network classification accuracy. The use by specialists of the proposed system of neural network recognition of pigmented skin lesions will enhance the efficiency of diagnosis compared to visual diagnostic methods.

Today, skin cancer is one of the most common malignant neoplasms in the human body. Diagnosis of pigmented lesions is challenging even for experienced dermatologists due to the wide range of morphological manifestations. Artificial intelligence technologies are capable of equaling and even surpassing the capabilities of a dermatologist in terms of efficiency. The main problem of implementing intellectual analysis systems is low accuracy. One of the possible ways to increase this indicator is using stages of preliminary processing of visual data and the use of heterogeneous data. The article proposes a multimodal neural network system for identifying pigmented skin lesions with a preliminary identification, and removing hair from dermatoscopic images. The novelty of the proposed system lies in the joint use of the stage of preliminary cleaning of hair structures and a multimodal neural network system for the analysis of heterogeneous data. The accuracy of pigmented skin lesions recognition in 10 diagnostically significant categories in the proposed system was 83.6%. The use of the proposed system by dermatologists as an auxiliary diagnostic method will minimize the impact of the human factor, assist in making medical decisions, and expand the possibilities of early detection of skin cancer.

  • Conference Article
  • Citations: 9
  • 10.1109/ijcnn.2002.1005483
Prediction of protein secondary structure by multi-modal neural networks
  • Aug 7, 2002
  • H Zhu + 2 more

We developed a multi-modal feed-forward neural network to predict the secondary structure of proteins. Several neural networks are used together and the final prediction results are decided by majority rule. We used 6137 residues to train and test the method. The average accuracy of the prediction is 66%, which is about 6.9% higher than single neural network.
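
The majority-rule combination mentioned in this abstract can be sketched as a per-residue vote across networks. The three-state labels (H, E, C), the example predictions, and the function name `majority_vote` are assumptions for illustration; the abstract does not describe the label set or how ties were resolved. In this sketch a tie goes to the label that appears first among the voters.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length label sequences by taking the most common label per position."""
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the top label; ties keep first-seen order.
        voted.append(Counter(labels).most_common(1)[0][0])
    return "".join(voted)


# Hypothetical outputs of three networks for a four-residue segment.
network_outputs = ["HHEC", "HEEC", "CHEC"]
consensus = majority_vote(network_outputs)
print(consensus)  # HHEC
```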

  • Research Article
  • Citations: 10
  • 10.1038/s41698-024-00695-7
A multimodal neural network with gradient blending improves predictions of survival and metastasis in sarcoma
  • Sep 5, 2024
  • npj Precision Oncology
  • Anthony Bozzo + 9 more

The objective of this study is to develop a multimodal neural network (MMNN) model that analyzes clinical variables and MRI images of a soft tissue sarcoma (STS) patient, to predict overall survival and risk of distant metastases. We compare the performance of this MMNN to models based on clinical variables alone, radiomics models, and a unimodal neural network. We include patients aged 18 or older with biopsy-proven STS who underwent primary resection between January 1st, 2005, and December 31st, 2020 with complete outcome data and a pre-treatment MRI with both a T1 post-contrast sequence and a T2 fat-sat sequence available. A total of 9380 MRI slices containing sarcomas from 287 patients are available. Our MMNN accepts the entire 3D sarcoma volume from T1 and T2 MRIs and clinical variables. Gradient blending allows the clinical and image sub-networks to optimally converge without overfitting. Heat maps were generated to visualize the salient image features. Our MMNN outperformed all other models in predicting overall survival and the risk of distant metastases. The C-Index of our MMNN for overall survival is 0.77 and the C-Index for risk of distant metastases is 0.70. The provided heat maps demonstrate areas of sarcomas deemed most salient for predictions. Our multimodal neural network with gradient blending improves predictions of overall survival and risk of distant metastases in patients with soft tissue sarcoma. Future work enabling accurate subtype-specific predictions will likely utilize similar end-to-end multimodal neural network architecture and require prospective curation of high-quality data, the inclusion of genomic data, and the involvement of multiple centers through federated learning.

  • Research Article
  • 10.1080/17480272.2025.2545896
Parametric design of non-uniform Guqin panel and thickness-coupled modes-sound quality correlation prediction via multi-modal neural network
  • Aug 19, 2025
  • Wood Material Science & Engineering
  • Jiahao Wang + 4 more

Achieving acoustically optimized structural design for the Guqin is crucial, combining its acoustic quality evaluation with craftsmanship. This paper proposes a parametric design method for non-uniform thickness distribution in the Guqin soundboard, enabling segmented thickness control and establishing 26 experimental models. Acoustic-structural coupled modal analysis models were constructed, and the acoustic quality of soundboards with varying thicknesses was evaluated. By comparing coupled modal characteristics and key acoustic parameters, the impact of panel structure on acoustic quality was quantitatively analyzed. The study reveals that adjusting the panel thickness distribution (±10 mm) alters the dominant vibration modes and their distribution, thereby influencing Guqin’s acoustic characteristics. To characterize the effect of all thickness distributions on acoustic quality and validate the correlations among thickness distribution, modal frequencies, and acoustic parameters, a multi-modal deep neural network (Mm-DNN) prediction model was developed. It integrates three subnetworks for predicting modal frequencies, acoustic parameters, and modal-acoustic parameters. The root mean square error of the model was < 0.13, and the loss value was < 0.52 × 10−2. Validation confirms Mm-DNN’s effectiveness in predicting Guqin’s acoustic performance from structural parameters. This research supports the industrialized manufacturing of Guqin and provides a digital design approach for predicting acoustic quality in traditional instruments.

  • Research Article
  • 10.1142/s0129156424400366
Design of English Lexical Analysis System for Machine Translation Based on Multimodal Neural Network
  • Jun 15, 2024
  • International Journal of High Speed Electronics and Systems
  • Yadan Deng + 1 more

Information resources have become a very important source of wealth, and due to differences in language and writing, technological information exchange between countries has become difficult. In this article, we propose a data augmentation method based on statistical models. A framework for a multimodal output model is proposed, taking into account the semantic relevance and importance between different languages. This framework is based on text sequence to sequence framework, decoupled, and a network architecture based on dual stream attention mechanism is designed. A multimodal interactive neural network layer was added between the encoder and decoder, achieving the fusion of multilingual domain information. It compensates for the influence of text on traditional translation generation systems in the process of English Chinese translation, and solves the problems of inaccurate translation results and low similarity with the original text in English Chinese machine translation. The system proposed in this article mainly consists of preprocessing module, lexical analysis and segmentation module, part of speech tagging and phrase analysis module, translation rule construction module, decoding module, translation generation module, etc. The experimental results show that using an improved generation system, compared with traditional generation systems, improves the accuracy of translation generation, has certain advantages, and is more practical.

  • Research Article
  • 10.1007/s10015-004-0306-8
A multimodal neural network with single-state predictions for protein secondary structure
  • Dec 1, 2004
  • Artificial Life and Robotics
  • Hanxi Zhu + 3 more

Prediction of protein secondary structure is considered to be an important step toward elucidating the three-dimensional structure and function of proteins. We have developed a multimodal neural network (MNN) to predict protein secondary structure. The MNN is composed of several subclassifiers for single-state predictions using neural networks and a decision neural network (DNN). Each subclassifier employs a number of subnetworks to predict the single-state of the secondary structure individually and produces the final results by majority decision. The DNN uses a three-layer neural network to produce the final overall prediction from the outputs of the single-state predictions. The MNN gives an overall accuracy of 71.1% with corresponding Matthews correlation coefficients of CH = 0.62 and CE = 0.53. The prediction test is based on a database of 126 nonhomologous protein sequences.
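
The per-state Matthews correlation coefficients reported in this abstract (CH, CE) follow the standard one-vs-rest definition, which can be sketched as below. The function names and the example sequences are hypothetical, and returning 0.0 when the denominator is zero is a common convention rather than something stated in the abstract.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom


def state_mcc(true_labels, predicted_labels, state):
    """One-vs-rest MCC for a single secondary-structure state such as 'H'."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if p == state and t == state:
            tp += 1
        elif p == state:
            fp += 1
        elif t == state:
            fn += 1
        else:
            tn += 1
    return matthews_cc(tp, tn, fp, fn)


# Hypothetical eight-residue example with one helix residue missed.
helix_mcc = state_mcc("HHHHEECC", "HHHCEECC", "H")
print(round(helix_mcc, 2))  # 0.77
```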

  • Research Article
  • Citations: 3
  • 10.1155/2022/5888299
Principal Component Research of the Teaching Model Based on Multimodal Neural Network Algorithm
  • Jun 29, 2022
  • Computational Intelligence and Neuroscience
  • Guang Yang + 3 more

With the deepening of contemporary English teaching reform, research on the quality of English education has attracted more and more attention. The key to improving English education is improving teaching quality, and the English teaching model is the key measure for doing so. A model based on a single neural network can only describe the randomness and irregularity of English education quality and cannot describe the overall changing characteristics of the English education model, which leads to larger deviations in the teaching model's evaluation. Based on an in-depth study of the current state and characteristics of the English education model, combined with the characteristics of neural networks, this paper constructs an English teaching model based on a multimodal neural network algorithm. The experimental results show that the convergence speed of the multimodal neural network model is 76% higher than that of the single network model, the sum of squares of average error is 79%, and the average evaluation accuracy is 13.99% and 6.42% higher than that of the convolutional neural network model and the radial basis function neural network model, respectively. These results indicate that the multimodal neural network model accelerates the convergence of the network, improves the prediction accuracy of the model, and can quickly achieve global optimization. They show the effectiveness and accuracy of using a multimodal neural network algorithm to model English teaching quality and provide a feasible solution for teaching quality modeling.

  • Research Article
  • Citations: 2
  • 10.1177/11769351251349891
Development of a Transfer Learning-Based, Multimodal Neural Network for Identifying Malignant Dermatological Lesions From Smartphone Images
  • Jan 1, 2025
  • Cancer Informatics
  • Jiawen Deng + 4 more

Objectives: Early skin cancer detection in primary care settings is crucial for prognosis, yet clinicians often lack relevant training. Machine learning (ML) methods may offer a potential solution for this dilemma. This study aimed to develop a neural network for the binary classification of skin lesions into malignant and benign categories using smartphone images and clinical data via a multimodal and transfer learning-based approach. Methods: We used the PAD-UFES-20 dataset, which included 2298 sets of lesion images. Three neural network models were developed: (1) a clinical data-based network, (2) an image-based network using a pre-trained DenseNet-121 and (3) a multimodal network combining clinical and image data. Models were tuned using Bayesian Optimisation HyperBand across 5-fold cross-validation. Model performance was evaluated using AUC-ROC, average precision, Brier score, calibration curve metrics, Matthews correlation coefficient (MCC), sensitivity and specificity. Model explainability was explored using permutation importance and Grad-CAM. Results: During cross-validation, the multimodal network achieved an AUC-ROC of 0.91 (95% confidence interval [CI] 0.88-0.93) and a Brier score of 0.15 (95% CI 0.11-0.19). During internal validation, it retained an AUC-ROC of 0.91 and a Brier score of 0.12. The multimodal network outperformed the unimodal models on threshold-independent metrics and at MCC-optimised threshold, but it had similar classification performance as the image-only model at high-sensitivity thresholds. Analysis of permutation importance showed that key clinical features influential for the clinical data-based network included bleeding, lesion elevation, patient age and recent lesion growth. Grad-CAM visualisations showed that the image-based network focused on lesioned regions during classification rather than background artefacts. Conclusions: A transfer learning-based, multimodal neural network can accurately identify malignant skin lesions from smartphone images and clinical data. External validation with larger, more diverse datasets is needed to assess the model’s generalisability and support clinical adoption.

  • Research Article
  • Citations: 9
  • 10.1109/access.2023.3335176
Multimodal Neural Network for Recognition of Cardiac Arrhythmias Based on 12-Load Electrocardiogram Signals
  • Jan 1, 2023
  • IEEE Access
  • Mariya R Kiladze + 4 more

Automatic classification of heart rhythm disturbances using an electrocardiogram is a reliable way to detect diseases of the cardiovascular system in a timely manner. The need to automate this process stems from the growing number of electrocardiogram signals. Classification methods based on neural networks provide a high percentage of arrhythmia recognition. However, known classification methods do not take patient characteristics into account. The work proposes a multimodal neural network that takes into account the age and gender of the patient. It includes a long short-term memory (LSTM) network for feature extraction from twelve-channel electrocardiogram signals and a linear neural network for processing patient metadata such as age and gender. Extraction of electrocardiogram signal features occurs in parallel with metadata processing. The last unifying layer of the proposed multimodal neural network integrates the heterogeneous data with the electrocardiogram signal features obtained using the LSTM network. The developed multimodal neural network was verified using the PhysioNet/Computing in Cardiology Challenge 2021 ECG database. The simulation results showed that the proposed multimodal neural network achieves a recognition accuracy of 63%, which is 2 percentage points higher than state-of-the-art methods.
