Seeing the world from its words: All-embracing Transformers for fingerprint-based indoor localization

Son Minh Nguyen,Duc Viet Le,Paul Havinga

doi:10.1016/j.pmcj.2024.101912

Abstract

In this paper, we present all-embracing Transformers (AaTs) that are capable of deftly manipulating attention mechanism for Received Signal Strength (RSS) fingerprints in order to invigorate localizing performance. Since most machine learning models applied to the RSS modality do not possess any attention mechanism, they can merely capture superficial representations. Moreover, compared to textual and visual modalities, the RSS modality is inherently notorious for its sensitivity to environmental dynamics. Such adversities inhibit their access to subtle but distinct representations that characterize the corresponding location, ultimately resulting in significant degradation in the testing phase. In contrast, a major appeal of AaTs is the ability to focus exclusively on relevant anchors in RSS sequences, allowing full rein to the exploitation of subtle and distinct representations for specific locations. This also facilitates disregarding redundant clues formed by noisy ambient conditions, thus enhancing accuracy in localization. Apart from that, explicitly resolving the representation collapse (i.e., none-informative or homogeneous features, and gradient vanishing) can further invigorate the self-attention process in transformer blocks, by which subtle but distinct representations to specific locations are radically captured with ease. For that purpose, we first enhance our proposed model with two sub-constraints, namely covariance and variance losses at the Anchor2Vec. The proposed constraints are automatically mediated with the primary task towards a novel multi-task learning manner. In an advanced manner, we present further the ultimate in design with a few simple tweaks carefully crafted for transformer encoder blocks. This effort aims to promote representation augmentation via stabilizing the inflow of gradients to these blocks. Thus, the problems of representation collapse in regular Transformers can be tackled. To evaluate our AaTs, we compare the models with the state-of-the-art (SoTA) methods on three benchmark indoor localization datasets. The experimental results confirm our hypothesis and show that our proposed models could deliver much higher and more stable accuracy.

Full Text