Human Latent Metrics: Perceptual and Cognitive Response Correlates to Distance in GAN Latent Space for Facial Images

Abstract

Generative adversarial networks (GANs) generate high-dimensional vector spaces (latent spaces) that can interchangeably represent vectors as images. Advancements have extended their ability to computationally generate images indistinguishable from real images such as faces, and more importantly, to manipulate images using their inherent vector values in the latent space. This interchangeability of latent vectors has the potential to calculate not only the distance in the latent space, but also the human perceptual and cognitive distance toward images, that is, how humans perceive and recognize images. However, it is still unclear how the distance in the latent space correlates with human perception and cognition. Our studies investigated the relationship between latent vectors and human perception or cognition through psycho-visual experiments that manipulate the latent vectors of face images. In the perception study, a change perception task was used to examine whether participants could perceive visual changes in face images before and after moving an arbitrary distance in the latent space. In the cognition study, a face recognition task was utilized to examine whether participants could recognize a face as the same, even after moving an arbitrary distance in the latent space. Our experiments show that the distance between face images in the latent space correlates with human perception and cognition for visual changes in face imagery, which can be modeled with a logistic function. By utilizing our methodology, it will be possible to interchangeably convert between the distance in the latent space and the metric of human perception and cognition, potentially leading to image processing that better reflects human perception and cognition.
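The conversion between latent distance and a human response metric described above can be illustrated with a minimal sketch. It assumes a two-parameter logistic curve; the function names and the `slope` and `midpoint` values are hypothetical and are not taken from the paper, whose exact parameterization and fitted values are not given in this abstract.

```python
import math


def response_probability(distance, slope, midpoint):
    """Logistic curve: probability that a participant reports a change
    (or a different identity) after moving `distance` in latent space.
    `slope` and `midpoint` are illustrative parameters, not fitted values."""
    return 1.0 / (1.0 + math.exp(-slope * (distance - midpoint)))


def distance_for_probability(probability, slope, midpoint):
    """Inverse of the logistic curve: the latent distance at which the
    response probability equals `probability` (must lie strictly in (0, 1))."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return midpoint + math.log(probability / (1.0 - probability)) / slope


# Hypothetical parameters, chosen only for illustration.
SLOPE = 2.0
MIDPOINT = 1.5

# At the midpoint the curve predicts a 50% response rate.
p_mid = response_probability(MIDPOINT, SLOPE, MIDPOINT)

# Latent distance at which 75% of responses would report a change.
d_75 = distance_for_probability(0.75, SLOPE, MIDPOINT)
```

Because the logistic function is strictly monotonic, the mapping can be inverted in closed form, which is what makes converting in both directions possible under this assumed model.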

Similar Papers
  • Front Matter
  • Citations: 29
  • 10.1053/j.gastro.2006.06.039
Visceral Hypersensitivity: Fact or Fiction
  • Aug 1, 2006
  • Gastroenterology
  • Qasim Aziz

  • Research Article
  • 10.1080/15502783.2025.2534131
No effects of caffeine on cycling to exhaustion and perceptual responses in non-caffeine-restricted subjects.
  • Jul 24, 2025
  • Journal of the International Society of Sports Nutrition
  • Matthias Weippert + 8 more

Caffeine has been shown to improve endurance performance, probably primarily due to its pharmacological effects in the central nervous system modifying, among others, the perceptual responses during exercise. However, most studies proving the performance-enhancing effects of caffeine utilized an experimental caffeine restriction phase prior to the measurement sessions. Therefore, the effects of 2.5 and 6 mg·kg⁻¹ oral caffeine ingestion on endurance performance, perceptual, affective, and cognitive responses during exercise, as well as time perception, were investigated in participants following their normal "ad libitum" daily diet. Two double-blinded, randomized placebo-controlled cross-over studies were performed to test the effect of 2.5 (N = 35, age: 23.3 ± 3.5 years, habitual caffeine consumption of 106 ± 89 mg·day⁻¹) and 6.0 mg·kg⁻¹ (N = 21, age: 21.2 ± 2.3 years, habitual caffeine consumption of 87 ± 64 mg·day⁻¹) oral caffeine ingestion on time to exhaustion (TTE), perceived fatigue, perceptual-discriminatory (effort perception, physical strain), affective-motivational (affective valence, arousal, dominance, motivation, boredom), and cognitive-evaluative responses (decisional conflict, attentional focus) as well as time perception (time production and estimation) and heart rate during cycling at 65% peak power. Participants were low-to-moderate caffeine consumers (one participant in each study reported no habitual caffeine intake) and asked to follow their regular "ad libitum" diet without any restrictions regarding caffeinated beverages and/or food during the studies. Neither a dose of 2.5 nor of 6.0 mg·kg⁻¹ was found to be superior to placebo with respect to TTE, perceived fatigue, the perceptual-discriminatory, affective-motivational, and cognitive-evaluative responses to exercise, as well as time perception.
Both dosages of caffeine had no effect on TTE, perceived fatigue, perceptual-discriminatory, affective-motivational, and cognitive-evaluative responses to exercise, as well as on time perception and heart rate in low-to-moderate caffeine consumers without a prior experimental caffeine restriction phase. The findings suggest that caffeine's positive effects on endurance performance and perceptual responses to exercise found in previous studies might be partly explained by the reversal of adverse effects induced by a prior caffeine restriction phase.

  • Research Article
  • Citations: 28
  • 10.1109/tsmcb.2011.2169452
Multiview Face Recognition: From TensorFace to V-TensorFace and K-TensorFace
  • Feb 3, 2012
  • IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
  • Chunna Tian + 3 more

Face images under uncontrolled environments suffer from the changes of multiple factors such as camera view, illumination, expression, etc. Tensor analysis provides a way of analyzing the influence of different factors on facial variation. However, the TensorFace model creates a difficulty in representing the nonlinearity of view subspace. In this paper, to break this limitation, we present a view-manifold-based TensorFace (V-TensorFace), in which the latent view manifold preserves the local distances in the multiview face space. Moreover, a kernelized TensorFace (K-TensorFace) for multiview face recognition is proposed to preserve the structure of the latent manifold in the image space. Both methods provide a generative model that involves a continuous view manifold for unseen view representation. Most importantly, we propose a unified framework to generalize TensorFace, V-TensorFace, and K-TensorFace. Finally, an expectation-maximization like algorithm is developed to estimate the identity and view parameters iteratively for a face image of an unknown/unseen view. The experiment on the PIE database shows the effectiveness of the manifold construction method. Extensive comparison experiments on Weizmann and Oriental Face databases for multiview face recognition demonstrate the superiority of the proposed V- and K-TensorFace methods over the view-based principal component analysis and other state-of-the-art approaches for such purpose.

  • Research Article
  • Citations: 2
  • 10.1364/josaa.33.000970
Axial nonimaging characteristics of imaging lenses: discussion.
  • Apr 26, 2016
  • Journal of the Optical Society of America A
  • Ronian Siew

At observation planes away from the image plane, an imaging lens is a nonimaging optic. We examine the variation of axial irradiance with distance in image space and highlight the following little-known observation for discussion: On a per-unit-area basis, the position of the highest concentration in image space is generally not at the focal plane. This characteristic is contrary to common experience, and it offers an additional degree of freedom for the design of detection systems. Additionally, it would also apply to lenses with negative refractive index. The position of peak concentration and its irradiance is dependent upon the location and irradiance of the image. As such, this discussion also includes a close examination of expressions for image irradiance and explains how they are related to irradiance calculations beyond the image plane. This study is restricted to rotationally symmetric refractive imaging systems with incoherent extended Lambertian sources.

  • Research Article
  • 10.32505/ikhtibar.v11i1.9137
Stimulus Hafalan Al-Qur’an Melalui Seni Tilawah Pada Anak-Anak Di TPQ Nurul Iman Desa Tamaran Kecamatan Hinai Kabupaten Langkat
  • Jul 25, 2024
  • Al-Ikhtibar: Jurnal Ilmu Pendidikan
  • Nurhanifah Nurhanifah + 2 more

This study aims to describe the process of providing stimulus and to determine whether memorization improves through the art of recitation. The author uses classroom action research (CAR) with a descriptive methodology, that is, research that provides data by describing certain symptoms. The results showed that providing stimulus through the art of recitation in memorizing Surah Al-Falaq verses 1-5 produced three responses: perceptual (cognitive) responses, emotional (affective) responses, and behavioristic (behavior) responses. Perceptual (cognitive) responses describe students imitating the ustadz (teacher) more quickly when memorizing and retaining what they have memorized; emotional (affective) responses describe students who, when memorizing using the art of recitation, do not talk much with themselves or with their friends; and behavioristic (behavior) responses show students who are disciplined in their attendance and never absent. As for improving the memorization of Surah Al-Falaq through the art of recitation: before the action was taken, students were lazy to memorize, many forgot what they had memorized, and their makhrajul letters and tajwid were still lacking. In the first cycle of action, only 6 people achieved completeness, a rate of 54.54%. The author then took action in a second cycle, in which the students' memorization mastery increased: 9 people achieved mastery, a rate of 81.81%. This proves that there is a change or increase in memorization in children at TPQ Nurul Iman, Tamaran Village, Hinai District, Langkat Regency through the art of recitation. Keywords: Art of Recitation, Improving Memorizing of the Qur'an

  • Conference Article
  • 10.1109/smc53654.2022.9945408
Disentangled Facial Expressions Editing in Trained Latent Space
  • Oct 9, 2022
  • Win Shwe Sin Khine + 2 more

In recent years, Generative Adversarial Networks (GANs) have gained attention in image synthesis, mapping from the latent space onto the image space. The trained latent space carries the visual semantics for generated images. Past studies observed that arithmetic operations and linear interpolation in latent space could change the visible facial attributes, such as beards and glasses, in image space. In this work, the visual concepts in the latent space are observed, making it possible to change the emotion attribute of facial expressions in the image space. We observed interpolation of a sample while disentangling the emotional attributes to edit the emotion-related facial expressions in the synthesized images. For the experiment, Deep Convolutional Generative Adversarial Networks (DCGANs) are utilized for image synthesis, and the Extended Cohn-Kanade (CK+) facial expression dataset is used as the input. Our results showed that manipulating the latent space of well-trained GANs can edit the emotional aspects of the image space. Moreover, editing facial expressions in the latent space helps improve accuracy in the recognition task. Empirical results showed that the facial expression classifier improved its performance in recognizing the sadness class from 20% to 80% on the imbalanced dataset.

  • Research Article
  • Citations: 32
  • 10.3389/fphys.2018.01279
Prolonged Sitting Interrupted by 6-Min of High-Intensity Exercise: Circulatory, Metabolic, Hormonal, Thermal, Cognitive, and Perceptual Responses
  • Oct 16, 2018
  • Frontiers in Physiology
  • Billy Sperlich + 4 more

The aim was to examine certain aspects of circulatory, metabolic, hormonal, thermoregulatory, cognitive, and perceptual responses while sitting following a brief session of high-intensity interval exercise. Twelve students (five men; age, 22 ± 2 years) performed two trials involving either simply sitting for 180 min (SIT) or sitting for this same period with a 6-min session of high-intensity exercise after 60 min (SIT+HIIT). At T0 (after 30 min of resting), T1 (after a 20-min breakfast), T2 (after sitting for 1 h), T3 (immediately after the HIIT), T4, T5, T6, and T7 (30, 60, 90, and 120 min after the HIIT), circulatory, metabolic, hormonal, thermoregulatory, cognitive, and perceptual responses were assessed. The blood lactate concentration (at T3–T5), heart rate (at T3–T6), oxygen uptake (at T3–T7), respiratory exchange ratio, and sensations of heat (T3–T5), sweating (T3, T4) and odor (T3), as well as perception of vigor (T3–T6), were higher and the respiratory exchange ratio (T4–T7) and mean body and skin temperatures (T3) lower in the SIT+HIIT than the SIT trial. Levels of blood glucose and salivary cortisol, cerebral oxygenation, and feelings of anxiety/depression, fatigue or hostility, as well as the variables of cognitive function assessed by the Stroop test did not differ between SIT and SIT+HIIT. In conclusion, interruption of prolonged sitting with a 6-min session of HIIT induced more pronounced circulatory and metabolic responses and improved certain aspects of perception, without affecting selected hormonal, thermoregulatory or cognitive functions.

  • Research Article
  • Citations: 6
  • 10.1109/tnnls.2024.3409573
Unsupervised Domain Adaptation for Low-Dose CT Reconstruction via Bayesian Uncertainty Alignment.
  • May 1, 2025
  • IEEE transactions on neural networks and learning systems
  • Kecheng Chen + 6 more

Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning (DL) is widely used in this problem, but the performance of testing data (also known as target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (also known as source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment (SDA) to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT (NDCT) images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance.

  • Supplementary Content
  • 10.5281/zenodo.5550474
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
  • Sep 27, 2021
  • Zenodo (CERN European Organization for Nuclear Research)
  • Christos Tzelepis + 2 more

This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors. In doing so, it addresses some of the limitations of the state-of-the-art works, namely, a) that they discover directions that are independent of the latent code, i.e., paths that are linear, and b) that their evaluation relies either on visual inspection or on laborious human labeling. More specifically, we propose to learn non-linear warpings on the latent space, each one parametrized by a set of RBF-based latent space warping functions, and where each warping gives rise to a family of non-linear paths via the gradient of the function. Building on the work of Voynov and Babenko, which discovers linear paths, we optimize the trainable parameters of the set of RBFs so that images generated by codes along different paths are easily distinguishable by a discriminator network. This leads to easily distinguishable image transformations, such as pose and facial expressions in facial images. We show that linear paths can be derived as a special case of our method, and show experimentally that non-linear paths in the latent space lead to steeper, more disentangled and interpretable changes in the image space than in state-of-the-art methods, both qualitatively and quantitatively. We make the code and the pretrained models publicly available.

  • Conference Article
  • Citations: 60
  • 10.1109/ivs.2013.6629548
Automated extrinsic laser and camera inter-calibration using triangular targets
  • Jun 1, 2013
  • Stefano Debattisti + 2 more

This paper presents a method for solving the extrinsic calibration between a camera and a multi-layer laser scanner for outdoor multi-sensorized vehicles. The proposed method is designed for intelligent vehicles within the autonomous navigation task, where distances between sensor and targets usually become relevant for safety reasons, so high accuracy across different measures must be kept. The calibration procedure takes advantage of triangular shapes still present in scenarios: it recovers three virtual points as the target pose in the laser and camera reference frames and then computes the extrinsic information of each camera sensor with respect to a laser scanner by minimizing a geometric distance in the image space. To test the algorithm's correctness and accuracy, a set of simulations is used, reporting absolute error results and solution convergence; tests on robustness and reliability (i.e., outlier management) are then based on a wide set of datasets acquired by VIAC prototypes.

  • Research Article
  • Citations: 3
  • 10.1002/asi.20357
A cluster‐based approach for efficient content‐based image retrieval using a similarity‐preserving space transformation method
  • Aug 22, 2006
  • Journal of the American Society for Information Science and Technology
  • Biren Shah + 3 more

The techniques of clustering and space transformation have been successfully used in the past to solve a number of pattern recognition problems. In this article, the authors propose a new approach to content‐based image retrieval (CBIR) that uses (a) a newly proposed similarity‐preserving space transformation method to transform the original low‐level image space into a high‐level vector space that enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of our retrieval system. This combination is unique and the resulting system provides synergistic advantages of using both clustering and space transformation. The proposed space transformation method is shown to preserve the order of the distances in the transformed feature space. This strategy makes this approach to retrieval generic as it can be applied to object types, other than images, and feature spaces more general than metric spaces. The CBIR approach uses the inexpensive “estimated” distance in the transformed space, as opposed to the computationally inefficient “real” distance in the original space, to retrieve the desired results for a given query image. The authors also provide a theoretical analysis of the complexity of their CBIR approach when used for color‐based retrieval, which shows that it is computationally more efficient than other comparable approaches. An extensive set of experiments to test the efficiency and effectiveness of the proposed approach has been performed. The results show that the approach offers superior response time (improvement of 1–2 orders of magnitude compared to retrieval approaches that either use pruning techniques like indexing, clustering, etc., or space transformation, but not both) with sufficiently high retrieval accuracy.

  • Conference Article
  • Citations: 20
  • 10.1145/1290082.1290088
Regularized regression on image manifold for retrieval
  • Sep 24, 2007
  • Deng Cai + 2 more

Recently, there has been considerable interest in geometric-based methods for image retrieval. These methods consider the image space as a smooth manifold and apply manifold learning techniques to find a Euclidean embedding. Thus, the Euclidean distances in the embedding space can be used as approximations to the geodesic distances on the manifold. A main advantage of these methods is that the relevance feedback during retrieval can be naturally incorporated into the system as prior information. In this paper, we consider the retrieval problem as a classification problem on the manifold. Instead of learning a distance measure, we aim to learn a classification function on the image manifold. Considering that efficiency is a key issue in image retrieval, especially at Web scale, we propose a novel approach for image retrieval on the manifold. This approach is based on a regularized linear regression framework. The local manifold structure and user-provided relevance feedback are incorporated into the image retrieval system through a Locality Preserving Regularizer. Extensive experiments are carried out on a large image database, which demonstrate the efficiency and effectiveness of the proposed approach.

  • Conference Article
  • Citations: 179
  • 10.1109/cvpr.1999.784727
Object recognition with color cooccurrence histograms
  • Jun 1, 1999
  • Peng Chang + 1 more

We use the color cooccurrence histogram (CH) for recognizing objects in images. The color CH keeps track of the number of pairs of certain colored pixels that occur at certain separation distances in image space. The color CH adds geometric information to the normal color histogram, which abstracts away all geometry. We compute model CHs based on images of known objects taken from different points of view. These model CHs are then matched to subregions in test images to find the object. By adjusting the number of colors and the number of distances used in the CH, we can adjust the tolerance of the algorithm to changes in lighting, viewpoint, and the flexibility of the object. We develop a mathematical model of the algorithm's false alarm probability and use this as a principled way of picking most of the algorithm's adjustable parameters. We demonstrate our algorithm on different objects, showing that it recognizes objects in spite of confusing background clutter, partial occlusions, and flexing of the object.

  • Book Chapter
  • 10.1002/9781394171538.ch8
Optics and Image Formation
  • Mar 21, 2023

There are two types of mathematics to describe lens systems: geometry and Fourier analysis. Geometric optics and Fourier optics illuminate final image formation, whereby the final image of the object is the object convolved by the point spread function. The light-collecting capacity of the lens, or its aperture, determines the resolving power of a lens, with narrow apertures giving less resolution. Clear focus of a 3D slice of object space can occur over a range of distances in object space that depends on the aperture, or numerical aperture (NA), of the lens. The depth of the 3D slice of object space that remains in focus in image space is the depth of field, or in microscopy, the optical section thickness. In lenses in which aberrations limit the usable NA, such as those used in photography and electron microscopy, the optical geometry, which includes magnification, determines the axial resolution.

  • Research Article
  • Citations: 31
  • 10.1109/lsp.2017.2777881
Robust Image Fingerprinting via Distortion-Resistant Sparse Coding
  • Jan 1, 2018
  • IEEE Signal Processing Letters
  • Yuenan Li + 1 more

Content fingerprinting has recently emerged as an effective nonintrusive solution for copyright protection. A fingerprinting algorithm maps the perceptual contents of a media file to an invariant digest, so that unauthorized copies can be identified via fingerprint comparison. This letter presents a distortion-resistant sparse coding strategy for image fingerprinting that simulates the hierarchical information processing flow of the visual system. Sparse coding, which seeks a small set of atoms that can best represent the input signal, helps the fingerprinting algorithm detect the intrinsic visual features of an image. However, the high freedom of atom selection makes sparse coding sensitive to distortion. In this letter, several measures are applied to sparse coding and dictionary learning to jointly ensure the invariance of the fingerprint, such as imposing the neighborhood-priority principle on atom selection, regulating the layout of atoms, and forcing sparse codes to preserve the distance in the image space. Content identification performance of the proposed work was tested on a database of 219 000 images. The error rate of the proposed algorithm is at least ten times lower than that of state-of-the-art methods, and satisfactory performance was observed even under an extremely low bit budget.
