This issue contains six papers.

In the first paper, Christos Kyrlitsias and Despina Michael-Grigoriou, from the Cyprus University of Technology in Limassol, Cyprus, investigate conformity to virtual humans in an immersive virtual environment using two experiments. In the first experiment, they study whether agents exert social influence on participants by conducting the Asch conformity experiment. In the second experiment, they use a similar method to study how the factors "agency" and "behavioral realism" affect social conformity. The results of the experiments show that virtual humans can induce conformity in immersive virtual environments.

In the second paper, Yu Zhu, Shiying Li, Xi Luo, Kang Zhu, Qiang Fu, Huixing Gong, Xilin Chen, and Jingyi Yu, from the Shanghai Institute of Microsystem and Information Technology, ShanghaiTech University, in Shanghai, and the Chinese Academy of Sciences, in Beijing, China, propose SAVE (shared augmented virtual environment), a mixed-reality system that overlays the virtual world with real objects captured by a Kinect depth camera. They connect the virtual and real worlds with a bridge controller mounted on the Kinect, so the whole system needs to be calibrated only once before use. They then refine the depth map and exploit a GPU-based natural image matting method to extract the real objects from cluttered scenes. In the resulting mixed-reality world, they render real and virtual objects in real time and handle the depth from both worlds properly.

In the third paper, Umut Agil and Ugur Gudukbay, from Bilkent University, Ankara, Turkey, propose a saliency model that enables virtual agents to produce plausible gaze behavior. The model measures the effects of distinct saliency features, which are implemented by examining state-of-the-art perception studies. To predict an agent's point of interest, they compute, at each frame, a saliency score for every other agent and environment object in the agent's field of view using a weighted sum of these features. They then determine the most salient entity for each agent in the scene; thus, agents gain a visual understanding of their environment. In addition, their model introduces new aspects to crowd perception, such as perceiving characters as groups of people, applying social norms to crowd gaze behavior, the effects of agent personality on gaze, the gaze-copying phenomenon, and the effects of agent velocity on attention.

In the fourth paper, Yao Lu, Shang Zhao, Naji Younes, and James K. Hahn, from George Washington University, Washington, DC, United States, present a cost-effective and easy-to-use 3D body reconstruction system based on consumer-grade depth sensors, which reconstructs body shapes with a degree of accuracy and reliability appropriate for medical applications. Their surface registration framework integrates the articulated motion assumption, a global loop closure constraint, and a general as-rigid-as-possible deformation model (sketched below). To enhance reconstruction quality, they propose a novel approach to accurately infer skeletal joints from anatomic data using multimodality registration. They further propose a supervised predictive model to infer the skeletal joints of arbitrary subjects without an anatomic data reference. A rigorous validation test was conducted on real subjects to evaluate reconstruction accuracy and repeatability.
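Their general as-rigid-as-possible deformation model presumably builds on the standard ARAP energy; as a minimal sketch (the exact weighting and discretization used in their registration framework may differ), the energy minimized over the deformed vertex positions $\mathbf{p}'_i$ and per-vertex rotations $\mathbf{R}_i$ is

$$
E = \sum_{i} \sum_{j \in \mathcal{N}(i)} w_{ij} \left\| (\mathbf{p}'_i - \mathbf{p}'_j) - \mathbf{R}_i (\mathbf{p}_i - \mathbf{p}_j) \right\|^2 ,
$$

where $\mathbf{p}_i$ denotes the vertex positions of the undeformed surface, $\mathcal{N}(i)$ is the one-ring neighborhood of vertex $i$, and $w_{ij}$ are per-edge (e.g., cotangent) weights.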
In the fifth paper, Wanrong Huang, Xiaodong Yi, and Xue-Jun Yang, from the National University of Defense Technology in Changsha and the National Innovation Institute of Defense Technology (NIIDT) in Beijing, China, design a three-layer framework for multirobot coordination. Furthermore, they propose a novel distributed algorithm that achieves the navigation objective while satisfying connectivity maintenance and collision avoidance constraints. The algorithm is a hybrid of an RRT-based planner and an extended DNF-based controller. The coordination framework and the distributed algorithm are shown to be effective through a series of illustrative simulations, and they outperform the current state-of-the-art method in terms of efficiency and applicability.

In the last paper, Kai Wang and Shiguang Liu, from Tianjin University, China, present an automatic approach for the semantic modeling of indoor scenes from a single photograph, without relying on depth sensors. Rather than using handcrafted features, they guide indoor scene modeling with feature maps extracted by fully convolutional networks. Three parallel fully convolutional networks are adopted to generate object instance masks, a depth map, and an edge map of the room layout. Based on these high-level features, support relationships between indoor objects can be efficiently inferred in a data-driven manner. Constrained by the support context, a global-to-local model matching strategy is followed to retrieve the whole indoor scene. They demonstrate that the proposed method can efficiently retrieve indoor objects, even when the objects are badly occluded, and that it enables efficient semantic-based scene editing.

Note that the last two papers are revised versions of papers already published in the CASA 2018 Special Issue 29.3-4.