FASSD-Net Model for Person Semantic Segmentation

Luis Brandon Garcia-Ortiz, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, Aldo Hernandez-Suarez, Hector Perez-Meana, Jose Portillo-Portillo, Gibran Benitez-Garcia, Karina Toscano-Medina

doi:10.3390/electronics10121393

Open Access

Abstract

This paper proposes the use of the FASSD-Net model for semantic segmentation of human silhouettes. These silhouettes can later be used in applications that require specific characteristics of human interaction observed in video sequences, whether for understanding human activities or for human identification. Such applications are classified as high-level semantic understanding tasks. Semantic segmentation is one solution for human silhouette extraction, and convolutional neural networks (CNNs) have a clear advantage over traditional computer vision methods because they can learn feature representations appropriate to the segmentation task. In this work, the FASSD-Net model is used as a novel proposal that promises real-time segmentation of high-resolution images at more than 20 FPS. To evaluate the proposed scheme, we use the Cityscapes database, which consists of varied scenarios that represent human interaction with the environment; segmenting people in these scenarios is difficult, which makes them well suited to evaluating our proposal. To adapt the FASSD-Net model to human silhouette semantic segmentation, the indexes of the 19 classes traditionally used for Cityscapes were modified, leaving only two labels: one for the class of interest, labeled person, and one for the background. The Cityscapes database includes the category “human”, composed of the “rider” and “person” classes; the rider class contains incomplete human silhouettes due to self-occlusions caused by the activity or the transport used. For this reason, we train the model using only the person class rather than the whole human category. The implementation of the FASSD-Net model with only two classes shows promising qualitative and quantitative results for the segmentation of human silhouettes.

Highlights

There are many high-level computer vision tasks that rely on human detection in video sequences, such as intelligent video surveillance
Many high-level tasks for understanding human interaction in video sequences are based on accurate semantic segmentation of human silhouettes, which requires an implementation that runs on high-resolution images in real time. This paper proposes the use of the novel FASSD-Net neural network model [8], adapted for the semantic segmentation of two classes of interest, “person” and “background”, encouraging the use of human silhouettes in future applications such as human identification [9] by gait analysis with a holistic approach, or the translation of Mexican Sign Language into text
In order to fit the FASSD-Net model, the images in the Cityscapes training dataset with their respective labels are pre-processed by changing the indexes of the other 18 classes, leaving only two labels: one for background and another for the class of interest labeled as person
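
The label pre-processing described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes the label maps use the standard 19-class Cityscapes train IDs, in which person is index 11 and rider is index 12, and it assumes that every other index, including the ignore value 255, is mapped to background. The names PERSON_TRAIN_ID and to_person_mask are hypothetical, and plain nested lists stand in for image arrays.

```python
# Assumption: labels use Cityscapes train IDs, where "person" is 11.
PERSON_TRAIN_ID = 11
BACKGROUND, PERSON = 0, 1


def to_person_mask(label_map):
    """Map a 2-D grid of class indexes to background (0) / person (1)."""
    return [
        [PERSON if class_id == PERSON_TRAIN_ID else BACKGROUND
         for class_id in row]
        for row in label_map
    ]


# Toy 3x4 label map: 0 = road, 11 = person, 12 = rider, 255 = ignore.
labels = [
    [0, 0, 11, 11],
    [0, 12, 11, 255],
    [0, 0, 0, 0],
]

# Rider (12) and ignore (255) pixels fall into the background class here.
mask = to_person_mask(labels)
```

In this sketch the rider pixels end up as background, which matches the decision to train on the person class only; whether ignore pixels should also count as background is an assumption that the text does not settle.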

Summary

Introduction

Many high-level computer vision tasks rely on human detection in video sequences, such as intelligent video surveillance systems (IVSS). The applications of IVSS are becoming more specific, e.g., environmental home monitoring related to human activities, such as remote monitoring and automatic fall detection for elderly people at home [3]. Another main application of IVSS is video storage and retrieval, where the surveillance system may be set to record video when human beings are in the scene, saving time, data storage, and resources. Nowadays, another major application of high-level computer vision and IVSS is Human Computer Interface (HCI), in which identity recognition and human identification are based on gait analysis. Because the quantitative results seem similar, the rest of this section presents a qualitative evaluation to determine the performance of the proposed method compared with some existing ones.

Methods

Results

Conclusion