Visual neural decoding aims to reveal how the human brain interprets the visual world by predicting perceived visual information from visual neural activity. Although early studies made progress in decoding a single type of information from visual activity, they could not concurrently reveal the multiple, interwoven levels of linguistic information represented in the brain. Here, we developed a novel Visual Language Decoding Model (VLDM) that simultaneously decodes the main categories, semantic labels, and textual descriptions of visual stimuli from visual activity. We used the large-scale Natural Scenes Dataset (NSD) to enable joint training and evaluation of the decoding model across multiple tasks. For category decoding, we classified 12 categories with an accuracy of nearly 70%, significantly surpassing the chance level. For label decoding, we predicted 80 specific semantic labels with a 16-fold improvement over the chance level. For text decoding, the decoded text surpassed the corresponding baselines by substantial margins on six evaluation metrics. These results highlight the complexity of how the brain processes visual information and the close connection between visual perception and language cognition. This study supports applications in multi-layered brain-computer interfaces, potentially leading to more natural and efficient human-computer interaction.