Many methods have been proposed to improve the scene classification capacity of service robots. However, most of them are developed from a purely technical standpoint, without reference to any cognitive principle of the brain; moreover, from design to evaluation, the particularities of the robot task, such as cross-environment generalization and explicit semantic preservation and interpretation, are still not fully considered. As a result, the scene-cognitive behavior of robots remains far from that of humans, and their environmental adaptability is still poor. It is difficult for them to learn place concepts from discrete fragments and then continuously perceive those concepts with a limited view in unvisited spaces. Inspired by recent findings in neuroscience, an attention-based global and object attribute fusion mechanism (AGOFM for short), composed of three parts, is proposed to overcome these deficiencies. In the global attribute part, a global feature extractor and a sequence context extractor are used to generate a holistic feature; the extracted context integrates limited views to form an overall impression of a scene that guides attention. In the object attribute part, a novel object vector is proposed that simultaneously encodes the quantity, category, and confidence of detected objects, all of which are tied to the vector index and carry high-level semantics. In the attention generation part, the sorted top-X characteristics derived from the two preceding parts are fed into a fully connected (FC) network with batch normalization to generate effective attention. The attention weights are then applied to the batch-normalized global and object vectors, respectively, and the two heterogeneous representations are directly fused by another FC network to achieve scene classification. Policies for multi-learner fusion and frame rejection are also provided. Finally, a novel evaluation paradigm is proposed in which the model is trained on a discrete prior dataset and then tested on a traditional dataset and two robot-view datasets, simulating the cross-environment situation. Under these severe conditions, the results demonstrate that the proposed method outperforms several popular methods.
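To make the attention generation and fusion stages more concrete, the following is a minimal sketch in Python (PyTorch) of how they might be wired together. The module names, dimensions, top-X selection, and the sigmoid attention are illustrative assumptions for exposition, not the authors' reference implementation.

```python
# Hypothetical sketch of the AGOFM attention generation and fusion stages.
# All dimensions and the sigmoid gating are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class AGOFMFusion(nn.Module):
    def __init__(self, global_dim=512, object_dim=80, top_x=16, num_classes=10):
        super().__init__()
        self.top_x = top_x
        # Batch normalization applied to the raw global and object vectors
        self.bn_global = nn.BatchNorm1d(global_dim)
        self.bn_object = nn.BatchNorm1d(object_dim)
        # Attention generation: sorted top-X values from both parts -> FC with BN -> weights
        self.attn_fc = nn.Sequential(
            nn.Linear(2 * top_x, global_dim + object_dim),
            nn.BatchNorm1d(global_dim + object_dim),
            nn.Sigmoid(),
        )
        # Final fusion FC producing scene class scores
        self.fusion_fc = nn.Linear(global_dim + object_dim, num_classes)

    def forward(self, global_vec, object_vec):
        # Sorted top-X characteristics from each attribute part
        top_g, _ = torch.topk(global_vec, self.top_x, dim=1)
        top_o, _ = torch.topk(object_vec, self.top_x, dim=1)
        attn = self.attn_fc(torch.cat([top_g, top_o], dim=1))
        # Apply attention weights to the batch-normalized global and object vectors
        weighted = attn * torch.cat([self.bn_global(global_vec),
                                     self.bn_object(object_vec)], dim=1)
        return self.fusion_fc(weighted)  # scene class logits

# Usage (illustrative): global_vec from the holistic-feature branch, object_vec from
# the detector-derived object vector, both as batched tensors.
# model = AGOFMFusion(); logits = model(global_vec, object_vec)
```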