SUMMARY In this paper, we discuss QoE (Quality of Experience) requirements for MVV (Multi-View Video) and audio transmission over IP networks and study the effect of the playout buffering time, contents, and viewpoint change interfaces on the QoE and the user’s behavior. Unlike previous works, which mainly discuss MVV transmission from the aspect of video codecs, we study MVV and audio transmission under various IP traffic and delay conditions by experiment. We compare two schemes: one in which the user watches from a single viewpoint and one in which he/she can choose one viewpoint from many. As a result, we show that users prefer the scheme in which they can choose the viewpoint. We have found that, with a proper buffering time, users perceive viewpoint changes as faster, which improves their satisfaction compared with watching from a single viewpoint. We have also noticed that the user pays more attention to the degradation of the video when watching from a single viewpoint. We have observed that users tend to change the viewpoint more frequently under light traffic and low delay.