With the rise of augmented reality (AR) and context-aware ubiquitous learning (CAUL), pedagogical designers in computer-assisted language learning are increasingly developing authentic English for Specific Purposes (ESP) learning environments. However, there has been little research on developing evidence-based design principles for English for tourism purposes (ETP) through AR-based CAUL. The purpose of this study was to provide a preliminary analysis of such design principles and to develop a learning model based on a rigorous four-phase Design-Based Research procedure. The researchers first developed a location-based AR application for ETP and formulated the first set of design principles. This application was then tested and refined in an iterative process, with data collected from three sources: an online questionnaire, onsite observations, and semi-structured expert interviews. All the data were triangulated and analyzed, and a team of experts evaluated the design principles. Results showed that the technical affordances of mobile AR devices should form the basis for the design. The application needs to seamlessly provide multimodal scenery-based AR learning supports to fulfill individual needs in blended learning contexts. Furthermore, a user-friendly interface with personalized functions and a portfolio recording learning progress need to be included. Multimodal authentic ETP dialogs and terminology are also necessary to build students’ ETP speaking competence. Moreover, the learning material should be compatible with the main theories that drive language learning in CAUL. In light of the findings, a learning model was established to guide AR-based ETP learning and to serve as a template for future studies in other ESP fields.