“Life Must Move Forward”
This study explores the experiences of students using a multimodal dialogue system to engage with sensitive historical content, specifically focusing on the challenges they face and how these challenges are addressed. Data were collected from 35 participants across six public and private schools in Semarang utilising qualitative methods, including interviews and focus group discussions. The findings reveal that students encounter various challenges, such as emotional discomfort and difficulty in articulating their thoughts on traumatic historical events. However, the multimodal dialogue system facilitates a supportive environment that encourages open discussion and reflection, allowing students to navigate these challenges effectively. The study highlights the importance of creating safe spaces for dialogue, where students can explore their identities and values in relation to traumatic history. Additionally, the research underscores the potential of technology to enhance student engagement and foster deeper understanding of sensitive topics. The implications for educators and curriculum developers are significant, suggesting that integrating multimodal dialogue systems can enrich history education and promote critical thinking. Limitations of the study include a small sample size and a focus on a specific geographic area, indicating the need for further research to generalise findings across diverse educational contexts.
- 10.25159/1947-9417/13712
- Jun 12, 2023
- Education as Change
11
- 10.1177/0967828x16649310
- Jun 1, 2016
- South East Asia Research
- 10.24198/jkk.v12i1.49806
- Jun 30, 2024
- Jurnal Kajian Komunikasi
4
- 10.1177/14782103231177615
- May 19, 2023
- Policy Futures in Education
4
- 10.5038/1911-9933.15.1.1776
- May 1, 2021
- Genocide Studies and Prevention
3
- 10.1007/978-3-319-23766-4_57
- Jan 1, 2015
4
- 10.14710/jscl.v4i1.21576
- Mar 18, 2019
- Jurnal Sejarah Citra Lekha
1
- 10.4324/9781003205838
- Aug 3, 2023
385
- 10.1016/j.tsc.2013.12.004
- Jan 18, 2014
- Thinking Skills and Creativity
7
- 10.4324/9781315392424
- Oct 6, 2017
- Research Article
13
- 10.1145/3606368
- Nov 7, 2023
- ACM Transactions on Information Systems
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: (1) they overlook the benefit of generative pretraining, and (2) they ignore the textual context-related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component aims to select the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component targets seamlessly integrating the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced for explicitly utilizing the knowledge to advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
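The knowledge-decoder attention sub-layer described in this abstract is, at its core, scaled dot-product attention from decoder states to embeddings of the selected knowledge. A minimal NumPy sketch under that reading (shapes, names, and the toy data are illustrative, not DKMD's actual implementation):

```python
import numpy as np

def knowledge_attention(decoder_states, knowledge_embs):
    """Scaled dot-product attention from decoder states to knowledge embeddings.

    decoder_states: (T, d) array of decoder hidden states.
    knowledge_embs: (K, d) array of selected knowledge-entry embeddings.
    Returns a (T, d) knowledge-aware context vector for each decoder position.
    """
    d = decoder_states.shape[-1]
    scores = decoder_states @ knowledge_embs.T / np.sqrt(d)    # (T, K)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # softmax over K entries
    return weights @ knowledge_embs                            # (T, d)

# Toy usage with random features
dec = np.random.rand(3, 8)   # 3 decoder positions, dim 8
kn = np.random.rand(5, 8)    # 5 knowledge entries, dim 8
ctx = knowledge_attention(dec, kn)
```

Each output row is a convex combination of the knowledge embeddings, which the decoder would then mix into its hidden state before emitting the next token.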
- Conference Article
4
- 10.21437/icslp.2002-115
- Sep 16, 2002
We present a high-level formalism for specifying verbal and non-verbal output from a multimodal dialogue system. The output specification is XML-based and provides information about the communicative functions of the output without detailing the realisation of these functions. The specification can be used to control an animated character that uses speech and gestures. We give examples from an implementation in a multimodal spoken dialogue system and describe how facial gestures are implemented in a 3D-animated talking agent within this system.
- Book Chapter
8
- 10.1007/1-4020-3075-4_6
- Jan 1, 2005
We present a formalism for specifying verbal and non-verbal output from a multimodal dialogue system. The output specification is XML-based and provides information about communicative functions of the output, without detailing the realisation of these functions. The aim is to let dialogue systems generate the same output for a wide variety of output devices and modalities. The formalism was developed and implemented in the multimodal spoken dialogue system AdApt. We also describe how facial gestures in the 3D-animated talking head used within this system are controlled through the formalism.
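As a purely illustrative sketch of the idea of marking communicative functions without prescribing their realisation, such an XML output specification might look as follows (the element and attribute names here are hypothetical, not the actual tags of the AdApt formalism):

```xml
<!-- Hypothetical example; not the AdApt formalism's actual schema -->
<output>
  <utterance function="confirm">
    The meeting is booked for <emphasis>Friday</emphasis>.
  </utterance>
  <gesture function="feedback" type="nod"/>
</output>
```

A device-specific renderer could map the feedback gesture to a head nod on the 3D talking head, or silently drop it on a voice-only output channel, so the same specification serves multiple devices and modalities.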
- Conference Article
6
- 10.1109/mmsp.2007.4412824
- Jan 1, 2007
This paper presents an approach to an extensible multimodal interaction dialogue system, R-Flow, based on a recursive application of the Model-View-Controller (MVC) design pattern to derive system components and interfaces. This approach leads to a clear separation of three self-contained functional layers in a multimodal dialogue system: modality-independent dialog control, synchronization of logical modalities, and physical presentation. These layers are codified and woven together through standards-based XML languages. In particular, the system uses the State Chart XML (SCXML) standard for dialog control, the SMIL- and EMMA-based XM-Flow for modality synchronization and interpretation, and a generic XML-based binding mechanism to map logical modalities to physical presentations. A prototype system has been implemented for multimodal (e.g. speech, text, and mouse) manipulation of Google Maps. Our experimental results indicate that such a layered, component-based XML MMI system is feasible, and its performance is studied and measured.
- Conference Article
19
- 10.21437/eurospeech.2001-528
- Sep 3, 2001
This paper describes application oriented research on architectural building blocks and constraints for adaptive multimodal dialog systems that use VoiceXML as a component technology. The VoiceXML standard is well supported and promises to make the development of speech-enabled applications so easy that everyone with general web programming skills can accomplish it. The paper proposes a software architecture for multimodal interfaces that emphasizes modularity, in order to strengthen this potential by clearly separating different types of development tasks in a multimodal dialog system. The paper argues that adaptivity is a central design concern for multimodal dialog systems, but that adaptivity is not facilitated by the current VoiceXML standard. This and other examined limitations of VoiceXML for building multimodal dialog systems should be repaired in upcoming work on a successor standard that will explicitly target multimodal applications.
- Research Article
- 10.14710/jppmr.v6i2.15819
- Mar 3, 2017
- Journal of Public Policy and Management Review
Student Role in Larva Monitoring (Siswa Pemantau Jentik, or "sismantik") is a Dengue Hemorrhagic Fever (DHF) control effort that empowers elementary school students to observe the presence of mosquito larvae in the school environment. Implementation of the sismantik programme in private elementary schools in Semarang is one measure to reduce the DHF death rate among school-age children in the city. DHF control through larva monitoring is regulated in Semarang City Regional Regulation No. 5 of 2010. In practice, however, the programme has not been optimally implemented across all private elementary schools in Semarang, as some schools do not carry it out regularly. The purpose of this study is to describe the implementation of the Student Role in Larva Monitoring programme for dengue control in private schools in Semarang and to identify the factors inhibiting it. This is a descriptive study with a qualitative approach. The results show that implementation is hindered by weak human resources, poor communication, and the characteristics of the executing office (bureaucratic structure). The recommendations are to improve the quantity and quality of human resources, improve communication between policy implementers, and establish a clear organisational structure that can be easily understood.
- Research Article
2
- 10.3390/electronics11203409
- Oct 20, 2022
- Electronics
The recent advancements in multimodal dialogue systems have been gaining importance in several domains such as retail, travel, and fashion, among others. Several existing works have improved the understanding and generation of multimodal dialogues. However, there still exists considerable room to improve the quality of output textual responses due to insufficient information infusion between the visual and textual semantics. Moreover, existing dialogue systems often generate defective knowledge-aware responses for tasks such as providing product attributes and celebrity endorsements. To address the aforementioned issues, we present a Transformer-based Multimodal Infusion Dialogue (TMID) system that extracts the visual and textual information from dialogues via a transformer-based multimodal context encoder and employs a cross-attention mechanism to achieve information infusion between images and texts for each utterance. Furthermore, TMID uses adaptive decoders to generate appropriate multimodal responses based on the user intentions it has determined using a state classifier, and enriches the output responses by incorporating domain knowledge into the decoders. The results of extensive experiments on a multimodal dialogue dataset demonstrate that TMID achieves state-of-the-art performance, improving the BLEU-4 score by 13.03, NIST by 2.77, and image selection Recall@1 by 1.84%.
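The per-utterance cross-attention infusion this abstract describes can be sketched as each modality attending over the other; the symmetric residual fusion shown here is an illustrative assumption, not TMID's exact architecture:

```python
import numpy as np

def cross_attend(queries, keys_values):
    """Scaled dot-product cross-attention: each query attends over keys_values."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                  # softmax over keys
    return w @ keys_values

def infuse(text_feats, image_feats):
    """Fuse the two modalities for one utterance: text attends to image
    features and vice versa, each stream keeping a residual connection."""
    return (text_feats + cross_attend(text_feats, image_feats),
            image_feats + cross_attend(image_feats, text_feats))

# Toy usage: 4 text tokens and 6 image regions, feature dim 16
text = np.random.rand(4, 16)
image = np.random.rand(6, 16)
fused_text, fused_image = infuse(text, image)
```

The fused streams would then feed a state classifier and the adaptive decoders mentioned in the abstract.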
- Research Article
2
- 10.24200/jsshr.vol8iss1pp%p
- Oct 4, 2020
- The Journal of social sciences and humanities
Objective: This study compared descriptive assessment in terms of the critical and creative thinking of sixth-grade students in the public and private schools of district four in Karaj. It is a descriptive study with a causal-comparative design. Methodology: The population consisted of all sixth-grade students (girls and boys) in the public and private primary schools: 8,529 students in total, of whom 7,788 attended public schools and 741 attended private schools. A sample size of 368 was determined using Morgan's table, comprising 330 public school students and 38 private school students, selected via stratified random sampling. The instruments were the Critical Thinking Dispositions Questionnaire (sub-scale reliabilities: creativity = 0.75, commitment = 0.86), the Torrance Test of Creative Thinking (reliability between 0.80 and 0.90), and descriptive transcripts of records. SPSS software was used for both descriptive and inferential statistics (t-test). Results: There was a difference in descriptive assessment of creative thinking between sixth-grade students in the public and private schools of district four in Karaj, but no difference was observed for critical thinking. Across creative thinking and its dimensions (invention, extension, fluidity, flexibility, and creativity), the descriptive assessment of the majority of students in both public and private schools was very good, though it was better in the public schools than the private ones. Conclusion: Across the variables of creativity, commitment, and critical thinking, the descriptive assessment of the majority of students in both public and private schools was acceptable.
- Conference Article
3
- 10.1109/roman.2007.4415120
- Jan 1, 2007
Current speech-based dialog systems face a practical problem: speech recognizers are imperfect due to inevitable errors. Even in multimodal dialog systems, which have multiple input channels, errors in speech recognition are a major problem because speech carries a large portion of the user's intention. In this paper, we propose a re-ranking method to improve the performance of speech recognition in a multimodal dialog system. To re-rank the n-best speech recognition hypotheses, we use multimodal understanding features that are orthogonal to the speech, as well as the speech recognizer's features. We demonstrate our method in the smart home domain, and the results show that the multimodal understanding features are promising for overcoming many speech errors.
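The re-ranking idea can be illustrated as a weighted interpolation of recognizer and multimodal-understanding scores over the n-best list. The scorers, interpolation weight, and hypotheses below are hypothetical stand-ins, not the paper's actual features or data:

```python
def rerank_nbest(hypotheses, asr_score, mm_score, alpha=0.7):
    """Re-rank n-best ASR hypotheses by a weighted combination of the
    recognizer score and a multimodal-understanding score.
    alpha is an illustrative interpolation weight (a tunable assumption).
    """
    scored = [(alpha * asr_score(h) + (1 - alpha) * mm_score(h), h)
              for h in hypotheses]
    scored.sort(reverse=True)               # highest combined score first
    return [h for _, h in scored]

# Toy usage: the ASR prefers a misrecognition, but multimodal context
# (e.g. the user pointing at a lamp) corrects the ranking.
hyps = ["turn on the light", "turn on the flight", "burn on the light"]
asr = {"turn on the light": 0.6, "turn on the flight": 0.7, "burn on the light": 0.5}
mm  = {"turn on the light": 0.9, "turn on the flight": 0.1, "burn on the light": 0.4}
best = rerank_nbest(hyps, asr.get, mm.get)[0]   # "turn on the light"
```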
- Research Article
15
- 10.1109/tmm.2006.887999
- Apr 1, 2007
- IEEE Transactions on Multimedia
For Part I, see ibid., vol. 9, p. 3 (2007). In this paper, the task and user interface modules of a multimodal dialogue system development platform are presented. The main goal of this work is to provide a simple, application-independent solution to the problem of multimodal dialogue design for information-seeking applications. The proposed system architecture clearly separates the task and interface components of the system. A task manager is designed and implemented that consists of two main submodules: the electronic form module, which handles the list of attributes that have to be instantiated by the user, and the agenda module, which contains the sequence of user and system tasks. Both the electronic forms and the agenda can be dynamically updated by the user. Next, a spoken dialogue module is designed that implements the speech interface for the task manager. The dialogue manager can handle complex error-correction and clarification user input, building on the semantic and pragmatic modules presented in Part I of this paper. The spoken dialogue system is evaluated on a travel reservation task of the DARPA Communicator research program and shown to yield over 90% task completion and good performance on both objective and subjective evaluation metrics. Finally, a multimodal dialogue system that combines graphical and speech interfaces is designed, implemented, and evaluated. Only minor modifications to the unimodal semantic and pragmatic modules were required to build the multimodal system. It is shown that the multimodal system significantly outperforms the unimodal speech-only system both in terms of efficiency (task success and time to completion) and user satisfaction for a travel reservation task.
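The electronic-form and agenda submodules can be sketched as a slot dictionary plus a task queue; the class and method names below are illustrative, not the platform's actual API:

```python
from collections import deque

class FormTaskManager:
    """Minimal sketch of a form-filling task manager: an electronic form
    (attributes to be instantiated by the user) plus an agenda of pending
    system tasks. Hypothetical names, for illustration only."""

    def __init__(self, attributes):
        self.form = {a: None for a in attributes}            # electronic form
        self.agenda = deque(f"ask_{a}" for a in attributes)  # system tasks

    def fill(self, attribute, value):
        """User supplies a value; the matching 'ask' task is retired."""
        self.form[attribute] = value
        task = f"ask_{attribute}"
        if task in self.agenda:
            self.agenda.remove(task)

    def next_task(self):
        """Next system task, or a final confirmation once the form is full."""
        return self.agenda[0] if self.agenda else "confirm_and_submit"

# Toy usage for a travel reservation form
tm = FormTaskManager(["origin", "destination", "date"])
tm.fill("origin", "Athens")      # system now asks for the destination next
```

Because the user can answer slots out of order, the agenda shrinks dynamically rather than being walked through in a fixed sequence.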
- Conference Article
4
- 10.1109/slt.2006.326838
- Jan 1, 2006
In this paper, the efficiency and usage patterns of input modes in multimodal dialogue systems are investigated for desktop and personal digital assistant (PDA) working environments. For this purpose, a form-filling travel reservation system is designed and implemented that efficiently combines the speech and visual modalities; three multimodal modes of interaction are implemented, namely click-to-talk, open-mike, and modality-selection. The three multimodal systems are evaluated and compared with the GUI-only and speech-only unimodal systems. User interface evaluation includes both objective and subjective metrics and shows that all three multimodal systems outperform the unimodal systems in the PDA environment. For the desktop environment, the multimodal systems score better than the speech-only system but worse than the GUI-only system. In all evaluation experiments, the synergy between the visual and speech modalities was significant: the multimodal interface was better than the sum of its (unimodal) parts. Results also show that users tend to use the most efficient input mode.
- Research Article
- 10.4324/9781315751061-17
- Dec 8, 2016
Repairing an Immigrant Chinese Family’s “Box of Terrible Things”
- Conference Article
3
- 10.1145/2818346.2823309
- Nov 9, 2015
Despite their ability to complete certain tasks, dialog systems still suffer from poor adaptation to users' engagement and attention. We observe human behaviors in different conversational settings to understand human communication dynamics and then transfer the knowledge to multimodal dialog system design. To focus solely on maintaining engaging conversations, we design and implement a non-task oriented multimodal dialog system, which serves as a framework for controlled multimodal conversation analysis. We design computational methods to model user engagement and attention in real time by leveraging automatically harvested multimodal human behaviors, such as smiles and speech volume. We aim to design and implement a multimodal dialog system to coordinate with users' engagement and attention on the fly via techniques such as adaptive conversational strategies and incremental speech production.
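A toy version of the engagement-driven strategy selection this abstract describes, combining automatically harvested cues such as smiles and speech volume; the cue weights, threshold, and strategy names are illustrative assumptions, not the system's actual model:

```python
def engagement_score(smile_prob, speech_volume, gaze_on_agent,
                     w_smile=0.4, w_volume=0.3, w_gaze=0.3):
    """Per-frame engagement estimate as a weighted sum of multimodal cues,
    each cue normalised to [0, 1]. Weights are illustrative."""
    return (w_smile * smile_prob
            + w_volume * speech_volume
            + w_gaze * gaze_on_agent)

def choose_strategy(score, threshold=0.5):
    """Hypothetical adaptive rule: switch conversational strategy
    when estimated engagement drops below the threshold."""
    return "continue_topic" if score >= threshold else "re-engage_user"

# Toy usage: an engaged frame vs. a disengaged one
engaged = choose_strategy(engagement_score(0.9, 0.8, 1.0))     # "continue_topic"
distracted = choose_strategy(engagement_score(0.0, 0.1, 0.0))  # "re-engage_user"
```

A real system would smooth such scores over time and feed them to incremental speech production rather than reacting to single frames.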
- Conference Article
27
- 10.1145/3308558.3313598
- May 13, 2019
Multimodal dialogue systems are attracting increasing attention as a more natural and informative way of human-computer interaction. As one of their core components, the belief tracker estimates the user's goal at each step of the dialogue and provides a direct way to validate the system's dialogue understanding. However, existing studies on belief trackers are largely limited to the textual modality and cannot easily be extended to capture the rich semantics in multimodal systems, such as those with product images. For example, in the fashion domain, the visual appearance of clothes plays a crucial role in understanding the user's intention. In this case, existing belief trackers may fail to generate accurate belief states for a multimodal dialogue system.
- Conference Article
4
- 10.1145/1322192.1322212
- Nov 12, 2007
In this paper, the efficiency and usage patterns of input modes in multimodal dialogue systems are investigated for both desktop and personal digital assistant (PDA) working environments. For this purpose, a form-filling travel reservation application is evaluated that combines the speech and visual modalities; three multimodal modes of interaction are implemented, namely Click-To-Talk, Open-Mike, and Modality-Selection. The three multimodal systems are evaluated and compared with the GUI-Only and Speech-Only unimodal systems. Mode and duration statistics are computed for each system, for each turn, and for each attribute in the form. Turn time is decomposed into interaction and inactivity time, and the statistics for each input mode are computed. Results show that multimodal and adaptive interfaces are superior in terms of interaction time, but not always in terms of inactivity time. Users also tend to use the most efficient input mode, although our experiments show a bias towards the speech modality.
- Research Article
- 10.25159/1947-9417/19927
- Oct 29, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/19407
- Oct 22, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/19341
- Oct 21, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/18365
- Oct 15, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/20153
- Sep 15, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/20446
- Sep 10, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/18475
- Sep 10, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/18995
- Aug 11, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/18342
- Aug 11, 2025
- Education as Change
- Research Article
- 10.25159/1947-9417/17636
- Jul 22, 2025
- Education as Change