Enhanced U-Net architectures for accurate room impulse response generation via differential-phase learning
- Research Article
- 10.1121/1.4969518
- Oct 1, 2016
- Journal of the Acoustical Society of America
We describe the influence of the separation accuracy of semi-blind source separation (semi-BSS) on the measurement accuracy of room impulse responses. Tatekura et al. proposed a method for simultaneously measuring room impulse responses with ensemble music. In this method, the room impulse responses are measured by separating the ensemble music into the individual instrument sounds reproduced from each loudspeaker. However, the influence of the semi-BSS separation accuracy on the measurement accuracy of the room impulse responses has not been clarified. We therefore evaluated the measurement accuracy of the room impulse responses and the separation accuracy using different instrument sounds. The room impulse responses were measured between two loudspeakers and a control point with 11 kinds of musical instrument sounds. In the measurement-accuracy results, the difference between the maximum and minimum values was 10 dB. However, in the separation-accuracy results, most values corresponded to equivalent measurement accuracy. Therefore, under these measurement conditions, the method is expected to be able to measure room impulse responses with arbitrary instrument sounds.
- Conference Article
2
- 10.1109/waspaa.2017.8170024
- Oct 1, 2017
The room impulse responses at multiple receiver positions can be measured efficiently with a continuously moving microphone. The acoustic system is periodically excited by a self-orthogonal signal, called a perfect sequence, and the microphone captures the sound field on a pre-defined path. As shown in recent studies by the authors, the captured signal constitutes a spatio-temporal sampling of the sound field, and the impulse responses can be obtained by spatial interpolation. So far, mainly a uniformly moving microphone has been considered for the measurement of spatial room impulse responses. In this paper, the method is applied to non-uniformly moving microphones, thereby addressing more general cases. The proposed method is evaluated by numerical simulations in which the spatial room impulse responses on a circle are measured using a microphone with a fluctuating angular speed. The accuracy of the impulse responses is compared for varying interpolation orders.
- Research Article
- 10.1121/1.406547
- Apr 1, 1993
- The Journal of the Acoustical Society of America
For the last 5 years, the authors have been making acoustical surveys of auditoria in the world at every opportunity. In these acoustical measurements, room impulse responses have been measured using a dodecahedral omnidirectional loudspeaker, monaural microphones, and a dummy head system (Neumann KU81i). In order to get accurate impulse responses with high S/N ratio, the sweep-pulse method and synchronous averaging technique were adopted. From the monaural and binaural impulse responses measured in a lot of concert halls and theaters, such acoustic indices as T60 (reverberation time), EDT (early decay time), D50 (definition), C80 (clarity), Ts (center time), and IACC (inter-aural cross correlation) were obtained, and the relationships among them were statistically investigated. As a result, it has been found that such indices as D50, C80, and Ts are highly correlated with each other and the correlations between IACC and other monaural indices are very low. The binaural impulse responses measured through the dummy head were convolved with dry music and speech signals by digital technique, and they are being used as the test signals for subjective experiments using a transaural reproduction system.
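As a rough illustration of the monaural indices named in this abstract, the sketch below computes D50, C80, and Ts from a sampled impulse response using the usual energy-ratio definitions. The function name `energy_indices`, the synthetic exponential decay, and the sampling rate are assumptions made for this example; none of them come from the paper, and the impulse response is assumed to start at the direct sound.

```python
import math

def energy_indices(h, fs):
    """Return (D50, C80 in dB, Ts in seconds) for impulse-response samples h.

    Assumes h[0] is the arrival of the direct sound (hypothetical helper).
    """
    energy = [x * x for x in h]
    total = sum(energy)
    if total == 0:
        raise ValueError("impulse response has no energy")
    n50 = round(0.050 * fs)  # number of samples in the first 50 ms
    n80 = round(0.080 * fs)  # number of samples in the first 80 ms
    early80 = sum(energy[:n80])
    late80 = sum(energy[n80:])
    # D50: fraction of the total energy arriving within 50 ms
    d50 = sum(energy[:n50]) / total
    # C80: early-to-late energy ratio in dB, split at 80 ms
    if late80 == 0:
        c80 = math.inf
    elif early80 == 0:
        c80 = -math.inf
    else:
        c80 = 10 * math.log10(early80 / late80)
    # Ts: energy-weighted mean arrival time (center time)
    ts = sum(n / fs * e for n, e in enumerate(energy)) / total
    return d50, c80, ts

# Synthetic example: a 1 s exponential decay sampled at 1 kHz (assumed values)
fs = 1000
h = [math.exp(-6.9 * n / fs) for n in range(fs)]
d50, c80, ts = energy_indices(h, fs)
print(f"D50={d50:.3f}  C80={c80:.2f} dB  Ts={ts * 1000:.1f} ms")
```

Because all three indices are computed from the same squared impulse response with different time weightings, it is plausible that they correlate strongly, as the abstract reports.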
- Research Article
- 10.46572/naturengs.1587879
- Dec 24, 2024
- NATURENGS MTU Journal of Engineering and Natural Sciences Malatya Turgut Ozal University
AI-powered chat applications are innovative solutions that facilitate user interaction and information access. These applications improve user experience by providing personalized and context-sensitive responses thanks to large language models and natural language processing techniques. This study examined the design and development process of UniRobo, an AI-powered chat application developed for Malatya Turgut Ozal University students and staff. UniRobo is an application that provides instant information on topics such as education, food menus, and campus events and offers personalized responses using large language models and natural language processing techniques. In the development process, based on user needs analysis, the mobile application was created with React Native, the back-end was created with Python and FastAPI, and the MongoDB database was integrated. Artificial intelligence capabilities were supported by OpenAI API and fine-tuning, thus adapting to university-specific content. Retrieval-Augmented Generation (RAG) architecture and Azure AI Search technology increased user satisfaction by providing more accurate and faster responses. As a result, UniRobo has made university life more accessible, providing users with access to fast and accurate information, and has demonstrated the potential of artificial intelligence-based solutions in the education sector.
- Research Article
- 10.55041/ijsrem47399
- May 10, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
This paper introduces an AI-powered travel guide chatbot that utilizes a Retrieval-Augmented Generation (RAG) architecture to provide users with intelligent, context-sensitive assistance based on their own uploaded documents. The system is designed to handle various file formats containing travel-related content such as itineraries, brochures, and destination notes. By allowing users to interact with the chatbot through natural language queries, the platform delivers accurate and relevant responses grounded in the content of these documents. The core language model used in the system is Mistral AI, which delivers high-quality, efficient responses and supports multi-turn conversations. This approach ensures scalability and responsiveness while maintaining a high level of accuracy in information retrieval. The chatbot's versatility makes it suitable for a wide range of use cases, including travel planning, destination research, and itinerary refinement. The proposed system demonstrates the effectiveness of combining RAG architecture with privacy-focused design in the development of next-generation AI chat applications. A key innovation of the system lies in its privacy-first approach. Uploaded files are securely processed, ensuring that user data remains confidential at all times. The integration of Google authentication further enhances user experience by enabling session continuity and persistent access to chat history. This commitment to privacy makes the chatbot especially suitable for users handling sensitive or proprietary travel documents. Keywords: AI chatbot, travel guide, Retrieval-Augmented Generation, natural language processing, user privacy, Mistral AI, document-based QA.
- Research Article
- 10.1121/1.4783172
- May 1, 2004
- The Journal of the Acoustical Society of America
Digital waveguide mesh (WGM) models have been shown to be a viable method of obtaining accurate room impulse responses (RIRs) for a virtual space. However, the large memory and long processing requirements of these models have restricted their use for large rooms and for those with nontrivial geometric features. The development and inclusion of a KW-pipe interface allows the interconnection of finite difference and wave-based mesh implementations of a WGM model. The former is efficient for the main body of the mesh, with the latter, more processor intensive method allowing more accurate simulation at the boundaries of the modeled space. The resultant hybrid model removes some of the above computational constraints, allowing larger rooms and more complex geometries to be modeled. Model visualizations and RIR data are presented, demonstrating the correct operation of the KW-pipe interface. RIR data from two differing mesh topologies show that nontrivial geometries can be successfully modeled using this technique, with considerable computational savings on purely wave-based WGMs. Comparisons with RIRs obtained through more traditional ray-tracing and image-source techniques show favorable results and demonstrate the complex wave phenomena that are inherent in WGM models. [Work supported by EPSRC.]
- Conference Article
10
- 10.2514/6.2001-1527
- Jun 11, 2001
Reduced-order models (ROM), which are based on the Volterra theory for nonlinear systems, for the evaluation of nonlinear unsteady aerodynamic forces are presented. The ROMs provide a means for rapid evaluation of frequency-domain generalized aerodynamic forces, which can then be used in traditional flutter analysis schemes to calculate flutter characteristics about nonlinear steady flows. Two ROMs are formulated, an impulse-type ROM that is based on the convolution of ROM kernels with the input signal, and a step-type ROM that is based on convolution with the derivative of the input signal. Linear, first-, and second-order kernels are identified for these two ROMs from direct CFD impulse and step responses. The ROM methodology is demonstrated with the heave and elastic modes of the AGARD 445.6 wing. It was found that the accuracy of the CFD-based impulse response is dependent on the choice of input amplitude and computational time step, and that the impulse-type kernels are highly sensitive to inaccuracies in the impulse responses used for their identification. The step-type ROM was found to be robust, and resulted in good predictions of direct responses. The introduction of second-order kernels did not significantly improve the predictions, indicating a difficulty in performing true nonlinear identification. The use of first-order step-type ROM offered a significant computational time saving compared to the full CFD frequency response analysis.
- Research Article
- 10.1515/snde-2024-0053
- Aug 7, 2025
- Studies in Nonlinear Dynamics & Econometrics
We examine the finite-sample accuracy of impulse responses obtained using local projections (LP) and vector autoregressive (VAR) models. In view of the fact that impulse responses are differences between multistep predictors, we propose to assess the relative performance of impulse-response estimators using tests for equal predictive accuracy. In our Monte Carlo experiments, LP-based and VAR-based estimators are found to be equally accurate in large samples under a mean-squared-error risk function. VAR-based estimators tend to have an advantage over LP-based estimators in small and moderately sized samples, particularly at long horizons.
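The two estimator families compared in this abstract can be illustrated on a toy case. The sketch below uses a univariate AR(1) process as a stand-in for a VAR (an assumption made to keep the example small): the model-based impulse response at horizon h is the estimated coefficient raised to the power h, while the local-projection estimate is the slope from regressing y at time t+h directly on y at time t. All function names, the seed, and the sample size are hypothetical and are not taken from the paper.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_ar1(phi, n, seed=0):
    """Simulate y[t] = phi * y[t-1] + e[t] with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y

def ar_impulse_response(y, horizon):
    """Model-based estimate: fit the one-step model, then iterate it forward."""
    phi_hat = ols_slope(y[:-1], y[1:])
    return phi_hat ** horizon

def lp_impulse_response(y, horizon):
    """Local projection: regress y[t + horizon] directly on y[t] (horizon >= 1)."""
    return ols_slope(y[:-horizon], y[horizon:])

# Assumed toy setup: true coefficient 0.5, so the true response at h = 2 is 0.25
y = simulate_ar1(phi=0.5, n=20000)
ar_est = ar_impulse_response(y, 2)
lp_est = lp_impulse_response(y, 2)
print(f"AR-based: {ar_est:.3f}  LP-based: {lp_est:.3f}  true: 0.250")
```

In a sample this large both estimates should sit close to the true value; the small-sample differences the abstract describes would only show up with much shorter series and repeated simulation.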
- Research Article
- 10.1016/j.optcom.2011.11.078
- Dec 3, 2011
- Optics Communications
Optimizing receiver and transmitter layout in optical wireless multi spot configuration access link
- Research Article
18
- 10.1016/j.media.2022.102417
- May 1, 2022
- Medical Image Analysis
Morphological abnormalities of the femoroacetabular (hip) joint are among the most common human musculoskeletal disorders and often develop asymptomatically at early, easily treatable stages. In this paper, we propose an automated framework for landmark-based detection and quantification of hip abnormalities from magnetic resonance (MR) images. The framework relies on a novel idea of multi-landmark environment analysis with reinforcement learning. In particular, we merge the concepts of the graphical lasso and Morris sensitivity analysis with deep neural networks to quantitatively estimate the contribution of individual landmark and landmark subgroup locations to the other landmark locations. Convolutional neural networks for image segmentation are utilized to propose the initial landmark locations, and landmark detection is then formulated as a reinforcement learning (RL) problem, where each landmark-agent can adjust its position by observing the local MR image neighborhood and the locations of the most-contributive landmarks. The framework was validated on T1-, T2- and proton density-weighted MR images of 260 patients with the aim of measuring the lateral center-edge angle (LCEA), femoral neck-shaft angle (NSA), and the anterior and posterior acetabular sector angles (AASA and PASA) of the hip, and deriving the quantitative abnormality metrics from these angles. The framework was successfully tested using the UNet and feature pyramid network (FPN) segmentation architectures for landmark proposal generation, and the deep Q-network (DeepQN), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and actor-critic policy gradient (A2C) RL networks for landmark position optimization. The resulting overall landmark detection error of 1.5 mm and angle measurement error of 1.4° indicate superior performance in comparison to existing methods. Moreover, the automatically estimated abnormality labels were in 95% agreement with those generated by an expert radiologist.
- Research Article
328
- 10.1121/1.383069
- Jul 1, 1979
- The Journal of the Acoustical Society of America
When a conversation takes place inside a room, the acoustic speech signal is distorted by wall reflections. The room’s effect on this signal can be characterized by a room impulse response. If the impulse response happens to be minimum phase, it can easily be inverted. Synthetic room impulse responses were generated using a point image method to solve for wall reflections. A Nyquist plot was used to determine whether a given impulse response was minimum phase. Certain synthetic room impulse responses were found to be minimum phase when the initial delay was removed. A minimum phase inverse filter was successfully used to remove the effect of a room impulse response on a speech signal.
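One way a Nyquist-style minimum-phase check can be expressed for a finite impulse response is sketched below. It counts how many times the frequency response encircles the origin as the frequency sweeps once around the unit circle; for a causal FIR response with no zeros on the unit circle, zero net encirclements means every zero lies inside the unit circle. This is an illustrative sketch rather than the procedure used in the paper, and the function names and frequency-grid size are assumptions.

```python
import cmath
import math

def origin_encirclements(h, n_freq=4096):
    """Net counter-clockwise encirclements of the origin by H(e^{jw}), w in [0, 2*pi).

    Assumes H has no zeros on the unit circle and that n_freq is large enough
    for the phase to change by less than pi between neighbouring grid points.
    """
    def freq_response(w):
        return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

    values = [freq_response(2 * math.pi * k / n_freq) for k in range(n_freq)]
    total_phase = 0.0
    for k in range(n_freq):
        a, b = values[k], values[(k + 1) % n_freq]
        # phase increment from one grid point to the next, in (-pi, pi]
        total_phase += cmath.phase(b * a.conjugate())
    return round(total_phase / (2 * math.pi))

def is_minimum_phase(h):
    """True when the Nyquist curve does not encircle the origin."""
    return origin_encirclements(h) == 0

print(is_minimum_phase([1.0, 0.5]))       # zero at z = -0.5, inside the unit circle
print(is_minimum_phase([0.5, 1.0]))       # zero at z = -2, outside the unit circle
print(is_minimum_phase([0.0, 1.0, 0.5]))  # same as the first response plus one sample of delay
```

The third case mirrors the abstract's observation: a leading delay alone adds an encirclement, so the response only passes the check once the initial delay is removed.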
- Research Article
19
- 10.1121/1.4879664
- May 27, 2014
- The Journal of the Acoustical Society of America
Laser induced air breakdown is proposed as a sound source for accurate impulse response measurements. Within the audible bandwidth, the source is repeatable, broadband, and omnidirectional. The applicability of the source was evaluated by measuring the impulse response of a room. The proposed source provides a more accurate temporal and spatial representation of room reflections than conventional loudspeakers due to its omnidirectionality, negligible size and short pulse duration.
- Research Article
5
- 10.1016/j.apacoust.2007.12.001
- Jan 25, 2008
- Applied Acoustics
Reverberation time measurement by the product of two room impulse responses
- Research Article
- 10.1121/1.4782309
- May 1, 2007
- The Journal of the Acoustical Society of America
Simple computer modeling of impulse responses for small rectangular rooms is typically based on the image source method, which results in an impulse response with very high time resolution. The image source method is easy to implement, but the simulated impulse responses are often a poor match to measured impulse responses because the description of the source is often too idealized to match the real measurement conditions. For example, the basic image source method has often assumed the sound source to be an omni‐directional point source for ease of implementation, but a real loudspeaker may include multiple drivers and exhibit an irregular polar response in both the horizontal and vertical directions. In this paper, an improved room impulse response computer modeling technique is developed by incorporating the measured horizontal and vertical polar responses of the speaker into the basic image source method. Results show that compared with the basic image source method, the modeled room impulse response using this method is a better match to the measured room impulse response.
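The basic image source method that this abstract takes as its baseline can be sketched as follows, under the idealization the abstract criticizes: an omnidirectional point source in a rectangular room. The function name `image_source_rir`, the single frequency-independent reflection coefficient `beta`, the nearest-sample rounding of delays, and the room dimensions are all assumptions made for this example. The measured polar responses that the paper adds are not modeled here.

```python
import math

def image_source_rir(room, src, mic, beta, max_index, fs, n_samples, c=343.0):
    """Basic image-source impulse response for a rectangular room.

    room, src, mic: (x, y, z) tuples in metres; beta: reflection coefficient
    shared by all six walls; max_index: range of the image lattice per axis.
    """
    h = [0.0] * n_samples
    axes = []
    for length, s in zip(room, src):
        # Per axis: (image coordinate, number of wall reflections) pairs
        axes.append([((1 - 2 * q) * s + 2 * n * length, abs(n - q) + abs(n))
                     for n in range(-max_index, max_index + 1)
                     for q in (0, 1)])
    for x, rx in axes[0]:
        for y, ry in axes[1]:
            for z, rz in axes[2]:
                d = math.dist((x, y, z), mic)  # image-to-microphone distance
                k = round(d / c * fs)          # arrival time rounded to a sample
                if d > 0 and k < n_samples:
                    # one factor of beta per reflection, 1/(4*pi*d) spreading loss
                    h[k] += beta ** (rx + ry + rz) / (4 * math.pi * d)
    return h

# Assumed example geometry: 4 m x 3 m x 2.5 m room, source and microphone 1 m apart
rir = image_source_rir((4.0, 3.0, 2.5), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0),
                       beta=0.9, max_index=2, fs=8000, n_samples=400)
first = next(k for k, v in enumerate(rir) if v != 0.0)
print(first, rir[first])
```

Adding source directivity along the lines described in the abstract would amount to scaling each image's contribution by the loudspeaker's response in the direction of that image, which this omnidirectional sketch leaves out.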
- Research Article
4
- 10.3390/e22111309
- Nov 17, 2020
- Entropy (Basel, Switzerland)
We introduce a Virtual Studio Technology (VST) 2 audio effect plugin that performs convolution reverb using synthetic Room Impulse Responses (RIRs) generated via a Genetic Algorithm (GA). The parameters of the plugin include some of those defined under the ISO 3382-1 standard (e.g., reverberation time, early decay time, and clarity), which are used to determine the fitness values of potential RIRs so that the user has some control over the shape of the resulting RIRs. In the GA, these RIRs are initially generated via a custom Gaussian noise method, and then evolve via truncation selection, random weighted average crossover, and mutation via Gaussian multiplication in order to produce RIRs that resemble real-world, recorded ones. Binaural Room Impulse Responses (BRIRs) can also be generated by assigning two different RIRs to the left and right stereo channels. With the proposed audio effect, new RIRs that represent virtual rooms, some of which may even be impossible to replicate in the physical world, can be generated and stored. Objective evaluation of the GA shows that contradictory combinations of parameter values will produce RIRs with low fitness. Additionally, through subjective evaluation, it was determined that RIRs generated by the GA were still perceptually distinguishable from similar real-world RIRs, but the perceptual differences were reduced when longer execution times were used for generating the RIRs or when the unprocessed audio signals consisted only of speech.