Acoustics of vowels in Angami
This work investigates the acoustic properties of Angami vowels with the aim of definitively establishing the phonologically contrasting vowels in the language. Contrary to some previous studies that report seven monophthongs and multiple diphthongs, this study concludes that there are six monophthongs and two diphthongs in the language. The acoustic characteristics associated with the monophthongs and the diphthongs are explored and reported in this work. For monophthongs, acoustic characteristics such as the first three formants (F1, F2, F3) and duration were explored. For diphthongs, acoustic features such as the first two formants (F1, F2), their discrete cosine transforms (DCT), and duration were explored. The salience of the vowels in terms of their acoustic properties was substantiated by statistical analyses.
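As an illustration of how a DCT can summarise a diphthong's formant movement, the sketch below reduces a formant track to a few coefficients. The abstract does not state the DCT variant, the number of coefficients, or the sampling scheme, so the orthonormal DCT-II, the three coefficients, the five sample points, the function name `dct_ii`, and the formant values are all assumptions made for illustration, not the study's procedure or data.

```python
import math

def dct_ii(values, n_coeffs=3):
    """Return the first n_coeffs orthonormal DCT-II coefficients of a sequence."""
    n = len(values)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs

# Hypothetical F2 tracks in Hz, sampled at 5 equidistant points (not study data).
flat_f2 = [1500.0, 1500.0, 1500.0, 1500.0, 1500.0]    # monophthong-like, no movement
rising_f2 = [1200.0, 1400.0, 1600.0, 1800.0, 2000.0]  # diphthong-like, steady rise

flat_coeffs = dct_ii(flat_f2)
rising_coeffs = dct_ii(rising_f2)

print("flat:  ", [round(c, 2) for c in flat_coeffs])
print("rising:", [round(c, 2) for c in rising_coeffs])
```

Under these assumptions the zeroth coefficient scales with the mean of the track, the first reflects its overall slope (it is zero for the flat track and negative for the rising one), and the second reflects curvature, which is why a small number of coefficients can separate steady vowels from gliding ones.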
- Research Article
- 10.5664/jcsm.9292
- Apr 12, 2021
- Journal of Clinical Sleep Medicine
The aim of the study was to inspect the acoustic properties and sleep characteristics of a preapneic snoring sound. The feasibility of forecasting upcoming respiratory events by snoring sound was also investigated. Participants with habitual snoring or a heavy breathing sound during sleep were recruited consecutively. Polysomnography was conducted, and snoring-related breathing sound was recorded simultaneously. Acoustic features and sleep features were extracted from 30-second samples, and a machine learning algorithm was used to establish 2 prediction models. A total of 74 eligible participants were included. Model 1, tested by 5-fold cross-validation, achieved an accuracy of 0.92 and an area under the curve of 0.94 for respiratory event prediction. Model 2, with acoustic features and sleep information tested by Leave-One-Out cross-validation, had an accuracy of 0.78 and an area under the curve of 0.80. Sleep position was found to be the most important among all sleep features contributing to the performance of the 2 models. Preapneic sound presented unique acoustic characteristics, and snoring-related breathing sound could be deployed as a real-time apneic event predictor. The models, combined with sleep information, serve as a promising tool for an early warning system to forecast apneic events. Wang B, Yi X, Gao J, et al. Real-time prediction of upcoming respiratory events via machine learning using snoring sound signal. J Clin Sleep Med. 2021;17(9):1777-1784.
- Research Article
- 10.1080/21678421.2025.2598433
- Dec 13, 2025
- Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration
Objective: Bulbar dysfunction often diminishes the accuracy and speed of the tongue, lip, and jaw movements necessary for speech production. Vowel acoustic features derived from speech recordings can serve as sensitive markers of articulatory accuracy and movement timing. We examined whether degraded speech caused by amyotrophic lateral sclerosis (ALS), assessed through vowel acoustic features, was associated with communicative participation restrictions. As a secondary aim, we assessed the association of two global speech characteristics, rate and intelligibility, with vowel features and communicative participation. Materials & Methods: Thirty-three people with ALS (plwALS) recorded a reading passage and completed surveys using a smartphone application. Speaking rate and acoustic vowel features (duration, vowel articulation index [VAI]) were extracted from the recordings. Three speech-language pathologists rated speech intelligibility. Communicative participation was assessed using the Communicative Participation Item Bank (CPIB) short form. Bivariate correlation, partial correlation, and regression analyses were used to evaluate the associations between vowel features, intelligibility, speaking rate, and CPIB scores. Results: Significant bivariate correlations, ranging from rs = −0.39 to rs = 0.64, were found between speech variables and CPIB scores. A combined regression model including VAI, vowel duration, and sex explained 52% of the variance in CPIB scores. Including speaking rate or intelligibility in the partial correlation analysis attenuated the associations between vowel acoustics and CPIB. Conclusions: Vowel features and global dysarthria characteristics are linked to communicative participation in ALS. Clinical practices designed to target vowel production, speaking rate, and intelligibility may help to maintain daily communication in ALS.
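For readers unfamiliar with the vowel articulation index, the sketch below uses one commonly cited definition based on the first two formants of the corner vowels /i/, /a/, and /u/. The abstract does not say which formula or vowel set the study used, so the function name, its parameters, and the formant values are assumptions made for illustration.

```python
def vowel_articulation_index(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/), all values in Hz."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)

# Hypothetical formant values in Hz (not data from the study).
vai = vowel_articulation_index(
    f1_i=300.0, f2_i=2300.0,
    f1_a=750.0, f2_a=1200.0,
    f1_u=320.0, f2_u=800.0,
)
print(round(vai, 3))
```

With this definition, formants that move toward the centre of the vowel space lower the numerator and raise the denominator, so a smaller value indicates a more centralised, less distinct vowel system.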
- Conference Article
- 10.1109/retis.2015.7232917
- Jul 1, 2015
This paper presents an audio-visual phoneme recognition system that uses shape and appearance information extracted from the jaw and lip region to enhance robustness in noisy environments. Considering visual features along with traditional acoustic features has been found to be promising in complex auditory environments. The visual modality can provide complementary information to the speech recognizer when the audio modality is badly affected by background noise. The acoustic modality is represented by auditory-based, equivalent rectangular bandwidth (ERB) like wavelet features (WERBC), whereas the visual modality is represented by statistically powerful active appearance model (AAM) based features. The audio and visual modalities are fused using a proportional weighting factor to form a two-stream audio-visual synchronous Hidden Markov Model (SHMM) recognizer. The VidTIMIT database is chosen to study the performance of the multi-modal phoneme recognition system. Artificial noise is injected into the audio files at different SNR levels (0dB–20dB) to study the performance of the system in noisy environments. The combination of WERBC and AAM features outperforms the well-known traditional combination of Mel scale cepstrum coefficients (MFCC) acoustic features and discrete cosine transform (DCT) visual features.
- Research Article
- 10.1121/10.0016294
- Oct 1, 2022
- The Journal of the Acoustical Society of America
Acoustic analysis of typically developing elementary school-aged (prepubertal) children’s speech has been primarily performed on cross-sectional data in the past. Few studies have examined longitudinal data in this age group. For this presentation, we analyze the developmental changes in the acoustic properties of children’s speech using data collected longitudinally over four years (from first grade to fourth grade). Four male and four female children participated in this study. Data were collected once every year for each child. Using these data, we measured the four-year development of subglottal acoustics (first two subglottal resonances) and vowel acoustics (first four formants and fundamental frequency). Subglottal acoustic measurements are relatively independent of context, and average values were obtained for each child in each year. Vowel acoustics measurements were made for seven vowels (i, ɪ, ɛ, æ, ʌ, ɑ, u), each occurring in two different words in the stressed syllable. We investigated the correlations between the children’s subglottal acoustics, vowel acoustics, and growth-related variables such as standing height, sitting height, and chronological age. Gender-, vowel-, and child-specific analyses were carried out in order to shed light on how typically developing speech acoustics depend on such variables. [Work supported, in part, by the NSF.]
- Conference Article
- 10.1109/icise.2010.5691842
- Dec 1, 2010
Selecting DCT (Discrete Cosine Transform) coefficients in the new data field obtained from the DCT is critical for feature recognition; however, the DCT by itself does not compress the data. This paper proposes a new method that combines the DCT with statistical analysis to select DCT coefficients. First, the capacity of each DCT coefficient is calculated. Second, the coefficients with higher discriminative ability are selected as features and then combined with the PCA method for data decorrelation. Finally, classification and recognition are performed. Experiments show that the classification efficiency of this method is greatly improved compared with traditional methods.
- Conference Article
- 10.1109/ic3ina48034.2019.8949593
- Oct 1, 2019
Traditionally, automatic speech recognition (ASR) uses a Hidden Markov Model with Gaussian Mixture Model (HMM-GMM) as the acoustic model and hand-designed features such as Mel-frequency Cepstral Coefficients (MFCC) as acoustic features. It is usually assumed that the features are uncorrelated, making it possible to use diagonal covariances for the GMM. The assumption generally holds due to the use of the Discrete Cosine Transformation (DCT), which de-correlates the speech spectra. However, the DCT could cause some information loss, such as correlations between the feature components. Current ASR systems, which are based on Deep Neural Networks (DNN), have been shown to perform better, especially in reverberant conditions, when more primitive features such as filter-bank (FBANK) are used. This might be because a DNN is better at modeling non-linear relations between the components of the features. But the use of short-time processing in FBANK may cause the loss of long-term correlations in a speech pattern. To tackle this, we propose a new feature, q-FBANK, which is a generalization of FBANK. The results on artificially reverberant speech show that the proposed features achieve better performance than MFCC and FBANK on DNN-HMM systems, where average error reductions of up to 39.73% and 13.5% were achieved, respectively.
- Conference Article
- 10.21437/interspeech.2013-339
- Aug 25, 2013
Children’s vowel acquisition has long been examined on the basis of transcription-based evaluations of the accuracy rate of vowel production in children before 5 years of age. This study examines the development of static and dynamic acoustic features in children between 3 and 7 years of age by comparing the acoustic features of children with those of adults. All acoustic analyses were based on normalized formant frequency values to exclude the effect of different vocal tract sizes. The increasing compactness of individual vowel categories in the acoustic space evidenced the refinement of phonetic features in children in this age range. In addition, the vowel dispersion patterns of certain vowels, plotted on the basis of formant frequency values at 5 temporal locations, demonstrated positional change as well as differences in trajectory length. Results demonstrate that the acoustical development of vowels from children to adult norms is likely a long-term, gradual but not necessarily continuous process. Index Terms: speech development, monolingual English, vowel acoustics
- Research Article
- 10.15688/jvolsu2.2020.3.13
- Aug 1, 2020
- Vestnik Volgogradskogo gosudarstvennogo universiteta. Serija 2. Jazykoznanije
The article deals with the potential explicit or implicit impact of the cultural and scientific tradition on the way of thinking of researchers in different epochs. The hypothesis is that aesthetic and philosophical thought may in some way influence the results of scientific experiments. The paper follows the order of the research. It starts from the results of vowel acoustics measurements in the 19th century and ends by finding their conceptual basis in works of the 17th century. When measuring vowel acoustic characteristics, researchers of the late 19th and early 20th centuries in most cases used various sets of tuning forks. The brightest, i.e. characteristic, tone of a vowel was defined by ear. The results of such experiments showed that there were intervals of one or more octaves between the characteristic tones of various vowels. Among the different factors that may lead to such results, besides the circumstances of the experiment itself, we propose the cultural, scientific and philosophical tradition. The analysis of the works of the authors who first explored the acoustic characteristics of vowels showed that the philosophy and aesthetics of the 17th century may have influenced, directly or indirectly, the descriptions of vowels during the following centuries. The idea of the main vowels of the Adam Alphabet may have had an impact on the acoustic instruments, while the idea of harmony and proportion, essential at the time of the Scientific Revolution, may be found in vowel system descriptions up to the beginning of the 20th century.
- Research Article
- 10.3390/app122010642
- Oct 21, 2022
- Applied Sciences
Recently, piracy and copyright violations of digital content have become major concerns as computer science has advanced. In order to prevent unauthorized usage of content, digital watermarking is usually employed. This work proposes a new approach to digital image watermarking that makes use of the discrete cosine transform (DCT), discrete wavelet transform (DWT), dipper-throated optimization (DTO), and stochastic fractal search (SFS) algorithms. The proposed approach involves computing the discrete wavelet transform (DWT) on the cover image to extract its sub-components, followed by the performance of a discrete cosine transform (DCT) to convert these sub-components into the frequency domain. Finding the best scale factor for watermarking is a significant challenge in most watermarking methods. The authors used an advanced optimization algorithm, which is referred to as DTOSFS, to determine the best two parameters—namely, the scaling factor and embedding coefficient—to be used while inserting a watermark into a cover image. Using the optimal values of these parameters, a watermark image can be inserted into a cover image more efficiently. The suggested approach is evaluated in comparison with the current gold standard. The normalized cross-correlation (NCC), peak-signal-to-noise ratio (PSNR), and image fidelity (IF) are used to measure the success of the proposed approach. In addition, a statistical analysis is performed to evaluate the significance and superiority of the proposed approach. The experimental results confirm the effectiveness of the proposed approach in improving upon standard watermarking methods based on the DWT and DCT. Moreover, a set of attacks is considered to study the robustness of the proposed approach, and the results confirm the expected outcomes. 
It is shown by the achieved results that the proposed approach can be utilized for practical digital image watermarking, and that it significantly outperforms other digital image watermarking methods.
- Research Article
- 10.1121/1.4987607
- May 1, 2017
- The Journal of the Acoustical Society of America
Sound generated by impacts between raindrops and the roof panels of vehicles is an important factor in automotive quality when driving in rainy conditions. Therefore, an analytical method to control this phenomenon is necessary. In this research, a theoretical model for predicting the characteristics of sound radiation by droplet impacts was proposed. An experiment for measuring the forces generated by falling droplets was conducted. The characteristics of the measured forces were investigated in the frequency domain. A measurement on a plate was performed to understand the sound radiation formed by droplet impacts. Correlations between acoustic characteristics and properties of the plate were identified. A vibro-acoustic model was developed to analyze the experimental results. Assuming the generation of sound sources at each location due to the vibrating plate, the radiation of sound fields was theoretically calculated and verified by comparison with the measured results. Under single and multi-layered conditions, factors influencing the acoustic properties were investigated based on the model. As a result, by using the proposed model, it is possible to predict the acoustic mechanisms of vehicles due to raindrops and make them suitable for specific designs.
- Research Article
- 10.1007/s00435-021-00520-w
- Feb 27, 2021
- Zoomorphology
Fourier transform methods are usually adopted to fit 2D closed curves representing the profiles of samples studied in morphometric analysis. For problems concerning 3D open curves, landmark-based methods are widely used. In this paper, a parametric method based on the discrete cosine transform (DCT) is proposed to serve as a morphometric tool for 3D open curves. The DCT transforms a real signal (coordinates) into a combination of cosine functions and describes the shape of a curve with coefficients generated from the fitting curve. Four examples are introduced to be fitted with the DCT. The first example consists of 3D spiral curves with different shapes, with random disturbances added to make the model more general. A curve alignment is also utilized to eliminate the non-shape effect. The other three examples are suture curves extracted from 3D human skulls, on which semilandmarks and landmarks are aligned with General Procrustes Analysis (GPA) to eliminate the effects of location, size, and orientation. These 3D curves with different diagnoses are matched with the DCT. Coefficients generated in the fitting result are analyzed with between-group principal component analysis (bgPCA) and one-way permutational multivariate analysis of variance (PERMANOVA). Different groups in the four examples are separated and present significant differences in the results of one-way PERMANOVA. Statistical analyses demonstrate that the DCT is promising in morphometric analysis of 3D open curves.
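The idea of describing an open 3D curve by DCT coefficients can be sketched as follows: each coordinate sequence is transformed separately, only the leading coefficients are kept as the shape descriptor, and the curve is rebuilt from them to check the fit. The abstract gives no implementation details, so the orthonormal DCT-II, the helix test curve, the choice of 10 coefficients, and all names here are assumptions for illustration rather than the authors' method.

```python
import math

def dct(values):
    """Orthonormal DCT-II of a sequence of real numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(values)))
    return out

def idct(coeffs, n):
    """Rebuild n samples from leading DCT coefficients (omitted ones count as zero)."""
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def max_abs_error(original, rebuilt):
    """Largest coordinate-wise deviation between two curves."""
    return max(abs(a - b) for axis in original
               for a, b in zip(original[axis], rebuilt[axis]))

# Hypothetical open 3D curve: one turn of a helix sampled at 50 points.
n_points = 50
t = [2.0 * math.pi * i / (n_points - 1) for i in range(n_points)]
curve = {"x": [math.cos(v) for v in t],
         "y": [math.sin(v) for v in t],
         "z": [0.1 * v for v in t]}

# Shape descriptor: the first n_keep coefficients of each coordinate.
n_keep = 10
descriptor = {axis: dct(vals)[:n_keep] for axis, vals in curve.items()}
fitted = {axis: idct(descriptor[axis], n_points) for axis in curve}
exact = {axis: idct(dct(vals), n_points) for axis, vals in curve.items()}

truncated_error = max_abs_error(curve, fitted)
full_error = max_abs_error(curve, exact)
print("error with", n_keep, "coefficients per axis:", truncated_error)
print("error with all coefficients:", full_error)
```

Keeping all coefficients reproduces the curve up to rounding error, while truncation trades accuracy for a compact descriptor of 30 numbers that could then be passed to a multivariate analysis.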
- Conference Article
- 10.1109/cicn.2013.89
- Sep 1, 2013
This paper on 'COLOR IMAGE STEGANOGRAPHY IN DCT DOMAIN' presents a new approach to steganography in color images in the frequency domain, or more precisely in the Discrete Cosine Transform (DCT) domain. The authors propose a method of sampling based on the image, the secret data, and a steg password, along with encryption and embedding in the frequency domain with a variable bit retrieval function. The secret data becomes more secure by being hidden with the steg password, so that the secret data cannot be recovered without knowing the steg password. The strong coupling among the color image, the secret data, and the steg password, varied with a pixel-dependent embedding in the DCT domain, yields a highly protected and reliable substitution. A group of 8×8 quantized DCT coefficients (QDC) is selected as the secret data carrier for the color image. The variable bit operation is applied to the proper QDCs to embed a byte of secret data, where the variable bit operation depends on the pixel value. A detailed statistical analysis is provided to emphasize the strong security of the algorithm against various steganalysis methods. This also helps to reveal its large carrier capacity and stego image quality.
- Research Article
- 10.1121/1.3508344
- Oct 1, 2010
- The Journal of the Acoustical Society of America
This study examines the acquisition of the American English vowel system by children who grew up in one of the three distinct dialect areas in the United States: western North Carolina, central Ohio, and southeastern Wisconsin. Of interest is the extent to which children acquire dialect‐specific vowel dispersion patterns and dynamic formant movements (vowel‐inherent spectral change). Ninety‐four children (8–12 years) and 93 adults (51–65 years), males and females, produced 13 vowel categories in isolated hVd words. The results for children clearly show the presence of dialect‐specific features found in adult speakers. The regional positional relations among vowels are generally maintained in children. For some vowels, the correspondence in the amount of spectral change and formant trajectory shape between children and adults is remarkable. Two major changes in children are common across dialects: reduction of formant movement in selected monophthongs and more uniform production of diphthongs in which dialectal differences present in adults tend to disappear. These results show that children are able to adopt and reproduce dialect‐specific acoustic vowel characteristics quite well despite their exposure to highly variable input, which creates opportunities for dialect convergence and standardization through interaction with mass media and telecommunication. [Work supported by NIH.]
- Research Article
- 10.1109/tcsvt.2016.2595320
- Jan 1, 2016
- IEEE Transactions on Circuits and Systems for Video Technology
This paper proposes a flexible and efficient implementation of the 2D $N$-point discrete cosine transform (DCT) for the High Efficiency Video Coding (HEVC) standard. The DCT is implemented through the Walsh–Hadamard transform (WHT) followed by Givens rotations. This scheme is exploited to derive an adaptive algorithm, which allows computing of four different approximations ranging from the complete DCT to the WHT, by selectively skipping some rotations. This paper shows the statistical analysis of the DCT usage and derives a precomputation mechanism to adaptively skip rotations. Each approximation, referred to as an operating mode, is characterized by a large saving of operations, at the expense of very small quality loss. Then, two 2D-DCT architectures are proposed: the first one is totally unfolded, while the second one is folded. The two designs are finally synthesized with a 90-nm standard-cell library for a clock frequency of 250 MHz. Both architectures support real-time processing of 8K UHD video sequences at 64 and 26 fps, respectively, and show higher throughput and lower gate count compared with the state-of-the-art implementations. Moreover, power saving ranging from 28% to 56% can be achieved by working within the proposed operating modes.
- Book Chapter
- 10.1007/978-3-642-05036-7_14
- Jan 1, 2009
In recent times, ultrasound imaging has become a popular modality for various medical applications. The presence of speckle noise causes difficulties in feature extraction and quantitative measurement of ultrasound images. This paper proposes a new method to suppress the speckle noise while attempting to preserve the image content, using a combination of a Gaussian filter and a discrete cosine transform (DCT) approach. The proposed method, called the quasi-Gaussian DCT (QGDCT) filter, is a quasi-Gaussian filter whose coefficients are derived from a selected 2-dimensional cosine basis function. The Gaussian approach is used to suppress speckle noise, whereas the selected DCT approach is intended to preserve the image content. The filter is implemented on synthetic speckle images and clinical echocardiograph ultrasound images. To evaluate the effectiveness of the filter, several quantitative measurements, such as mean square error, peak signal-to-noise ratio, speckle suppression index, and speckle statistical analysis, are computed and analyzed. In comparison with established filters, the results obtained confirm the effectiveness of the QGDCT filter in suppressing speckle noise and preserving the image content.