A Methodology to Convert a Digital Recorder into a Post-Processed Sound Level Metre and Spectrum Analyser
Abstract: Acoustic studies are usually carried out using sound level metres and acoustic frequency analysers, which are classified and standardised according to international standards. However, some acoustic assessments cannot be carried out with conventional sound level metres and spectrum analysers, because they must cover large surfaces and many monitoring points, require numerous devices to be used simultaneously, incur high costs when instruments are left outdoors, or exceed the instruments' short autonomy. In recent years, many authors have begun to use small digital recorders and even mobile phones for acoustic analysis. These devices are small, cheap, versatile, and highly autonomous, which makes them appropriate for acoustic studies where sound level metres are not the best option. The methodology presented in this paper proposes the use of small digital recorders and WAV file post-processing software to replace sound level metres and spectrum analysers, obtaining results with a precision comparable to that of a Class 1 sound level metre and acoustic analyser.
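The abstract does not spell out the post-processing steps, so the following is only a minimal sketch of one way a recorded signal could be turned into a sound level. It assumes the samples are already normalised to the range [-1, 1], that a calibrator tone of known level (94 dB here, a hypothetical value) was recorded with the same recorder settings, and that no frequency weighting is applied. The function names `calibration_from_tone` and `leq_db` are illustrative and do not come from the paper.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)


def calibration_from_tone(tone_samples, tone_level_db=94.0):
    """Return pascals per digital unit, derived from a recorded calibrator tone.

    Assumption: the tone was recorded with the same gain as the measurement.
    """
    rms_digital = math.sqrt(sum(s * s for s in tone_samples) / len(tone_samples))
    rms_pascal = P_REF * 10 ** (tone_level_db / 20.0)
    return rms_pascal / rms_digital


def leq_db(samples, pascals_per_unit):
    """Unweighted equivalent continuous level (dB re 20 µPa) of a sample block."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum((s * pascals_per_unit) ** 2 for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / P_REF ** 2)


# Hypothetical usage: a 1 kHz calibrator tone sampled at 48 kHz.
tone = [0.5 * math.sin(2 * math.pi * 1000 * i / 48000) for i in range(4800)]
k = calibration_from_tone(tone)
level_of_half_amplitude = leq_db([s / 2 for s in tone], k)
```

Halving the amplitude of the calibrated tone lowers the computed level by about 6 dB, which is a quick sanity check on the calibration factor.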
- Research Article
7
- 10.5664/jcsm.9292
- Apr 12, 2021
- Journal of Clinical Sleep Medicine
The aim of the study was to inspect the acoustic properties and sleep characteristics of a preapneic snoring sound. The feasibility of forecasting upcoming respiratory events by snoring sound was also investigated. Participants with habitual snoring or a heavy breathing sound during sleep were recruited consecutively. Polysomnography was conducted, and snoring-related breathing sound was recorded simultaneously. Acoustic features and sleep features were extracted from 30-second samples, and a machine learning algorithm was used to establish 2 prediction models. A total of 74 eligible participants were included. Model 1, tested by 5-fold cross-validation, achieved an accuracy of 0.92 and an area under the curve of 0.94 for respiratory event prediction. Model 2, with acoustic features and sleep information tested by leave-one-out cross-validation, had an accuracy of 0.78 and an area under the curve of 0.80. Sleep position was found to be the most important among all sleep features contributing to the performance of the 2 models. Preapneic sound presented unique acoustic characteristics, and snoring-related breathing sound could be deployed as a real-time apneic event predictor. The models, combined with sleep information, serve as a promising tool for an early warning system to forecast apneic events. Citation: Wang B, Yi X, Gao J, et al. Real-time prediction of upcoming respiratory events via machine learning using snoring sound signal. J Clin Sleep Med. 2021;17(9):1777-1784.
- Research Article
89
- 10.1097/moo.0b013e32834575fe
- Jun 1, 2011
- Current Opinion in Otolaryngology & Head & Neck Surgery
This paper reviews recent evidence regarding the validity and reliability of acoustic voice analysis in routine clinical assessments. The current role of jitter and shimmer, the most-used indices, and how their clinical application might be improved are evaluated. Even though the evidence is limited, acoustic analysis is widely used to assist differential diagnosis, documentation and evaluation of treatment for clinical voice disorders. Recent clinical data have not shown that jitter and shimmer are absolute or independent indices of voice pathology or perceptual hoarseness. However, in pretreatment and posttreatment comparisons within patients, acoustic analysis might have value as an outcome measure. Yet the true value of clinical acoustic analysis might be masked by confounding effects due to assessment system, gender, vowel and especially speaking voice intensity. The validity of acoustic assessments in clinical applications remains unproven. Measurement reliability is still limited and might be greatly improved with relatively simple changes and consensus in measurement protocols and techniques. For instance, clinical assessment procedures and current normative values would have to be revised considering gender and vowel. Thus, future research might establish the validity and potential of clinical acoustic assessments.
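The review does not state a formula for jitter or shimmer, and several variants exist. As an illustration only, the sketch below uses one common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value, in percent. It is applied to cycle durations for jitter and to per-cycle peak amplitudes for shimmer. The function name and the sample values are hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values, relative to the
    mean value, expressed in percent (one common 'local' definition)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value


# Hypothetical measurements from four consecutive glottal cycles.
periods_ms = [10.0, 10.2, 9.9, 10.1]         # cycle durations
peak_amplitudes = [0.80, 0.78, 0.82, 0.80]   # per-cycle peak amplitudes

jitter_percent = local_perturbation_percent(periods_ms)
shimmer_percent = local_perturbation_percent(peak_amplitudes)
```

Because both measures depend on how cycles are detected, results from different analysis systems are not directly comparable, which is consistent with the confounding effects the review describes.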
- Research Article
4
- 10.1049/ip-j:19860007
- Feb 1, 1986
The paper reports on a theoretical study of the two-Bragg-cell interferometric spectrum analyser system. This architecture offers an improvement in radio frequency (RF) dynamic range compared to that of the conventional power spectrum analyser. Bulk-wave shear mode lithium niobate Bragg cells are assumed for the study together with shot noise limited detection by avalanche photodiodes. A theoretical model is presented which enables the temporal history of the intermediate frequency (IF) output on each channel of the detector array to be predicted. A time-domain model is employed in which the instantaneous Fourier transforms of the (Gaussian weighted) signal and reference waveforms are evaluated at intervals much less than the IF period. These are then coherently combined to simulate the heterodyne detection. A computer program based on this theory provides a realistic simulation of pulse responses, ringing and delay effects in the IF filter, image and sidelobe levels, and IF breakthrough due to the reference waveform. Results are presented for chirp, pseudonoise and Gaussian noise reference waveforms. It is concluded that for the detection of RF pulses with durations as short as 100 ns, an instantaneous dynamic range of 50–55 dB relative to rms noise should be achievable, for simultaneous signals.
- Research Article
- 10.1111/anec.70105
- Aug 15, 2025
- Annals of Noninvasive Electrocardiology
Introduction: Electrical cardioversion (ECV) remains a treatment option for atrial fibrillation (AF). The study aimed to find predictors of sinus rhythm (SR) maintenance after ECV using spectral and vectorcardiographic (VCG) analysis of ECGs.
Methods: Consecutive patients with AF referred for elective ECV were prospectively enrolled. A digital ECG recording was obtained before the ECV and was analyzed using spectral and VCG analysis. AF activity was analyzed using spectral analysis to determine the dominant frequency (DF), regularity index (RI), and organizational index (OI). QRS complexes were analyzed using vectorcardiography to determine the dXmean, dYmean, and dZmean (derivation of VCG signals). We used Lasso Logistic Regression (LLR) in five‐fold cross‐validation for feature selection and to build combined predictive models of SR maintenance. For model training and evaluation, data were split in a 60%–40% ratio for training and testing, respectively.
Results: A total of 80 patients were enrolled (age 70.2 ± 10.6 years, 49 (61%) were men, BMI 29.7 kg/m²). At the 3‐month follow‐up, AF recurrence was present in 36 patients (45%). The best single VCG parameter to predict SR maintenance was dZmean (OR 0.18, 95% CI 0.06–0.51, p < 0.001). VCG‐domain parameters combined into the LLR model showed an area under the curve (AUC) of 0.78. From the spectral analysis domain, the best predictor was DF (OR 3.54, 95% CI 1.28–10.25, p = 0.006); spectral features led to an AUC of 0.76 when combined in the LLR model. Clinical features did not form a model since no features passed feature selection. Combining VCG and spectral analysis features led to an LLR model with an AUC of 0.79.
Conclusion: The combination of spectral analysis of AF activity and VCG analysis of ventricular activity provided more accurate predictive information than either analysis alone.
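The abstract does not say how the dominant frequency was computed. A simple and widely used approach is to take the frequency of the largest spectral peak within a search band, and the sketch below illustrates that idea with a direct discrete Fourier transform. The 3–12 Hz band, the sampling rate, and the synthetic test signal are assumptions made for illustration, not values from the study.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate, f_min, f_max):
    """Return the frequency (Hz) of the largest DFT power peak in [f_min, f_max].

    Returns None if no DFT bin falls inside the band.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if freq < f_min or freq > f_max:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


# Hypothetical 2-second signal sampled at 100 Hz: a strong 6 Hz component
# plus a weaker 9 Hz component.
fs = 100
sig = [math.sin(2 * math.pi * 6 * i / fs) + 0.3 * math.sin(2 * math.pi * 9 * i / fs)
       for i in range(2 * fs)]
df = dominant_frequency(sig, fs, f_min=3.0, f_max=12.0)
```

The frequency resolution is the sampling rate divided by the number of samples (0.5 Hz here), so longer recordings give a finer estimate.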
- Research Article
9
- 10.1007/s12094-008-0175-z
- Mar 1, 2008
- Clinical and Translational Oncology
Radiotherapy for early vocal cord carcinoma affects quality of voice. Nevertheless, most patients report a high level of satisfaction with their voice. The few acoustic studies on quality of voice have been performed only in prolonged vowel production, which is not a usual speech situation. The present study was done with the aim of establishing which phonetic situations reflect a greater alteration in voice production related to irradiation. Eighteen male patients irradiated for Tis-T1 vocal cord carcinoma and a control group of 31 non-irradiated subjects were included in a study of acoustic voice analysis. This analysis was performed one year after radiotherapy. Patient and control group voices were tape recorded in extended vowel production, oral reading of a standard paragraph, spontaneous speech and in a song. Acoustic analysis was performed with a Kay Elemetrics Computerized Speech Lab (model CSL #4300). Fundamental frequency, jitter, shimmer and harmonics-to-noise ratio were obtained in both groups. The statistical tests were the Lin concordance coefficient, Pearson's correlation coefficient, Student's t-test and ROC curves. Concordance and correlation studies did not allow selection of any subgroup in acoustic parameters and different acoustic situations. Acoustic parameters had higher median values in irradiated patients. Student's t-test showed significant differences for fundamental frequency in sustained vowel production and spontaneous speech; for jitter there was statistical significance in all the acoustic situations and for shimmer in oral reading and song. Jitter showed a cut-off of 2.02% with a sensitivity of 89% and specificity of 97% in classifying irradiated and non-irradiated groups. The ROC curve for jitter correctly classified 94% of subjects into irradiated or non-irradiated groups.
The present study showed that jitter obtained from spontaneous speech was the most relevant parameter in discriminating voice in irradiated patients by acoustic analysis. Jitter in spontaneous speech is in need of more analysis in bigger series and in more advanced stages of larynx cancer as its relevance has been demonstrated.
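To make the reported cut-off concrete, the sketch below shows how sensitivity and specificity follow from a threshold on a single parameter. Only the 2.02% cut-off comes from the abstract. The jitter values are invented, and the rule that a value at or above the cut-off counts as "irradiated" is an assumption based on the statement that irradiated patients had higher values.

```python
def sensitivity_specificity(positive_values, negative_values, cutoff):
    """Classify a value as positive when it is >= cutoff (assumed rule).

    Returns (sensitivity, specificity) as fractions between 0 and 1.
    """
    true_positives = sum(1 for v in positive_values if v >= cutoff)
    true_negatives = sum(1 for v in negative_values if v < cutoff)
    return (true_positives / len(positive_values),
            true_negatives / len(negative_values))


# Hypothetical jitter values (%), not data from the study.
irradiated = [2.5, 3.1, 1.8, 2.2]
controls = [0.9, 1.5, 2.1, 1.2, 0.7]

sens, spec = sensitivity_specificity(irradiated, controls, cutoff=2.02)
```

Sweeping the cut-off over all observed values and plotting sensitivity against one minus specificity yields the ROC curve mentioned in the abstract.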
- Research Article
1
- 10.1044/ssod21.2.1-ce
- Oct 1, 2011
- Perspectives on Speech Science and Orofacial Disorders
CE Questions. SIG 5 Perspectives Vol. 21, No. 2, October 2011. Earn 0.1 CEUs on This Issue. James M. Hillenbrand.

Hillenbrand: Acoustic Analysis of Voice: A Tutorial

1. A challenge associated with use of the airborne acoustic signal to study vocal function is that
   - it is quite distinct from the signal that enters the ear.
   - it requires the use of very expensive, sophisticated laboratory equipment.
   - it strongly emphasizes the lower frequency components of the voice signal.
   - it not only reflects laryngeal function, but also resonant characteristics of the vocal tract.
2. A disadvantage of performing acoustic analysis on monotone sustained vowels
   - is that analysis results do not necessarily generalize to connected speech.
   - is that it introduces problems associated with individual variability in speaking style.
   - stems from a lack of acoustic analysis approaches that may be used with this vocal task.
   - is that sustained vowels are technically difficult to record.
3. Jitter or pitch perturbation
   - is defined as an estimate of the cycle-to-cycle fluctuation in the amplitudes of adjacent pitch pulses.
   - estimates the relative energy located in the high frequency regions of the sound spectrum.
   - is one of many measures that reflect signal periodicity.
   - has a single, well-defined method of calculation uniformly implemented within the professional discipline.
4. A measure that is strongly correlated with listener ratings of breathiness is
   - Cepstral Peak Prominence (CPP).
   - fundamental frequency.
   - jitter.
   - vocal intensity.
5. A challenge associated with using acoustic measures to infer underlying laryngeal pathology is that
   - we have a very poor understanding of laryngeal pathology.
   - audio recording equipment does not currently have the fidelity to accurately record pathological voice qualities.
   - there are currently no acoustic measures that correlate with perceptual ratings of voice.
   - different laryngeal pathologies can exhibit similar acoustic features such as turbulent noise.

Callahan Mandulak: “I Can See What You're Saying”: Clinical Utility of Spectral Moment Analysis

6. Spectral moment analysis can distinguish between [s] and [f] due to
   - the degree of categorical distinction between sibilant and nonsibilant fricatives.
   - the distinctive spectral shapes of sibilant fricatives versus nonsibilant fricatives.
   - the difference in appearance of sibilant and nonsibilant fricatives on a spectrogram.
   - the relationship between auditory-perceptual analysis of sibilant versus nonsibilant fricatives.
7. The consonant [s] has higher frequency noise energy compared to [ʃ] because
   - periodic sound waves contain higher frequency noise compared to aperiodic noise.
   - articulation of [s] requires a sublingual space to be created for resonance of aperiodic noise.
   - [ʃ] is produced posterior to [s], reflecting distinct articulatory configurations and therefore differing acoustic output.
   - increased airflow for [s] production increases the frequency of noise contained in the fricative.
8. A time-history plot displaying fricative production allows for
   - inspection of qualitative details regarding individual speaker's speech sound patterns, including duration or degree of distinction.
   - comparison of group data for the purposes of determining statistically significant differences.
   - documentation of speech outcomes from a quantitative standpoint.
   - objective computation of fricative duration.
9. It is important to inspect data such as range of performance for an individual speaker's speech production skills in addition to group data because
   - individual speakers often behave differently than the group average.
   - it is not important to investigate the range of normal performance.
   - adult speakers all produce the fricatives [s] and [ʃ] with similar dynamic patterns.
   - individual speaker data is not as important as group data.
10. Which of the following statements is true about spectral moment analysis and children with repaired cleft palate?
   - Spectral moment analysis would not be useful to measure pre- and post-intervention progress on specific obstruent consonant targets.
   - Children with repaired cleft palate have velopharyngeal dysfunction and hypernasal resonance, which can be objectively measured with spectral moment analysis.
   - The turbulence produced by fricative consonants is not typically distorted in children with repaired cleft palate, and therefore spectral moment analysis would not be adequately utilized for speech assessment.
   - Spectral moment analysis provides an objective measure of articulation skill.

Volume 21, Issue 2, October 2011, Pages C1-C2. Published in issue: Oct 1, 2011. © 2011 American Speech-Language-Hearing Association.
- Research Article
8
- 10.21037/jtd-23-115
- Feb 1, 2023
- Journal of thoracic disease
Unlike conventional spectral analysis of spectral computed tomography (CT), which cannot fully represent the whole lesion, volumetric quantitative analysis captures information from the whole lesion and is more accurate. This study therefore sought to evaluate the value of volumetric quantitative analysis in the differential diagnosis of pulmonary adenocarcinoma (ADC) and squamous cell carcinoma (SQCC). Fifty-seven patients with lung cancer confirmed by pathology, including 35 ADC and 22 SQCC patients, were retrospectively analyzed. Calcium concentration and effective-Z (Eff-Z) in plain scan (PS), iodine concentration, and water concentration in the arterial phase (AP) were measured. The Student t-test or rank-sum test was used to determine the statistically significant parameters. Receiver operating characteristic (ROC) curves were used, and the corresponding area under the curve (AUC), sensitivity and specificity were calculated to evaluate diagnostic efficacy in the differential diagnosis of ADC and SQCC. In the volumetric quantitative analysis of spectral CT, the concentration of calcium [(6.97±2.83) mg/cm³], Eff-Z (7.90±0.14), and iodine [1.42 (0.84) mg/cm³] was significantly higher in ADC than SQCC [(5.14±2.39) mg/cm³, (7.80±0.10), 1.16 (0.65) mg/cm³, t=2.513, 2.860, Z=-2.246, P=0.015, 0.006, 0.025], but the concentration of water was significantly lower in ADC [995.00 (38.70) mg/cm³] than SQCC [1,007.00 (14.38) mg/cm³, Z=-2.082, P=0.037]. Moreover, for both ADC and SQCC, the concentrations of calcium [(8.51±4.28) mg/cm³, (5.96±2.50) mg/cm³], Eff-Z (7.97±0.20, 7.86±0.13), and water [1,007.00 (14.38) mg/cm³, 1,029.28 (10.49) mg/cm³] were lower in the volumetric spectral analysis than the conventional spectral analysis, while the concentration of iodine [1.33 (0.80) mg/cm³, 0.94 (0.63) mg/cm³] was significantly higher in the volumetric spectral analysis than the conventional spectral analysis.
The ROC curve analysis showed that the AUCs (0.76, 0.76, 0.75, 0.71), sensitivities (66.7%, 66.7%, 66.7%, 85.2%), and specificities (92.3%, 84.6%, 86.9%, 69.2%) of the volumetric spectral analysis parameters for the differential diagnosis of ADC and SQCC were higher than those of the conventional spectral analysis parameters [(0.65, 0.66, 0.73, 0.63), (44.4%, 48.1%, 59.3%, 66.7%), (69.2%, 69.2%, 84.6%, 53.8%)]. Volumetric quantitative analysis has a promising advantage in covering whole lesions; it may be invaluable in the differential diagnosis of ADC and SQCC and is worthy of clinical recommendation.
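The abstract reports AUC values without describing how they were computed. One standard way to obtain an AUC is the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case has the higher value, counting ties as one half. The sketch below illustrates that formulation on invented numbers. The function name and the data are hypothetical, and treating ADC as the "positive" class is an arbitrary choice for the example.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Rank-based AUC: probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical per-lesion values of one parameter, not data from the study.
adc_values = [3.0, 4.0, 5.0]
sqcc_values = [1.0, 2.0, 3.5]

auc = auc_from_scores(adc_values, sqcc_values)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for reading the values reported above.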
- Supplementary Content
1
- 10.1002/wjo2.70015
- Apr 8, 2025
- World Journal of Otorhinolaryngology - Head and Neck Surgery
Objective: To discuss the current clinical application and usefulness, shortcomings and future directions of traditional and artificial intelligence (AI)‐driven acoustic assessment techniques to detect voice dysfunction.
Data Sources: Literature review.
Conclusion: AI‐based acoustic voice analysis techniques have huge potential to improve the early recognition, diagnosis, and tracking of treatment success in patients with voice disorders or diseases affecting voice function. Through smartphones, wearable devices, and server‐based solutions, acoustic voice assessment techniques have become widely available and may be extended to workplace and private settings. However, the transformative potential is thwarted by several limitations including a lack of (a) consistent data collection and reporting standards, leading to heterogeneity of current databases and literature; (b) characterization of what acoustic analysis techniques including AI can detect or track reliably, and whether the derived outcomes serve as a reliable marker of dysfunction, pathology, or an improvement thereof; (c) clinical validation studies in unselected patients; and (d) resolution of ethical and legal controversies. Thus, substantial effort to research, define and establish guidelines for the collection, storage, and processing of acoustic data and valid clinical applications is warranted to design sensible strategies for analysis and use.
- Research Article
- 10.1111/1460-6984.70239
- Feb 1, 2026
- International journal of language & communication disorders
Myotonic dystrophy type 1 (DM1) is a heterogeneous neuromuscular disorder characterized by progressive muscle weakness and myotonia. Dysarthria is a known symptom of DM1, but literature is lacking about the patient's own perception in relationship to dysarthria characteristics and severity. The aim of the study was to describe the acoustic speech characteristics of dysarthria in patients with DM1, examine the perceptually determined dysarthria severity through speech and language therapy assessment, gather subjective evaluations of speech and intelligibility from patients and relatives and examine the relationship between these outcomes. The speech of 22 adult patients with DM1 (nine females) was acoustically assessed during spontaneous speech, reading, and maximum performance tasks and analysed using the Praat-software. Dysarthria severity was rated on a severity scale from 0 (no dysarthria) - 5 (very severe dysarthria/anarthria). Patients and relatives rated the speech with a short questionnaire and a visual analogue scale (VAS). Acoustic analysis showed a deviant speech rate (SR), articulation rate (AR), maximum phonation volume (MPV), and fundamental frequency range compared to normative values. Perceptually, the dysarthria severity scores varied between 1 (minimal dysarthria) and 4 (severe dysarthria). In more severe dysarthria, SR, AR, and MPV decreased. Patients were sufficiently satisfied about their speech, with no relationship to dysarthria severity. However, the scores of relatives decreased when perceptual dysarthria severity increased. As dysarthria severity increased, speech quality and intelligibility declined, particularly when assessed by speech therapists and relatives. Patients with DM1 generally reported minimal conversational restrictions due to dysarthria. Multidimensional measurements may improve the understanding of speech impairment in DM1. Self-awareness should be a topic in speech therapy interventions. 
What is already known on this subject: Dysarthria is a common symptom in myotonic dystrophy type 1 (DM1). Research has focused on articulatory accuracy and SR. Cognitive decline in patients with DM1 is known to reduce illness insight in approximately half of the cases. However, its impact on self-awareness of dysarthria, and how this compares to perceptions of relatives and speech therapists, had not been investigated prior to this study. The combination of acoustic and perceptual speech measures with patients' own perspectives provides new knowledge of speech monitoring in DM1.
What this study adds to existing knowledge: This is the first study to investigate speech in patients with DM1 using both acoustic and perceptual analyses, while simultaneously examining the relationship with patients' self-awareness of their speech impairments. The results of this study show a relationship between increasing perceptual dysarthria severity and decreasing acoustic speech performance. Patients often rate their own speech more optimistically than relatives and speech therapists, which suggests a possible role of neuropsychological impairment in reduced self-awareness of speech deficits in DM1, warranting further research.
What are the potential or actual clinical implications of this study? The integration of acoustic and perceptual speech assessment allows an objective and accurate diagnosis of dysarthria in DM1. Speech therapy should focus on improving speech clarity and intelligibility by developing exercises for articulatory precision and respiratory control. The discrepancy between how patients perceive their own speech and the opinion of clinicians and relatives suggests that speech therapists may need to focus on raising awareness of the dysarthria and actively involve relatives in the therapeutic process.
- Conference Article
1
- 10.4043/5332-ms
- May 5, 1986
A combination of spectral and probabilistic analysis techniques is proposed to estimate the fatigue damage of jackets subjected to intermittent wave loading. The resulting non-Gaussian response process is considered to be a mixture of Gaussian and shifted exponential. It is shown that the peak density deviates significantly from the Rayleigh, and the Gaussian response assumption is unconservative for fatigue damage evaluation at higher sea states.
INTRODUCTION: Spectral analysis techniques are commonly used to predict the fatigue behavior of offshore jacket-platforms. The cyclic wave loading is assumed to be a Gaussian random process. Assuming the structure is behaving linearly, the structural response can be considered to be Gaussian and spectral analysis will thus fully define the stress processes at the joints. However, recent field observations show that the response is a non-Gaussian process. Nonlinearity in the drag and in the wave kinematics and the phenomenon of free surface fluctuation near the mean sea level in the splash zone are the major reasons for this non-Gaussian behavior. The effect of nonlinearity in the drag loading on the fatigue life of jacket-platforms was studied in the past (1, 2). A method is proposed here to estimate fatigue life considering the nonlinearity in wave kinematics and the intermittent nature of the wave loading. Since conventional spectral analysis with information on the second moment is insufficient to describe the probability distribution characteristics of the response, a procedure must be developed to estimate higher-order moments. Furthermore, when the response is non-Gaussian, the probability density of the stress peaks will not be Rayleigh, and the estimates of cumulative damage are expected to be significantly different. In the proposed method, Stokes' second-order wave theory is used to consider the nonlinearity in the wave loading. Intermittent wave loading is modeled by using a Heaviside step function.
For lower sea states (sea state 1 for the example considered here), the first two moments of the response are calculated using spectral analysis and the third and fourth moments are calculated using probabilistic methods assuming the response in this case is resonance-dominated. For higher sea states (sea states 2 through 7 for the example considered here), the response is considered to be quasi-static. The second moment is calculated using the modified spectral analysis method. The first, third and fourth moments of the response are estimated using the expected value of the first, third and fourth power of the response functions, which are related to the load via flexibility coefficients. Using the third and fourth central moments, the optimal marginal distribution of the response is estimated as a mixture of Gaussian and shifted exponential distributions. The level crossings of such a stress process are estimated by considering it to be a translation process. The double inversion technique is used to map a Gaussian process into the response process. The probability density function of the peaks is calculated numerically. It is shown with an example that the probability density of the stress peaks differs significantly from the commonly assumed Rayleigh, and the fatigue damage estimates are found to be unconservative at higher sea states with traditional spectral analysis.
- Research Article
1
- 10.1515/libri-2023-0109
- Jun 13, 2024
- Libri
The digital divide, in my view, is created not only by the lack of local contents but also by the fact that few digital contents from local sources are found in the digital libraries of higher education in Ethiopia. Likewise, the creation and dissemination of local contents could be facilitated not only by the presence of local digital contents but also by the presence of usable digital technology at a local level. This article aims to give a practical solution to bridging the digital divide by unlocking local heritage knowledge through creating digital contents from locally grown literary heritage as well as by developing a localised digital library system. Therefore, this article presents the research processes and results that were undertaken to unlock local heritage knowledge and develop the localised digital library system: customising free and open-source software; digitising and translating the local literary heritage contents; and building the digitised and translated literary heritage contents into the localised digital libraries, for which the Greenstone digital library software was customised for local use. The English version of the Greenstone user interface (macro files) was translated into the Tigrinya language, one of the locally spoken languages in Ethiopia. For translation purposes, a list of suitable and compatible Tigrinya words and phrases that basically fit the meaning of the English version of the Greenstone spreadsheet was developed. As a result of this translation work, the Tigrinya language interface has become one of the languages that are included in Greenstone digital library software version 2.83 for the first time (Language short name=ti “long name=ትግርኛ (Tigrinya)” default encoding=utf-8). To unlock the heritage knowledge and build the localised digital library with local digital collections, a sample of Ethiopia’s ancient Ge’ez parchment manuscripts was digitised and translated into the Tigrinya and English languages.
To facilitate the retrieval of information and to be easily recognised by internet search engines in the Tigrinya language, a local specific metadata standard for Ethiopia’s ancient Ge’ez parchment manuscripts was developed at three hierarchical levels, at manuscript level, at chapter level and at page level, with each translated page tagged using HTML. To facilitate the link between the translated text and the corresponding digital image, an “item” file was created using a WordPad. As a result, three collections were built into the customised digital library: the digitised image of the Abushakir manuscript as well as the Tigrinya and English translated texts of the same manuscripts. The functionality and usability of the localised digital system was tested by searching keywords and browsing titles from the built collection of the Tigrinya text and the original digital image of the manuscript. The result of this test shows that the localised digital library system is capable of allowing end-users to discover the information they want at the granular level from digital content of the local literary heritage. Therefore, further manuscript collection through digitisation, translating into local language and building the digitised collection into this localised digital library system is necessary for wider access to the local literary heritage digital content and for bridging the digital divide in the long-term.
- Conference Article
46
- 10.1109/icassp.2016.7472924
- Mar 1, 2016
This paper describes the application of state-of-the-art automatic speech recognition (ASR) systems to objective assessment of voice and speech disorders. Acoustical analysis of speech has long been considered a promising approach to non-invasive and objective assessment of people. In the past the types and amount of speech materials used for acoustical assessment were very limited. With the ASR technology, we are able to perform acoustical and linguistic analyses with a large amount of natural speech from impaired speakers. The present study is focused on Cantonese, which is a major Chinese dialect. Two representative disorders of speech production are investigated: dysphonia and aphasia. ASR experiments are carried out with continuous and spontaneous speech utterances from Cantonese-speaking patients. The results confirm the feasibility and potential of using natural speech for acoustical assessment of voice and speech disorders, and reveal the challenging issues in acoustic modeling and language modeling of pathological speech.
- Research Article
7
- 10.5604/01.3001.0013.7850
- Feb 5, 2020
- Otolaryngologia Polska
Introduction: Treatment of glottis cancer, besides ensuring oncological safety, should consider postoperative voice quality. CO2 laser endoscopic cordectomy allows radical removal of the tumor while maintaining respiratory, defensive and phonatory functions.
The aim: The aim of the study is perceptual and acoustic evaluation of voice in patients after endoscopic CO2 III-Va laser cordectomy due to glottis cancer.
Material and method: The study included 30 men after CO2 cordectomy. 13 (43%) patients underwent type III cordectomy, 6 (20%) type IV, and 11 (37%) type Va. Voice quality was assessed 6 months after the surgery. The control group included 30 healthy men of the same age. The GRBAS scale was used in perceptual evaluation of voice. Acoustic analysis was performed using DiagnoScope Specjalista software. Narrowband spectrography and Maximum Phonation Time (MPT) measurement were performed.
Results: In the study group, voice was classified as G1R1B0A0S0 after type III cordectomy, as G1R1B1A1S2 in type IV and as G2R1B1A0S3 in type Va. Acoustic evaluation revealed the highest values of F0, Jitter, Shimmer and NHR after Va cordectomy as well as non-harmonic components in narrowband spectrography and reduction of MPT.
Conclusions: Postoperative voice quality depends on the type of cordectomy. Perceptual assessment indicates that type IV and Va cordectomy cause intensification of voice disorders. Parameters of acoustic evaluation increase with the extent of the procedure. The presence of non-harmonic components in narrowband spectrography increases with the extent of cordectomy, as does the reduction of MPT. Preservation of the anterior commissure contributes to good voice quality in perceptual and acoustic assessment.
- Research Article
13
- 10.1016/j.forsciint.2013.02.020
- Mar 5, 2013
- Forensic Science International
Time and spectral analysis methods with machine learning for the authentication of digital audio recordings
- Research Article
6
- 10.1017/s0022215110000782
- May 5, 2010
- The Journal of Laryngology & Otology
To assess whether different compact disk recording protocols, used to prepare speech test material, affect the reliability and comparability of speech audiometry testing. We conducted acoustic analysis of compact disks used in clinical practice, to determine whether speech material had been recorded using similar procedures. To assess the impact of different recording procedures on speech test outcomes, normal hearing subjects were tested using differently prepared compact disks, and their psychometric curves compared. Acoustic analysis revealed that speech material had been recorded using different protocols. The major difference was the gain between the levels at which the speech material and the calibration signal had been recorded. Although correct calibration of the audiometer was performed for each compact disk before testing, speech recognition thresholds and maximum intelligibility thresholds differed significantly between compact disks (p < 0.05), and were influenced by the gain between the recording level of the speech material and the calibration signal. To ensure the reliability and comparability of speech test outcomes obtained using different compact disks, it is recommended to check for possible differences in the recording gains used to prepare the compact disks, and then to compensate for any differences before testing.