Timbre is a central quality of singing, yet it remains a complex notion that is poorly understood in psychoacoustic studies. Previous studies note that no single acoustic variable or combination of variables consistently predicts timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and with variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not perceived independently. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli with specific measures. A recent study demonstrated that the acoustic predictive salience of voice category and voice weight across pitches contributes to timbre assessments, and concluded that timbre may be related to one or more as-yet-unknown factors. The purpose of this study was to test four different models for assessing timbre: one focused on specific anatomy, one on listener intuition, one utilising auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.

Methods: Four independent panels were conducted with separate cohorts of professional singing teachers. A total of 41 assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli for all panels were taken from live performances of well-known singers and represented all genders, genres, and styles across a large pitch range. All stimuli are available as supplementary materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.

Results: Panels 1 through 4 varied in overall accuracy and agreement.
The intuition-based model showed 45% average accuracy overall (SD ±4%), κ = 0.289 (p < 0.001), compared with 71% average accuracy overall (SD ±3%), κ = 0.368 (p < 0.001), for the anatomically focused panel. The auditory-anchored model showed 75% average accuracy overall (SD ±8%), κ = 0.54 (p < 0.001), compared with 83% average accuracy overall and agreement of κ = 0.63 (p < 0.001) for panel 4. The highest accuracy and reliability were achieved with the deconstructed timbre model; providing anchors improved reliability but gave no further increase in accuracy.

Conclusion: Deconstructing timbre into specific parameters improved auditory-perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skill remains an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but did not further increase accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions, each related to specific physiology and auditory-perceptual subcategories. Further tests with framework-naïve listeners, listeners without musical education, artificial intelligence comparisons, and synthetic stimuli are needed to further test reliability.
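
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' kappa can be computed from a table of rating counts. It is illustrative only and is not the study's analysis code. The function name `fleiss_kappa` and the input layout (one row per stimulus, one column per timbre category, each cell holding the number of assessors who chose that category) are assumptions made for this example.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned stimulus i to
    category j. Every stimulus must be rated by the same number of raters.
    (Hypothetical helper for illustration; layout is an assumption.)
    """
    if not counts:
        raise ValueError("counts must contain at least one stimulus")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per stimulus are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every stimulus needs the same number of ratings")

    # Observed agreement: for each stimulus, the proportion of rater pairs
    # that agree, averaged over all stimuli.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Kappa corrects the observed pairwise agreement for the agreement expected by chance, which is why a panel can show fairly high accuracy and still only moderate kappa when ratings cluster in a few categories.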