Speech audiometric tests are widely used in advanced hearing diagnosis and in rehabilitation. However, there are no standardised speech tests for more than 90% of the world's population, who do not speak English. A major problem in designing a speech audiometric test is that the selection of test materials is subject to multiple criteria, and its complexity rises dramatically as the structure of test items changes from phonemic or monosyllabic forms to disyllabic or polysyllabic forms. A genetic algorithm is presented that can automatically select a set of disyllabic words from a large Mandarin corpus. The selection follows the principal criteria for the items constituting a speech discrimination test: similarity in structure, familiarity to the subjects, and a phonemically balanced composition. The performance of the genetic algorithm was evaluated by computing the distance between a target vector, which specifies the desired distribution of initial and final syllables and tone patterns in everyday disyllabic word usage, and the vector derived from the search results of the algorithm. The use of the genetic algorithm was illustrated by applying it to the selection of test lists from two Mandarin corpora. The results showed that, for a given corpus, at least 12 disyllabic word lists with a distance of less than 20 could be generated within 72 h. The genetic algorithm performed an efficient, robust and low-complexity search of the problem space and can be readily adapted to the selection of test materials in other languages.
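The evaluation idea described above, comparing a target distribution vector with the vector obtained from a candidate word list, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not state the distance metric (Euclidean distance is used here), the function names `feature_vector`, `distance` and `select_list` are hypothetical, and the search is a simplified evolutionary loop (truncation selection plus mutation, without crossover) rather than the genetic algorithm actually used in the study.

```python
import math
import random
from collections import Counter


def feature_vector(words, features, categories):
    """Count how often each category (e.g. an initial, a final, a tone
    pattern) occurs across the selected words."""
    counts = Counter(cat for w in words for cat in features[w])
    return [counts[c] for c in categories]


def distance(target, observed):
    """Euclidean distance between two count vectors (assumed metric)."""
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(target, observed)))


def select_list(corpus, features, categories, target, size,
                pop_size=30, generations=200, seed=0):
    """Search for a word list of length `size` whose category counts are
    close to `target`. Hypothetical helper, not the published method."""
    rng = random.Random(seed)

    def cost(candidate):
        return distance(target, feature_vector(candidate, features, categories))

    # Start from random candidate lists drawn without repetition.
    population = [rng.sample(corpus, size) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged.
        population.sort(key=cost)
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            # Mutation: swap one selected word for an unused corpus word.
            unused = [w for w in corpus if w not in child]
            if unused:
                child[rng.randrange(size)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return min(population, key=cost)
```

In this sketch, `features` maps each word to the categories it contributes (for example its initial and its tone pattern), and `target` holds the desired count for each entry of `categories`. Because the better half of the population is carried over unchanged, the best distance found never increases from one generation to the next.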