Showing posts from August, 2022

How Accurate is Pronunciation Assessment?

Pronunciation assessment APIs usually report pronunciation scores at the phoneme, syllable, word, and sentence levels. But how do you know whether those scores are accurate? You compare the scores predicted for a testset against the gold standard (human labels). The closer the two sets of results are, the more accurate the algorithm.

To make the metrics representative and useful, we need to think carefully about 1) the testset and 2) the performance metrics.

Testset

The testset is usually designed by an AI product manager, who should ensure that it 1) reflects real user scenarios, 2) has good data variety, and 3) covers a wide range of use cases. For example, SpeechSuper's API testsets consist of masked audio recordings from language learners, sampled from our real user base. They usually cover a wide spectrum of phonetic combinations in a specific language. The testsets contain not only data recorded in a quiet environment but also recordings with background noise.

Metrics

I guess