What is Speech Assessment and Why it Matters

by Qiusi, Product Manager of SpeechSuper 

What is Speech Assessment

Speech assessment is the process of giving algorithm-based corrective feedback for speaking activities in language learning. It's also sometimes called pronunciation assessment.

  • Speech Assessment 101

Speech assessment is largely based on the "Goodness of Pronunciation" (GoP) algorithm, which has been developed over the past 20 years. Automatic speech recognition (ASR) aims to decode the most likely transcription by combining an acoustic model and a language model, both derived from data in a specified language. GoP, in contrast, takes the language model as pre-defined and only cares about the acoustic model. It therefore produces an acoustic likelihood score measuring “how similar a piece of speech sounds to that of native speakers of a particular language”, which, in short, is pronunciation.

Simply put, for speech assessment, the input is speech audio + reference text, and the output is a score of pronunciation.  
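To make the idea concrete, here is a minimal sketch of a GoP-style score for a single phone. It follows a simplified form of the classical formulation (the log-likelihood of the expected phone compared with the best competing phone, normalized by the number of frames) and is not SpeechSuper's implementation. The function name `gop_score`, the input layout, and the log-likelihood values are all assumptions made up for illustration; in a real system those values would come from an acoustic model after aligning the audio to the reference text.

```python
def gop_score(frame_log_likes, target_phone):
    """Simplified GoP-style score for one aligned phone segment.

    frame_log_likes: one dict per audio frame, mapping each candidate
        phone to its acoustic log-likelihood (hypothetical values here).
    target_phone: the phone the reference text expects in this segment.

    Returns a value <= 0. A score of 0 means the expected phone was the
    best acoustic match; more negative means a worse pronunciation.
    """
    if not frame_log_likes:
        raise ValueError("segment has no frames")

    # Sum the per-frame log-likelihoods for every candidate phone.
    phones = frame_log_likes[0].keys()
    totals = {p: sum(frame[p] for frame in frame_log_likes) for p in phones}

    # Compare the expected phone with the best-scoring phone,
    # then normalize by the segment length in frames.
    best = max(totals.values())
    return (totals[target_phone] - best) / len(frame_log_likes)


# Two frames of made-up log-likelihoods for the phones "ae" and "eh".
frames = [
    {"ae": -1.0, "eh": -2.0},
    {"ae": -1.5, "eh": -1.0},
]
print(gop_score(frames, "ae"))  # 0.0   ("ae" is the best match)
print(gop_score(frames, "eh"))  # -0.25 ("eh" fits the audio less well)
```

In practice, a raw score like this is usually mapped to a learner-friendly scale before it is shown as feedback.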

  • Evolution of Speech Assessment

Over the past decades, the GoP algorithm has kept evolving and gaining features. Meanwhile, acoustic models have improved tremendously thanks to deep learning techniques. Speech assessment now works at various granularities, from phone and syllable to sentence and paragraph. The feedback covers many segmental and suprasegmental aspects of speech production, for instance, phoneme mispronunciation, fluency, pause, rhythm, tone, linking, etc. Given massive training data, we can also assess broader language aspects like vocabulary, grammatical errors, cohesion, coherence, relevance, etc. Almost every linguistic aspect you can imagine can be analyzed from continuous speech.

Why it Matters

  • Language learning is ineffective without speaking activities.

Sufficient speaking activities are crucial in language learning. 

As an introvert who has been learning English since the age of 5, I sincerely wish that today's various speaking activities in language learning apps had been around in my childhood, because I felt awkward speaking and practicing English in front of people. Without enough speaking practice during my critical period, speaking is the weakest of my four foundational language skills.

  • Language learners learn from feedback.

Speech assessment offers a wide range of feedback that language learners can learn from.

Learners may be negatively influenced by their native language when learning a new one, which is where feedback becomes highly useful. I'm a native speaker of Mandarin Chinese, a syllable-timed language. English, my L2, is a stress-timed one. Hence, I had a hard time placing stress within words and between words in reading-aloud activities. With proper stress feedback, I gradually corrected myself and found a proper rhythm. Stress detection is a feature that SpeechSuper APIs offer. Try our demo here

  • Speech assessment improves conversion and retention.

Successful language learning services, like Rosetta Stone, Elsa, and Duolingo, leverage speech assessment to engage users and stand out from competitors.

According to Elsa, with speaking activities and feedback, the conversion rate from free users to paid users becomes 3 times what it was before. In addition, user engagement rises to 8 times its previous level, with an average user session time of 23 minutes.


Speaking activities are crucial, and speech assessment holds its value amid the global trend toward online learning.

We think that the appropriate and wise use of speech assessment is still in its infancy. We are endeavoring to make more advances in speech assessment and look forward to applying that progress to real problems in language learning.

At SpeechSuper, we develop AI-based speech technologies to analyze speech from language learners, including pronunciation, fluency, completeness, and more. If you’re interested, please contact us through the website.


Qiusi is a product manager in China’s EdTech industry focusing on language learning and AI. She enjoys writing stories. You can reach her at qiusi.dong@speechsuper.com

SpeechSuper provides cutting-edge AI speech assessment (a.k.a. pronunciation assessment or pronunciation scoring) APIs for language learning products. Comprehensive feedback covers pronunciation score, fluency, completeness, rhythm, stress, liaison, etc. Languages supported include English, Mandarin Chinese, French, German, Korean, Japanese, Russian, Spanish, and more.

*Prior written consent is needed for any form of republication, modification, repost, or distribution of the contents.

