Posts

SpeechSuper English Speech to Text API Supports Inverse Text Normalization

SpeechSuper has released a new feature for its speech-to-text (speech recognition) API: inverse text normalization. What is inverse text normalization? It converts recognized words into numerical or scientific expressions for better readability and understanding. For example, the recognized words 'seventeen dollars' can be converted to '$17'. We now support inverse text normalization in the following domains.

1. Cardinal number and currency
SpeechSuper's English speech-to-text API supports number and currency conversion. For example, if the recognized words are 'It costs three hundred and one dollars.', they will be converted to 'It costs $301.'

2. Date
SpeechSuper's English speech-to-text API supports date conversion. For example, if the recognized words are 'I was born on November first nineteen ninety-seven.', they will be converted to 'I was born on November 1, 1997.'

3. Decimal number
SpeechSuper's English s
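As an illustration of the cardinal number and currency case described above, here is a minimal rule-based sketch in Python. It is not SpeechSuper's implementation: the function names `words_to_int` and `inverse_normalize`, the small vocabulary, and the dollar-only currency rule are all assumptions made for this example.

```python
import re

# Hypothetical vocabulary for this sketch; a real system would cover far more.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"hundred", "thousand"}


def is_number_word(token):
    """True if every hyphen-separated part of the token is a number word."""
    parts = token.lower().split("-")
    return all(p in UNITS or p in TENS or p in SCALES for p in parts)


def words_to_int(tokens):
    """Convert cardinal number words (below one million) to an integer."""
    total, current = 0, 0
    for token in tokens:
        for part in token.lower().split("-"):
            if part == "and":
                continue
            if part in UNITS:
                current += UNITS[part]
            elif part in TENS:
                current += TENS[part]
            elif part == "hundred":
                current = max(current, 1) * 100
            elif part == "thousand":
                total += max(current, 1) * 1000
                current = 0
    return total + current


def inverse_normalize(text):
    """Replace spoken cardinal numbers (optionally followed by 'dollars')."""
    tokens = re.findall(r"[\w'-]+|[^\w\s]", text)
    out, i = [], 0
    while i < len(tokens):
        if not is_number_word(tokens[i]):
            out.append(tokens[i])
            i += 1
            continue
        j = i
        # Extend the span over number words, allowing "and" between them.
        while j < len(tokens) and (
            is_number_word(tokens[j])
            or (tokens[j].lower() == "and"
                and j + 1 < len(tokens) and is_number_word(tokens[j + 1]))
        ):
            j += 1
        value = words_to_int(tokens[i:j])
        if j < len(tokens) and tokens[j].lower() in ("dollar", "dollars"):
            out.append(f"${value}")
            j += 1
        else:
            out.append(str(value))
        i = j
    # Re-attach punctuation that was split off during tokenization.
    return re.sub(r"\s+([.,!?])", r"\1", " ".join(out))


print(inverse_normalize("It costs three hundred and one dollars."))
```

A sketch like this converts every number word it sees, so a phrase such as 'one of them' would also be rewritten; production systems need context-aware rules or models to avoid that.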

How Accurate is Pronunciation Assessment?

Pronunciation assessment APIs usually offer pronunciation scores at the phoneme, syllable, word, and sentence levels. Yet how do you know whether the scores are accurate? By comparing the predicted pronunciation scores on a test set with the gold standard (human labels). The closer the two results, the more accurate the algorithm. To make the metrics representative and useful, we need to think carefully about 1) the test set and 2) the performance metrics.

Test set
The test set is usually designed by an AI product manager, who should ensure that it 1) reflects real user scenarios, 2) has good data variety, and 3) covers a wide range of use cases. For example, SpeechSuper's API test sets consist of masked audio from language learners sampled from our real user base. They usually cover a wide spectrum of phonetic combinations in a specific language. The test sets contain data recorded not only in quiet environments but also with background noise.

Metrics
I guess
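To make the idea of comparing predicted scores with human labels concrete, here is a small Python sketch. The excerpt is cut off before it names its metrics, so the two measures used here, Pearson correlation and mean absolute error, are assumptions chosen as common agreement measures, and the scores are made-up illustration values.

```python
from math import sqrt


def pearson(predicted, human):
    """Pearson correlation between two equally long lists of scores."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    return cov / sqrt(var_p * var_h)


def mean_absolute_error(predicted, human):
    """Average absolute gap between predicted and human scores."""
    return sum(abs(p - h) for p, h in zip(predicted, human)) / len(predicted)


# Made-up word-level scores on a 0-100 scale, for illustration only.
api_scores = [85, 60, 92, 40, 73]
human_scores = [80, 65, 95, 35, 70]

print(f"correlation: {pearson(api_scores, human_scores):.3f}")
print(f"mean absolute error: {mean_absolute_error(api_scores, human_scores):.1f}")
```

A higher correlation and a lower mean absolute error both indicate that the predicted scores sit closer to the human labels. Note that `pearson` is undefined when either list has no variation.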

DON’T Use Speech Recognition in Language Learning Apps

After researching ~100 language learning apps in the Southeast Asian and American app markets, I found that only 27% allow users to practice speaking, and most of those use speech recognition to give speaking feedback.

Using speech recognition is ineffective in language learning for two reasons.

1. Good pronunciations are all alike; every mispronunciation is faulty in its own way.
One of the barriers to language learning is the mother tongue, especially for learners aged 12 and above. Deeply influenced by the sound system of their mother tongue, language learners may confuse its sounds with those of the new language. There are over 7,000 languages globally, so any single target language faces a broad spectrum of mispronunciations from its learners. It is nice when a recognition system can recognize speech, but what matters most is how to deal with mispronunciations, which cause the recognition system to fail. It cannot shed light on where and how to improve, but those

SpeechSuper API Now Supports Mandarin Chinese Mispronunciation Detection

SpeechSuper has long supported scoring the pronunciation of Mandarin Chinese characters. However, scores alone might be insufficient to give users concrete guidance.

We're excited to announce that SpeechSuper recently launched a new feature: Mandarin Chinese mispronunciation detection. It spots users' mispronunciations of Chinese characters and reports which sound was substituted for which, making feedback more specific. Here are two examples.

Example 1: A user was expected to read aloud "niú", but she said "liú", confusing the initials "n" and "l". The SpeechSuper API detected that she had mispronounced 'n' as 'l', with a confidence score of 100.

Example 2: A user was expected to read aloud "shēng", but she said "shēn", confusing the finals "-eng" and "-en". SpeechSuper API found the error she misprono

What is Speech Assessment and Why it Matters

by Qiusi, Product Manager of SpeechSuper

What is Speech Assessment
Speech assessment is the process of giving algorithm-based corrective feedback on speaking activities in language learning. It is also sometimes called pronunciation assessment.

Speech Assessment 101
Speech assessment is largely based on the "Goodness of Pronunciation" (GoP) algorithm developed over 20 years. Automatic speech recognition (ASR) aims to decode the best hypothesis from an acoustic model and a language model combined, both derived from data in a specified language. GoP, by contrast, cares only about the acoustic model, because the language model is fixed in advance by the reference text. It therefore produces an acoustic likelihood score measuring “how similar the speech sounds to that of native speakers of a particular language”, which, in short, is pronunciation. Simply put, for speech assessment, the input is speech audio + reference text, and the output is a pronunciation score.

Evolution of Speech Assessment
Over the past decades, th
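The "audio + reference text in, pronunciation score out" idea behind GoP can be sketched in a few lines of Python. This is a simplified, posterior-based variant written for illustration, not SpeechSuper's algorithm: it assumes a hypothetical acoustic model has already produced per-frame phone posteriors (strictly positive) for the frames aligned to one expected phone, and the function name `goodness_of_pronunciation` is made up for this example.

```python
from math import log


def goodness_of_pronunciation(frame_posteriors, target_phone):
    """Average log-ratio between the expected phone and the best phone.

    frame_posteriors: one dict per audio frame, mapping phone -> posterior
    probability (assumed > 0), for frames aligned to the expected phone.
    Returns 0.0 when the expected phone is the most likely one in every
    frame, and a more negative value the worse the match.
    """
    ratios = []
    for posteriors in frame_posteriors:
        best = max(posteriors.values())
        ratios.append(log(posteriors[target_phone]) - log(best))
    return sum(ratios) / len(ratios)


# Made-up posteriors for two frames in which the speaker clearly said "n".
frames = [{"n": 0.9, "l": 0.1}, {"n": 0.8, "l": 0.2}]

print(goodness_of_pronunciation(frames, "n"))  # reference text expects "n"
print(goodness_of_pronunciation(frames, "l"))  # reference text expects "l"
```

Because the reference text fixes which phone is expected, no language model search is needed: the score depends only on how well the acoustics match that phone.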