Abstract
We establish the viability of a streamlined architecture for pedagogically appropriate computer-assisted pronunciation training (CAPT) that gives second-language learners automatic feedback on their mispronunciations. The architecture uses end-to-end speech recognition models to detect mispronunciation in audio segments that correspond directly to orthographic letters, in contrast to standard mispronunciation detection, which relies on phone representations. Results on a classification task show that grapheme-aligned segments have the potential to offer sensitivity to non-nativelike phonetic errors similar to that of phone-aligned segments. Potential advantages of this approach over phone-based pronunciation scoring include naturally comprehensible (orthographic, not phonemic) feedback for learners, an inherently open vocabulary in the target language, and evaluation of pronunciations with reference to a full range of target-language acoustic variants rather than a prespecified canonical phone sequence.
Original language | English |
---|---|
Pages (from-to) | 1004-1008 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2023-August |
Publication status | Published - 20 Aug 2023 |
Event | 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland. Duration: 20 Aug 2023 → 24 Aug 2023 |
Bibliographical note
Publisher Copyright: © 2023 International Speech Communication Association. All rights reserved.
Other keywords
- comprehensible feedback
- computer assisted pronunciation training
- forced alignment
- phone segmentation
- pronunciation error detection