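The 85th National Convention of IPSJ (Information Processing Society of Japan)  Dates: March 2-4, 2023  Venue: The University of Electro-Communications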

1W-04
End-to-end Ainu Speech Recognition with Embedded Prior Knowledge about Phonemes
○李 在詠, 三村正人, 河原達也 (Kyoto University)
A method that uses explicit prior knowledge about phonemes is presented for improving automatic speech recognition (ASR) performance in low-resource settings. First, a fixed-length encoding is defined for each phoneme, where each element of the encoding represents a distinct phonetic feature. Second, a phonetic feature prediction layer is inserted into a deep neural network (DNN), and its feature predictions are used to make the final token predictions. Experiments are conducted in multilingual settings with Ainu as the target low-resource language. The effectiveness and robustness of the method are explored with varying amounts of training data.
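
The two steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature inventory, the four-phoneme table, and the rule for combining feature predictions into token predictions (a Bernoulli log-likelihood followed by a softmax) are hypothetical and are not taken from the paper, which does not specify them in this abstract.

```python
import math

# Hypothetical feature inventory and phoneme table (illustrative only,
# not the encoding used by the authors).
FEATURES = ["vowel", "voiced", "nasal", "plosive", "bilabial"]
PHONEME_FEATURES = {
    "a": {"vowel", "voiced"},
    "m": {"voiced", "nasal", "bilabial"},
    "p": {"plosive", "bilabial"},
    "t": {"plosive"},
}


def encode(phoneme):
    """Return the fixed-length binary encoding of a phoneme.

    Each position corresponds to one entry of FEATURES.
    """
    active = PHONEME_FEATURES[phoneme]
    return [1 if f in active else 0 for f in FEATURES]


def token_posteriors(feature_probs, eps=1e-9):
    """Turn per-feature probabilities into a distribution over phonemes.

    Assumed combination rule: score each phoneme by the Bernoulli
    log-likelihood of its encoding under the predicted feature
    probabilities, then normalise the scores with a softmax.
    """
    scores = {}
    for ph in PHONEME_FEATURES:
        code = encode(ph)
        scores[ph] = sum(
            math.log(p + eps) if bit else math.log(1.0 - p + eps)
            for bit, p in zip(code, feature_probs)
        )
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {ph: math.exp(s - top) for ph, s in scores.items()}
    total = sum(exps.values())
    return {ph: v / total for ph, v in exps.items()}


# Feature predictions resembling a voiced bilabial nasal.
probs = token_posteriors([0.05, 0.9, 0.85, 0.1, 0.8])
best = max(probs, key=probs.get)
```

In a real system the feature probabilities would come from a sigmoid output layer of the DNN rather than a hand-written list; the point of the sketch is only that the token distribution is derived from feature predictions through a fixed phoneme-to-feature table.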