4W-05
Corpus Augmentation Based on Pseudo-Chinese Generation for Chinese-Japanese Neural Machine Translation
○魏 徴, 関 洋平 (University of Tsukuba)
The performance of a neural machine translation model depends on the size of its training dataset. This paper presents a novel corpus augmentation method for Chinese-Japanese neural machine translation. Unlike conventional corpus augmentation methods, we augment the corpus by generating pseudo-Chinese sentences directly from a Japanese monolingual corpus. Rules are set up to replace particles and auxiliaries written in kana characters with Chinese words, written in kanji, that have the same meanings. In addition, using word embeddings, nouns, verbs, and adjectives written in kana characters are converted into words written in kanji characters with similar meanings. We evaluate the proposed method using the BLEU metric and compare it with the back-translation method.
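The two conversion steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the rule table `PARTICLE_RULES`, the toy vectors in `EMBEDDINGS`, the part-of-speech labels, and all function names are hypothetical and chosen only for illustration. The input is assumed to be already segmented and tagged by a morphological analyzer.

```python
import math

# Hypothetical rule table (illustration only): kana particle -> Chinese word.
PARTICLE_RULES = {"の": "的", "と": "和"}

# Toy 2-dimensional word vectors invented for this sketch; a real system
# would load embeddings trained on a large corpus.
EMBEDDINGS = {
    "りんご": [0.90, 0.20],
    "林檎": [0.88, 0.22],
    "学校": [0.10, 0.90],
}

# Hypothetical part-of-speech labels for the word classes named in the abstract.
FUNCTION_POS = {"particle", "auxiliary"}
CONTENT_POS = {"noun", "verb", "adjective"}


def is_kana(word):
    """True if every character lies in the Hiragana or Katakana Unicode blocks."""
    return all("\u3040" <= ch <= "\u30ff" for ch in word)


def is_kanji(word):
    """True if every character lies in the CJK Unified Ideographs block."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in word)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_kanji_word(word):
    """Return the kanji-only vocabulary word closest to `word`, or `word` itself."""
    if word not in EMBEDDINGS:
        return word
    candidates = [w for w in EMBEDDINGS if w != word and is_kanji(w)]
    if not candidates:
        return word
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def to_pseudo_chinese(tokens):
    """Convert (surface, pos) pairs into a pseudo-Chinese string."""
    out = []
    for surface, pos in tokens:
        if pos in FUNCTION_POS and surface in PARTICLE_RULES:
            out.append(PARTICLE_RULES[surface])         # rule-based replacement
        elif pos in CONTENT_POS and is_kana(surface):
            out.append(nearest_kanji_word(surface))     # embedding-based conversion
        else:
            out.append(surface)                         # leave other tokens unchanged
    return "".join(out)


tokens = [("私", "pronoun"), ("の", "particle"), ("りんご", "noun")]
print(to_pseudo_chinese(tokens))
```

In this sketch, tokens that match neither a rule nor the kana content-word condition pass through unchanged, so words already written in kanji are kept as they are.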