2D-05
Mitigating Dataset Artifacts in NLI: An Evaluation of Minimax Training with Contrast Sets
○Fariz Muhamad Ridwan, Indri Clarisa Ramadanti, Sherin Devi (Asahi Group Holdings)
Natural Language Inference (NLI) models often rely on dataset-specific artifacts rather than genuine semantic understanding, achieving high accuracy through spurious shortcuts instead of true reasoning. In this work, we investigate this issue by training an ELECTRA-small model on the Multi-Genre Natural Language Inference (MNLI) dataset and constructing a contrast set using the Linguistically-Informed Transformations (LIT) method. The model achieves strong performance on the standard test set yet degrades substantially on the contrast set, confirming its dependence on superficial patterns. To address this, we adopt a minimax optimization strategy, aiming to improve robustness by forcing the model to generalize beyond artifacts. We incorporate contrast set examples during training so that the minimax objective explicitly targets challenging cases. Despite using only a small number of contrastive examples, this approach improves performance on the contrast evaluation set while largely maintaining accuracy on standard benchmarks. These findings highlight the potential of adversarially motivated objectives, combined with targeted contrastive data, to reduce artifact reliance in NLI models.
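The minimax idea described above can be sketched as a worst-group training loop: at each step, evaluate the loss separately on the standard and contrast examples, then update the model on whichever group currently has the higher loss. The sketch below is illustrative only; it uses a toy logistic-regression model and synthetic data rather than the paper's ELECTRA-small model and MNLI/LIT data, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def nll(w, X, y):
    """Mean negative log-likelihood of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(w, X, y):
    """Gradient of the mean NLL with respect to the weights."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

# Toy stand-ins: a large "standard" group and a small "contrast" group
# whose labels depend on different input features.
X_std = rng.normal(size=(64, 3)); y_std = (X_std[:, 0] > 0).astype(float)
X_con = rng.normal(size=(16, 3)); y_con = (X_con[:, 1] > 0).astype(float)

w = np.zeros(3)
for _ in range(200):
    groups = [(X_std, y_std), (X_con, y_con)]
    # Inner max: pick the group with the worst current loss.
    X, y = max(groups, key=lambda g: nll(w, g[0], g[1]))
    # Outer min: gradient step on that worst-case group.
    w -= 0.5 * grad(w, X, y)

worst = max(nll(w, X_std, y_std), nll(w, X_con, y_con))
```

Because the update always targets the currently hardest group, the objective cannot be satisfied by exploiting a shortcut that works on only one distribution, which is the intuition behind using the contrast set inside the minimax objective.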