Towards the Integration of Natural Language and Eye Tracking Information for Predicting Comma Placement in Chinese Sentence
This paper investigates a relatively underdeveloped but important subject in NLP: the prediction of punctuation marks. We implemented a CRF model that incorporates linguistic features to predict commas, the punctuation marks that affect readability most, in Chinese sentences. Evaluated on Penn Chinese Treebank data, the CRF model achieved a precision of 80% and a recall of 61%. We also discuss the potential of eye tracking information for this task. Integrating eye tracking information with NLP is expected to yield a better comma predictor for improving readability.
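
As an illustration only, comma prediction can be framed as sequence labeling: each word in a segmented sentence is labeled 1 if a comma follows it and 0 otherwise, and a CRF is trained on per-word feature dictionaries. The sketch below is a minimal one under stated assumptions and is not the feature set or evaluation code used in this work. The function names (`to_labeled_sequence`, `word_features`, `precision_recall`) and the specific features (current word, neighboring words, position) are hypothetical.

```python
from typing import Dict, List, Tuple

COMMA = "，"  # full-width Chinese comma


def to_labeled_sequence(tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Remove commas from a segmented sentence and label each remaining
    word 1 if a comma followed it in the original, else 0."""
    words: List[str] = []
    labels: List[int] = []
    for tok in tokens:
        if tok == COMMA:
            if labels:  # ignore a comma with no preceding word
                labels[-1] = 1
        else:
            words.append(tok)
            labels.append(0)
    return words, labels


def word_features(words: List[str], i: int) -> Dict[str, str]:
    """Hypothetical feature dictionary for position i (illustrative only)."""
    return {
        "word": words[i],
        "prev": words[i - 1] if i > 0 else "<BOS>",
        "next": words[i + 1] if i < len(words) - 1 else "<EOS>",
        "position": str(i),
    }


def precision_recall(gold: List[int], pred: List[int]) -> Tuple[float, float]:
    """Precision and recall of the comma label (1) over aligned sequences."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    tokens = ["他", "来", "了", "，", "我", "走", "了", "。"]
    words, labels = to_labeled_sequence(tokens)
    features = [word_features(words, i) for i in range(len(words))]
    print(words, labels)
    print(features[0])
```

Feature dictionaries of this shape are what CRF toolkits commonly accept as input. Under this framing, precision is the share of predicted commas that match a gold comma, and recall is the share of gold commas that were predicted.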