2021 Abstract

Title: 1-9. HSU ChihFan; 반자동 한문 고전 자료 분절화의 발전과 시도 (Development and Evaluation of a Semi-Automatic Ancient Chinese Sentence Segmentation System), 2021-10-04 11:00
반자동 한문 고전 자료 분절 시스템의 발전과 평가

Development and Evaluation of a Semi-Automatic Ancient Chinese Sentence Segmentation System 

半自動古漢語斷句系統發展與評估 


  • HSU ChihFan (徐志帆, Research Center for Chinese Cultural Subjectivity, National Chengchi University, Taiwan)


To reduce the manual effort of punctuating ancient Chinese texts, this study develops a semi-automatic ancient Chinese sentence segmentation system (SAACSSS) based on active learning. Through human-computer interaction, the system helps digital humanists efficiently determine where punctuation should be added to unpunctuated ancient Chinese sentences. The study has two main parts: selecting a suitable machine learning algorithm for SAACSSS, and evaluating how accurately SAACSSS helps digital humanists punctuate ancient Chinese sentences. In the first part, an active learning mode with human-computer collaboration was built on each of several well-known machine learning algorithms: Logistic Regression, the Naive Bayes classifier, Long Short-Term Memory (LSTM), Maximum Entropy, and Conditional Random Fields (CRF) with n-gram feature templates. In the second part, an experiment based on ancient books of the Ming Dynasty assessed the accuracy of the resulting SAACSSS in punctuating ancient Chinese sentences. The analytical results show that an active learning mode based on CRF with a trigram feature template achieved the highest accuracy in predicting punctuation among the algorithms considered. In addition, six digital humanists were invited to perform an ancient Chinese sentence segmentation task with the support of SAACSSS. The results revealed that SAACSSS not only helped digital humanists determine the punctuation of ancient Chinese sentences correctly to some degree, but also continuously improved its prediction accuracy through human-computer interaction.
According to the semi-structured interviews, most interviewees expressed satisfaction with the operating process and the interface of SAACSSS. Finally, this study suggests that future work could improve the accuracy of SAACSSS in predicting punctuation by integrating Named Entity Recognition (NER) or part-of-speech (POS) tagging. It also suggests that future studies consider applying SAACSSS in digital humanities education to help students practice and learn how to punctuate ancient Chinese sentences correctly.
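
To make the idea of n-gram feature templates more concrete, the following is a minimal Python sketch of how an unpunctuated character sequence might be turned into per-character labels and trigram-window features for a sequence labeler such as a CRF. It is an illustration under stated assumptions, not the study's implementation: the label scheme (`B` for "a break follows this character", `O` otherwise), the feature names, the padding symbols, and the example sentence are all hypothetical choices made here.

```python
# Illustrative sketch only: the label scheme, feature names, and padding
# symbols are assumptions, not the feature templates used in the study.

PUNCTUATION = set("，。、；：？！")


def to_labeled_chars(punctuated):
    """Strip punctuation and label each character.

    A character is labeled 'B' if a punctuation mark follows it in the
    punctuated text, and 'O' otherwise.
    """
    chars, labels = [], []
    for ch in punctuated:
        if ch in PUNCTUATION:
            if labels:  # mark the preceding character as a break point
                labels[-1] = "B"
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


def ngram_features(chars, i, max_n=3):
    """Return every character n-gram (n = 1..max_n) that covers position i.

    Keys encode the n-gram size and its start offset relative to i.
    """
    pad = max_n - 1
    padded = ["<s>"] * pad + list(chars) + ["</s>"] * pad
    center = i + pad
    feats = {}
    for n in range(1, max_n + 1):
        for start in range(center - n + 1, center + 1):
            gram = "".join(padded[start:start + n])
            feats[f"{n}gram[{start - center}]"] = gram
    return feats


# Arbitrary example sentence, chosen only for illustration.
chars, labels = to_labeled_chars("學而時習之，不亦說乎。")
features = [ngram_features(chars, i) for i in range(len(chars))]
```

Each position yields one unigram, two bigram, and three trigram features, so a labeler sees the characters immediately before and after a candidate break point. In an active learning loop of the kind the abstract describes, corrections made by the human annotator would be converted back into such labeled sequences and added to the training data.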