2021 Abstract

1-8. LIAN Zhouhui: Chinese Font Synthesis: From Digitalization to Intelligentization (2021-10-04 10:55)



  • LIAN Zhouhui (连宙辉), Wangxuan Institute of Computer Technology, Peking University, China


After more than forty years of development, the digitization of Chinese ancient books now offers content as image data, full-text data, and combined image-text data. It is delivered through database, CD-ROM, and online editions, and it provides system functions as retrieval, knowledge, and research tools. Digitization has given Chinese ancient books new vitality. Current achievements fall into three categories.


1. Shared digitization of Chinese ancient books

In this category, the right to use the materials is shared with everyone.

(1) Domestic:
1. Large databases, such as the China Basic Ancient Books Library (中国基本古籍库), Guoxue Dashi (国学大师), and the Ancient Books Library (古籍馆).
2. Digital classics, such as the digitized Shuowen Jiezi (说文解字).
3. Series databases, such as the Sibu Congkan (四部丛刊), the full-text retrieval and reading system for the Twenty-Five Histories, and the Yongle Dadian (永乐大典) database.

(2) Overseas: for example, the International Joint Bibliographic System of Chinese Rare Ancient Books (中华古籍善本国际联合书目系统), jointly built by Princeton University and the National Library of China.

2. Commercial digitization of Chinese ancient books

(1) Domestic: the CHANT database (汉达文库), the ancient-book databases of the Shutongwen (书同文) and Airusheng (爱如生) companies, the Zhonghua Classics and Ancient Books Database (中华经典古籍库), the General Catalogue of Chinese Classics through the Ages (中国历代典籍总目), Jiangsu Wenku (江苏文库), Bashiwan Juan Lou (八十万卷楼), the Digital Library of Chinese Classics (汉籍数字图书馆), and others.

(2) Overseas: the Diaolong (雕龙) full-text database of Chinese and Japanese ancient books, and others.

3. Digitization of Chinese ancient books combining shared and commercial access

Resources in this category are partly open to the public and partly paid. Examples include the Chinese Rare Ancient Books Database (中华善本古籍数据库) and Hanji Dianzi Wenxian (汉籍电子文献, Scripta Sinica).

Both the extent and the quality of digitization have improved greatly, but shortcomings remain:
1. Incomplete coverage: top-margin annotations (眉批), tables, variant characters, or rare characters are left out, and some characters are missing.
2. Unclear facsimile images.
3. Imperfect reading modes: for example, transcribed text and page images cannot be read continuously.
4. A single digital format: most resources are available only as HTML.
5. Missing auxiliary functions such as bookmarks, favorites, downloads, annotations, and printing.
6. Limits on the number of characters that can be downloaded.
7. Punctuation and collation that are questionable or absent.

Future development should focus on the following. Extend the character set of GBK (the Chinese Internal Code Extension Specification) so that it meets the needs of ancient-book digitization as fully as possible. Build professional teams and train developers in classical culture, library management, and computer technology. Expand the collation of ancient books by adding digital punctuated and collated editions alongside the originals. Increase funding so that as many ancient books as possible are digitized on a shared basis. Explore the use of AI technology.
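The GBK limitation can be checked mechanically. The following is a minimal sketch, not part of the original work, assuming a transcription is held as a Python string. The function name `unencodable_in` is hypothetical. It lists the characters that a given codec cannot represent. Rare characters in the CJK Extension blocks outside the Basic Multilingual Plane (for example U+20000) fall outside GBK. GB 18030, the later national standard, can encode every Unicode code point.

```python
# Hypothetical helper (sketch): report characters a codec cannot encode.
# Assumes the transcribed text is a Python str; "gbk" and "gb18030" are
# standard Python codecs.

def unencodable_in(text: str, encoding: str = "gbk") -> list[str]:
    """Return the distinct characters of `text` that `encoding` cannot represent."""
    missing: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Record each unencodable character once, in order of appearance.
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    # U+20000 is a CJK Unified Ideographs Extension B character, outside GBK.
    sample = "說文解字" + "\U00020000"
    print(unencodable_in(sample, "gbk"))      # only the Extension B character
    print(unencodable_in(sample, "gb18030"))  # empty list
```

Running such a check over a corpus would show which characters need a larger repertoire, or a fallback such as images or private-use code points.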

This paper surveys the current state of the digitization of Chinese ancient books and looks ahead to the future of the digital humanities. It aims to help scholars choose appropriate digital resources and to offer developers suggestions for further improvement.

