Sponsor: Beijing University of Chinese Medicine
ISSN 1006-2157 CN 11-3574/R

JOURNAL OF BEIJING UNIVERSITY OF TRADITIONAL CHINE ›› 2015, Vol. 38 ›› Issue (9): 587-590. doi: 10.3969/j.issn.1006-2157.2015.09.006


Automatic identification of TCM terminology in Shanghan Lun based on conditional random field*

MENG Hongyu1, XIE Qingyu2, CHANG Hong3, MENG Qinggang1#

  1 School of Preclinical Medicine, Beijing University of Chinese Medicine, Beijing 100029; 2 Institute of Basic Clinical Medicine of China Academy of Chinese Medical Sciences; 3 Baotou Medical College, Inner Mongolia
  • Received:2015-04-07 Online:2015-09-30 Published:2015-09-30

Abstract: Objective To explore methods for the automatic identification of TCM terminology and to broaden the use of natural language processing on TCM documents. Methods Terms for symptoms, diseases, pulse types and prescriptions recorded in Shanghan Lun were annotated and then automatically identified with conditional random fields (CRF). The effects of different feature combinations (the Chinese character itself, part of speech, word boundary and term category label) on term identification were analyzed, and the most effective combination was selected. Results The automatic identification model that combined the Chinese character itself, part of speech, word boundary and term category label achieved a precision of 85.00%, a recall of 68.00% and an F score of 75.56%. Conclusion Among all combinations tested, the multi-feature model combining the Chinese character itself, part of speech, word boundary and term category label achieved the best identification result.
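
As an illustration of the setup described in the abstract, the following is a minimal sketch of character-level feature extraction for a CRF tagger, together with a check of the reported F score. Everything named here is an assumption for illustration and is not taken from the paper: the function names `char_features` and `f_score`, the example phrase, its part-of-speech and boundary tags, the BIO-style label names, and the use of neighbouring characters as context. The sketch also assumes that the term category label is the tag the CRF predicts, and that the F score is the balanced harmonic mean of precision and recall.

```python
def char_features(chars, pos_tags, boundaries, i):
    """Build a feature dict for character i (hypothetical feature template).

    Uses the three observed features named in the abstract: the character
    itself, its part of speech, and its word-boundary tag. Neighbouring
    characters are added as context, which is an assumption of this sketch.
    """
    feats = {
        "char": chars[i],
        "pos": pos_tags[i],
        "boundary": boundaries[i],
    }
    if i > 0:
        feats["prev_char"] = chars[i - 1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(chars) - 1:
        feats["next_char"] = chars[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def f_score(precision, recall):
    """Balanced F score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotated fragment: a disease term followed by a pulse term.
chars = list("太阳病脉浮")
pos_tags = ["n", "n", "n", "n", "a"]      # assumed part-of-speech tags
boundaries = ["B", "I", "E", "S", "S"]    # assumed word-boundary tags
labels = ["B-DIS", "I-DIS", "I-DIS", "B-PUL", "I-PUL"]  # assumed term labels

# One feature dict per character; paired with `labels`, this is the usual
# input shape for a linear-chain CRF trainer.
features = [char_features(chars, pos_tags, boundaries, i)
            for i in range(len(chars))]

# Reported precision 85.00% and recall 68.00% give an F score of 75.56%.
reported_f = round(f_score(0.85, 0.68) * 100, 2)
print(reported_f)
```

Under the balanced F score assumption, the reported precision and recall reproduce the reported F score, so the three figures in the abstract are mutually consistent.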

Key words: TCM terminology, conditional random fields, Shanghan Lun, automatic identification

CLC Number: 

  • R222.19