Sponsor: Beijing University of Chinese Medicine
ISSN 1006-2157 CN 11-3574/R

Journal of Beijing University of Traditional Chinese Medicine, 2021, Vol. 44, Issue (6): 538-543. doi: 10.3969/j.issn.1006-2157.2021.06.008

• TCM Information •

A model for diagnosing TCM cold and heat patterns based on random forest algorithm*

Shu Chenjie, Liang Hao, Wang Yun#

  1. School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing 102488, China
  • Received: 2020-11-21  Published: 2021-06-30  Online: 2021-06-25
  • Corresponding author: #Prof. Wang Yun, male, Ph.D., doctoral supervisor; research interest: information fusion of Chinese materia medica. School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing 102488. E-mail: wangyun@bucm.edu.cn
  • About the first author: Shu Chenjie, female, doctoral candidate
  • Supported by:
    *National Natural Science Foundation of China, General Program (No. 81973495)

Abstract: Objective To construct diagnostic models for the TCM cold and heat patterns from the perspective of symptoms and signs, providing a basis for standardizing cold-heat pattern identification. Methods Symptoms related to "cold" and to "heat" were selected separately from the pattern element-symptom data table constructed in "Study on Pattern Standardization and Pattern Identification System". Random forest feature screening was then used to select the top 15 symptoms for each pattern. The data were randomly divided into 10 parts and assigned to a training set and a test set at a ratio of 7∶3. After resampling, random forest models for the cold pattern and for the heat pattern were built with the optimal parameters and evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, and specificity. Results The key characteristic variables of the cold pattern were floating and tight pulse, aversion to cold, absence of sweating, white tongue coating, pain relieved by warmth, cold pain, pale tongue, aversion to cold with fever, absence of thirst, body pain, headache, greasy coating, poor appetite, loose stool, and cold limbs. The cold-pattern model had an AUC of 0.912, a specificity of 0.89, and a sensitivity of 0.80. The key characteristic variables of the heat pattern were yellow coating, thirst, slippery and rapid pulse, fever, high fever, rapid pulse, dark urine, red tongue, wiry and rapid pulse, bitter taste in the mouth, greasy coating, crimson tongue, yellow urine, vexation, and headache. The heat-pattern model had an AUC of 0.891, a specificity of 0.85, and a sensitivity of 0.86. Conclusion Using variable screening and the random forest algorithm, diagnostic models for the cold and heat patterns were established with good classification performance, providing a methodological reference for standardized pattern identification.
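The data-handling and evaluation steps named in the Methods can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation: the abstract does not say whether the 7∶3 split was stratified or which resampling method was used, so stratified splitting and random oversampling of the minority class are assumptions here, and all function names (`top_k_features`, `stratified_split`, `random_oversample`, `sensitivity_specificity`, `roc_auc`) are hypothetical. The random forest itself is not trained; the importance scores passed to `top_k_features` are assumed to come from an already fitted forest, for example scikit-learn's `RandomForestClassifier.feature_importances_`.

```python
import random
from typing import Dict, List, Sequence, Tuple


def top_k_features(importances: Dict[str, float], k: int = 15) -> List[str]:
    """Return the k feature names with the highest importance scores.

    Assumption: scores come from a fitted random forest, which this
    sketch does not train.
    """
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    return ranked[:k]


def stratified_split(labels: Sequence[int], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices about 70/30, keeping class proportions (assumed stratified)."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


def random_oversample(train_idx: Sequence[int], labels: Sequence[int],
                      seed: int = 0) -> List[int]:
    """Duplicate randomly chosen minority-class training indices until classes balance.

    Assumption: the paper's unspecified "resampling" is random oversampling.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced: List[int] = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); positive class is 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up labels and scores for illustration only (not data from the study).
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    train, test = stratified_split(labels)
    balanced_train = random_oversample(train, labels)
    print(len(train), len(test), len(balanced_train))

    y_true = [1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

In this sketch resampling touches only the training indices, so the test set keeps its original class balance and the reported sensitivity, specificity, and AUC reflect that distribution; this ordering follows the abstract (split first, then resample), but the exact procedure remains an assumption.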

Key words: random forest algorithm, diagnostic model, pattern elements

CLC number: R241.3