Tech Log: R Data Processing
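March 26, 2026 research log
I spent the whole day digging into the dataset, and after talking it over with my colleagues we finally settled on what to study ↓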
# RE####
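# Named list: label -> regular expression over ICD-10 codes.
# Patterns containing a lookahead (?!...) need perl = TRUE in grepl().
disease_patterns <- list(
"1高脂血症" = "\\bE78(\\b|\\.)",  # hyperlipidemia
"2终末期肾衰竭ESRD" = "\\bN(?:18\\.[56]|19)\\b",  # end-stage renal disease: N18.5, N18.6, N19
"3Dialysis透析" = "\\bZ49(\\.(0|1|2|3))?\\b",  # dialysis
"4TIA" = "\\bG45(\\b|\\.)",
"5缺血性卒中" = "\\bI(63|64|65)(\\b|\\.)",  # ischemic stroke
"5缺血性卒中ver2" = "\\bI(63|64|65|66|67|68|69)(\\b|\\.)",  # ischemic stroke, broader range
"6出血性卒中" = "\\bI(60|61|62)(\\b|\\.)",  # hemorrhagic stroke
"7栓塞" = "\\bI74(\\b|\\.)|\\bK55\\.0\\b|\\bN28\\.0\\b",  # embolism
"8IHD_ICD" = "\\bI(20|21|22|23|24|25)(\\b|\\.)",
"9PAD_strict" = "\\bI(70|71)|\\bI7[34](\\b|\\.)",
"9PAD_strict_ver2" = "\\bI(70|71|73|74)(\\b|\\.)",
"10Aortic_large-vessel" = "\\bI(70|71|72)(\\b|\\.)",
"11Carotid" = "\\bI65(\\b|\\.)",
"12HF心力衰竭" = "\\bI50(\\b|\\.)",  # heart failure
# ver2/ver3: the outer (?: ... ) makes both \\b anchors apply to every alternative
"12HF心力衰竭ver2" = "\\b(?:(?:I50|I42|I43|J81|R57)(?:\\.[0-9]+)?|I11\\.0|I13\\.0|I13\\.2|I25\\.5|K76\\.1)\\b",
"12HF心力衰竭ver3" = "\\b(?:(?:I50|I42|I43|J81|R57)(?:\\.[0-9]+)?|I11\\.0|I11\\.9|I13\\.0|I13\\.2|I25\\.5|K76\\.1)\\b",
"13HFrEF" = "\\bI50(\\.1\\b|\\.2\\b)",
"14蛛网膜下腔出血" = "\\bI60(\\b|\\.)",  # subarachnoid hemorrhage
"14脑内出血" = "\\bI61(\\b|\\.)",  # intracerebral hemorrhage
"14其他非创伤性颅内出血" = "\\bI62(\\b|\\.)",  # other nontraumatic intracranial hemorrhage
"14吐血" = "\\bK92\\.0\\b",  # hematemesis
"14黑便" = "\\bK92\\.1\\b",  # melena
"14未特指消化道出血" = "\\bK92\\.2\\b",  # gastrointestinal hemorrhage, unspecified
"14食管静脉曲张破裂出血" = "\\bI85\\.(?:0|11)\\b",  # bleeding esophageal varices
"14溃疡伴出血" = "\\bK(25|26|27|28)(\\b|\\.)",  # ulcer with bleeding (K25-K28 are the ulcer codes)
"14血尿" = "\\bR31(\\b|\\.)",  # hematuria
"14肾出血" = "\\b(?:N02\\.\\S*|N28\\.0(?!\\w))",  # renal hemorrhage
"14鼻出血" = "\\bR04\\.0(?!\\w)",  # epistaxis
"14咽部出血" = "\\bR04\\.1(?!\\w)",  # throat hemorrhage
"14咯血" = "\\bR04\\.2(?!\\w)",  # hemoptysis
"14腹腔出血" = "\\bK66\\.1(?!\\w)",  # hemoperitoneum
"14脾破裂伴出血" = "\\bS36\\.0\\S*",  # splenic rupture with bleeding
"14子宫异常出血" = "\\bN93\\.\\S*",  # abnormal uterine bleeding
"14产科及产后出血" = "\\bO(67|72)(\\b|\\.)",  # obstetric and postpartum hemorrhage
"14抗凝相关出血" = "\\bT45\\.5\\b",  # anticoagulant-related bleeding
"14抗凝相关凝血异常" = "\\bD68\\.32(?!\\w)",  # anticoagulant-related coagulation disorder
"14其他出血" = "\\bR58(\\b|\\.)",  # other hemorrhage
"15肝硬化" = "\\bK74(\\b|\\.)",  # cirrhosis
"15肝衰竭" = "\\bK72(\\b|\\.)",  # hepatic failure
"16风湿性心脏病二尖瓣狭窄" = "\\bI05(\\b|\\.)",  # rheumatic mitral stenosis
"17机械瓣膜" = "\\bZ95(\\.(2|3|4))?\\b",  # mechanical heart valve
"高血压性心脏病(未提及心力衰竭)" = "\\bI11\\.9\\b"  # hypertensive heart disease without heart failure
)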
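As a rough sketch of how a list like this could be applied, the snippet below flags a handful of records against a few of the patterns. The diagnosis strings, the `demo_patterns` subset, and the `flag_diseases` helper are all made up for illustration and are not from the real dataset or pipeline. `perl = TRUE` is passed so that the same call would also work for the patterns that use lookaheads.

```r
# Hypothetical subset of the pattern list, copied here so the example runs on its own
demo_patterns <- list(
  "4TIA"             = "\\bG45(\\b|\\.)",
  "8IHD_ICD"         = "\\bI(20|21|22|23|24|25)(\\b|\\.)",
  "9PAD_strict_ver2" = "\\bI(70|71|73|74)(\\b|\\.)",
  "11Carotid"        = "\\bI65(\\b|\\.)"
)

# Made-up diagnosis strings; each element holds the ICD-10 codes of one record
dx <- c("G45.9", "I65.2, I70.0", "I25.1", "E78.5")

# Hypothetical helper: returns a logical matrix with one row per record
# and one column per pattern name
flag_diseases <- function(codes, patterns) {
  vapply(patterns,
         function(p) grepl(p, codes, perl = TRUE),
         logical(length(codes)))
}

flags <- flag_diseases(dx, demo_patterns)
print(flags)
```

Each column of `flags` can then be attached to the data frame as an indicator variable, for example with `cbind()`.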
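Honestly, regular expressions are really handy. I was already using them back when I did data processing in Perl. I'm a bit embarrassed to admit that I've forgotten nearly all of Perl's syntax by now, but luckily I still remember some of the regex syntax, and it's great that it is almost the same in R. Since this is a tech blog, here is a quick rundown: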
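\b means "word boundary", i.e. the position between a word character (letter, digit, or underscore) and a non-word character. Example: \bI60(\b|\.) matches I60 followed by either a boundary or a literal dot.
\. means a literal dot. An unescaped . means "any character" in a regex, and unfortunately disease codes are full of real dots. No problem though: escaping with \ gives a literal dot. (Inside an R string the backslash itself has to be doubled, which is why the list above is written with \\b and \\.)
( … ) groups several characters into a single unit. Example: I(60|61|62) matches I60, I61, or I62. By the way, | means logical "or".
[ … ] matches any single character inside the square brackets. Example: I7[34] matches I73 or I74.
\S a non-whitespace character
\w a letter, digit, or underscore
\d a digit
(?! … ) negative lookahead: asserts that a given pattern does not come next, which helps avoid accidental matches. Example: N28\.0(?!\w) means N28.0 must not be followed by a letter, digit, or underscore. In base R, lookaheads only work with perl = TRUE.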
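The pieces above can be tried out directly in R. This is a small sketch on made-up codes (none of them come from the real dataset). Note the doubled backslashes inside R string literals, and that `perl = TRUE` is needed for the lookahead in base R's `grepl()`.

```r
# Made-up example codes, for illustration only
codes <- c("I60", "I60.9", "I601", "XI60")

# \b and (\b|\.): I60 must start at a boundary and end at a boundary or a dot
boundary_hits <- grepl("\\bI60(\\b|\\.)", codes, perl = TRUE)
# TRUE TRUE FALSE FALSE

# An unescaped "." matches any character, so "K9200" slips through
loose_dot  <- grepl("\\bK92.0\\b",   c("K92.0", "K9200"), perl = TRUE)  # TRUE TRUE
strict_dot <- grepl("\\bK92\\.0\\b", c("K92.0", "K9200"), perl = TRUE)  # TRUE FALSE

# Character class: I73 or I74, but not I75
class_hits <- grepl("\\bI7[34](\\b|\\.)", c("I73.9", "I74", "I75.0"), perl = TRUE)
# TRUE TRUE FALSE

# Negative lookahead: N28.0 must not be followed by a word character
lookahead_hits <- grepl("\\bN28\\.0(?!\\w)",
                        c("N28.0", "N28.0, I10", "N28.01"), perl = TRUE)
# TRUE TRUE FALSE
```

The `loose_dot` versus `strict_dot` comparison is the reason every dot in a code pattern should be escaped.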
I'm worn out again today, so that's all for now. Tomorrow I need to read some papers.