diff --git a/data/xml/2024.caldpseudo.xml b/data/xml/2024.caldpseudo.xml
index 111ce2f1e0..4d25d0815c 100644
--- a/data/xml/2024.caldpseudo.xml
+++ b/data/xml/2024.caldpseudo.xml
@@ -25,7 +25,7 @@
 Handling Name Errors of a <fixed-case>BERT</fixed-case>-Based De-Identification System: Insights from Stratified Sampling and <fixed-case>M</fixed-case>arkov-based Pseudonymization
 DaltonSimancekDepartment of Learning Health Sciences, University of Michigan
- VG VinodVydiswaranSchool of Information, University of Michigan
+ V.G.VinodVydiswaranSchool of Information, University of Michigan
 1-7
 Missed recognition of named entities while de-identifying clinical narratives poses a critical challenge in protecting patient-sensitive health information. Mitigating name recognition errors is essential to minimize risk of patient re-identification. In this paper, we emphasize the need for stratified sampling and enhanced contextual considerations concerning Name Tokens using a fine-tuned Longformer BERT model for clinical text de-identifcation. We introduce a Hidden in Plain Sight (HIPS) Markov-based replacement technique for names to mask name recognition misses, revealing a significant reduction in name leakage rates. Our experimental results underscore the impact on addressing name recognition challenges in BERT-based de-identification systems for heightened privacy protection in electronic health records.
 2024.caldpseudo-1.1
diff --git a/data/yaml/name_variants.yaml b/data/yaml/name_variants.yaml
index 55b9332e91..8f01ec2cea 100644
--- a/data/yaml/name_variants.yaml
+++ b/data/yaml/name_variants.yaml
@@ -1545,6 +1545,9 @@
 - canonical: {first: Jeong-Won, last: Cha}
   variants:
   - {first: Jeongwon, last: Cha}
+- canonical: {first: V.G.Vinod, last: Vydiswaran}
+  variants:
+  - {first: VG Vinod, last: Vydiswaran}
 - canonical: {first: Seungho, last: Cha}
   id: seungho-cha
 - canonical: {first: Joyce, last: Chai}