2 changes: 1 addition & 1 deletion data/xml/2024.caldpseudo.xml
@@ -25,7 +25,7 @@
<paper id="1">
<title>Handling Name Errors of a <fixed-case>BERT</fixed-case>-Based De-Identification System: Insights from Stratified Sampling and <fixed-case>M</fixed-case>arkov-based Pseudonymization</title>
<author><first>Dalton</first><last>Simancek</last><affiliation>Department of Learning Health Sciences, University of Michigan</affiliation></author>
-<author><first>VG Vinod</first><last>Vydiswaran</last><affiliation>School of Information, University of Michigan</affiliation></author>
+<author><first>V.G.Vinod</first><last>Vydiswaran</last><affiliation>School of Information, University of Michigan</affiliation></author>
<pages>1-7</pages>
<abstract>Missed recognition of named entities while de-identifying clinical narratives poses a critical challenge in protecting patient-sensitive health information. Mitigating name recognition errors is essential to minimize risk of patient re-identification. In this paper, we emphasize the need for stratified sampling and enhanced contextual considerations concerning Name Tokens using a fine-tuned Longformer BERT model for clinical text de-identifcation. We introduce a Hidden in Plain Sight (HIPS) Markov-based replacement technique for names to mask name recognition misses, revealing a significant reduction in name leakage rates. Our experimental results underscore the impact on addressing name recognition challenges in BERT-based de-identification systems for heightened privacy protection in electronic health records.</abstract>
<url hash="ed1cd679">2024.caldpseudo-1.1</url>
3 changes: 3 additions & 0 deletions data/yaml/name_variants.yaml
@@ -1545,6 +1545,9 @@
 - canonical: {first: Jeong-Won, last: Cha}
   variants:
   - {first: Jeongwon, last: Cha}
+- canonical: {first: V.G.Vinod, last: Vydiswaran}
Contributor

@weissenh, Nov 15, 2025

The issue submitter has provided an ORCID. Why not add orcid and id fields here?
Future papers can then be mapped by ORCID. The issue submitter wasn't clear about it, but I assume all papers currently listed on the page belong to this person, so we could verify the paper and person.

Edit: and an institution (or degree) field and a comment field.

+  variants:
+  - {first: VG Vinod, last: Vydiswaran}
Contributor

Actually, I believe there are more papers where the metadata and the PDF content do not fully match. Some of these may be irrelevant for getting the papers onto the same author page (so I'm not sure which ones we need to list explicitly in the YAML, cf. https://github.com/acl-org/acl-anthology/wiki/Name-Variants), but it is nonetheless a slight inconsistency.

On the page with metadata displayed as "V.G.Vinod Vydiswaran" (without a space in the first name), I find papers with the following names copied from the PDFs:

  • 2024.findings-emnlp.145 VG Vinod Vydiswaran (no punctuation, space added)
  • 2024.naacl-long.165 V.G. Vinod Vydiswaran (space added)
  • 2023.findings-acl.561 V.G. Vinod Vydiswaran (space added)
  • I17-1040 V. G. Vinod Vydiswaran (two spaces added)
  • N10-4008 V.G. Vinod Vydiswaran (space added)
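
To make the inconsistency above concrete, here is a minimal Python sketch that compares the first names from the list against the canonical spelling. The helper `normalize_first_name` and its rule (drop periods and whitespace, lowercase) are assumptions made for illustration only; this is not the Anthology's actual name-matching logic.

```python
import re


def normalize_first_name(name: str) -> str:
    """Hypothetical helper: drop periods and whitespace, then lowercase."""
    return re.sub(r"[.\s]+", "", name).lower()


# First names as they appear in the PDFs, taken from the list above.
pdf_first_names = {
    "2024.findings-emnlp.145": "VG Vinod",
    "2024.naacl-long.165": "V.G. Vinod",
    "2023.findings-acl.561": "V.G. Vinod",
    "I17-1040": "V. G. Vinod",
    "N10-4008": "V.G. Vinod",
}

# Canonical first name as displayed on the author page.
canonical_first = "V.G.Vinod"

# Papers whose PDF spelling differs from the canonical metadata string.
mismatches = {
    paper_id: name
    for paper_id, name in pdf_first_names.items()
    if name != canonical_first
}

# Distinct PDF spellings that would each need consideration as a variant.
distinct_spellings = sorted(set(pdf_first_names.values()))

# Under the assumed normalization, every spelling collapses to one key.
same_after_normalizing = all(
    normalize_first_name(name) == normalize_first_name(canonical_first)
    for name in pdf_first_names.values()
)

print(f"{len(mismatches)} papers differ from {canonical_first!r}")
print("distinct PDF spellings:", distinct_spellings)
print("identical after normalization:", same_after_normalizing)
```

Under this assumption all five papers differ from the metadata string only in punctuation and spacing, which is why it is unclear whether each spelling needs its own variants entry.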

 - canonical: {first: Seungho, last: Cha}
   id: seungho-cha
 - canonical: {first: Joyce, last: Chai}