feat: add sortable keys for record linkage #654

adamdecaf · 2025-07-02T15:28:31Z

The idea is to generate a list of sortable keys (buckets the fields hash into) so that we can find similar records. You can compare against several of these keys at once and grab the rows whose keys sort just above or below the target's, which shrinks the number of detailed similarity scoring calls that have to be made.

"TYPE:0230"
"NAME:0190"

// Country | Type | Identifier
"GOVID:C0173|T0190|X0146"

// Country | State | PostalCode | City | Line1 | Line2 [optional]
"ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"
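To make the layout of these keys concrete, here is a minimal parsing sketch. It only assumes what the examples above show: a kind before the colon, `|`-separated segments, an optional one-letter field tag, and zero-padded bucket numbers that may be comma-separated. The function name `parse_sortable_key` is hypothetical and not part of this PR, and the sketch does not assume what each comma-separated bucket represents.

```python
from typing import List, Tuple

# (field tag, bucket numbers); the tag is "" for untagged keys like TYPE or NAME
Segment = Tuple[str, List[int]]


def parse_sortable_key(key: str) -> Tuple[str, List[Segment]]:
    """Split 'KIND:seg|seg|...' into its kind and a list of segments.

    Hypothetical helper for illustration; assumes a well-formed key.
    """
    kind, _, body = key.partition(":")
    segments: List[Segment] = []
    for part in body.split("|"):
        # A segment may start with a one-letter field tag (C, S, P, ...).
        tag = part[0] if part[0].isalpha() else ""
        # Whatever follows the tag is one or more comma-separated buckets.
        numbers = part[len(tag):].split(",")
        segments.append((tag, [int(n) for n in numbers]))
    return kind, segments


addr_kind, addr_segments = parse_sortable_key(
    "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"
)
type_kind, type_segments = parse_sortable_key("TYPE:0230")
```

Because the buckets are zero-padded to a fixed width, comparing the raw strings gives the same order as comparing the numbers, which is what makes the keys sortable as plain text.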

You could then compute traditional string distance metrics over these sortable keys to rank the most similar records. Within each key the fields are ordered from general to specific.
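As a rough illustration of ranking by string distance, the sketch below uses Levenshtein edit distance, which is just one possible choice of metric; the PR does not say which metric is intended. The function names `levenshtein` and `rank_by_key_distance` are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def rank_by_key_distance(target: str, candidates: List[str]) -> List[str]:
    """Order candidate keys from most to least similar to the target key."""
    # Ties are broken by the key itself so the ordering is deterministic.
    return sorted(candidates, key=lambda k: (levenshtein(target, k), k))


ranked = rank_by_key_distance(
    "GOVID:C0173|T0190|X0146",
    [
        "GOVID:C0173|T0190|X0147",
        "GOVID:C0999|T0190|X0146",
        "GOVID:C0173|T0190|X0146",
    ],
)
```

Note that edit distance compares characters, not numeric closeness of the buckets, so whether it tracks real similarity depends on how the buckets are assigned.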

With broad fields on the left, this allows for prefix filtering in SQL. You could strip out the Line1/Line2 data and filter down to the city level, or find the rows near an exact address by grabbing those whose keys sort just above and below the target key.
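Both lookups can be sketched against an in-memory SQLite database. The table `records`, the column `addr_key`, the sample rows, and the helper names below are all made up for illustration; the PR does not specify a schema or a database.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, addr_key TEXT NOT NULL)")
# An index on the key column lets range scans use the sort order.
conn.execute("CREATE INDEX idx_records_addr_key ON records (addr_key)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [
        (1, "ADDR:C0143|S0021|P0007|Y0023|L0150,0028"),
        (2, "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"),
        (3, "ADDR:C0143|S0021|P0007|Y0023|L0240"),
        (4, "ADDR:C0143|S0021|P0009|Y0031|L0201"),
        (5, "ADDR:C0150|S0002|P0001|Y0004|L0010"),
    ],
)


def city_prefix(key: str) -> str:
    """Drop the Line1/Line2 segment, leaving Country|State|PostalCode|City."""
    return key.split("|L", 1)[0]


def same_city(db: sqlite3.Connection, key: str) -> List[int]:
    """Prefix filter: ids of rows whose key starts with the city-level prefix."""
    rows = db.execute(
        "SELECT id FROM records WHERE addr_key LIKE ? || '%' ORDER BY addr_key",
        (city_prefix(key),),
    )
    return [r[0] for r in rows]


def neighbors(db: sqlite3.Connection, key: str, n: int = 1) -> List[int]:
    """Ids of the n rows sorting just below and the n rows just above the key."""
    below = db.execute(
        "SELECT id FROM records WHERE addr_key < ? ORDER BY addr_key DESC LIMIT ?",
        (key, n),
    ).fetchall()
    above = db.execute(
        "SELECT id FROM records WHERE addr_key > ? ORDER BY addr_key ASC LIMIT ?",
        (key, n),
    ).fetchall()
    return [r[0] for r in below] + [r[0] for r in above]


target = "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"
city_ids = same_city(conn, target)
near_ids = neighbors(conn, target)
```

The rows returned by either query are only candidates; the detailed similarity scoring would still run on that smaller set.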

feat: add sortable keys for record linkage

d1a2f71

adamdecaf force-pushed the feat-add-record-linkage branch from c342518 to d1a2f71 on November 11, 2025 22:43
