Skip to content

Break Ties when joining on timestamps #106

@colmsnowplow

Description

@colmsnowplow

An edge case has come up where two things conspired to produce duplicates in the model:

  1. Some users' data have different session_ids at the same time (we think because of a race condition between tabs)
  2. Latency between device and collector results in exactly the same derived_tstamp for these events

These two factors mean that a user has two sessions with exactly the same start_tstamp, but different domain_sessionid's - and these happen to be the first tstamps for the user.

This produces duplicates in the users table when we join on start_tstamp:

The same issue may exist when we join on end_tstamp for aggregates.

This seems very rare, but we should introduce some means of breaking a tie in the case where derived_tstamps happen to evaluate to exactly the same thing.

event 1:
    "collector_tstamp": "2021-06-25 16:51:55.559 UTC",
    "dvce_sent_tstamp": "2021-06-25 16:51:54.919 UTC",
    "dvce_created_tstamp": "2021-06-25 16:51:54.911 UTC",
    "derived_tstamp": "2021-06-25 16:51:55.551 UTC",

event 2:
    "collector_tstamp": "2021-06-25 16:51:55.557 UTC",
    "dvce_sent_tstamp": "2021-06-25 16:51:55.077 UTC",
    "dvce_created_tstamp": "2021-06-25 16:51:55.071 UTC",
    "derived_tstamp": "2021-06-25 16:51:55.551 UTC",

(first reported on ZD ticket 27522)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions