Open
Labels: bug (Something isn't working)
Description
We have hit an edge case where two factors combine to produce duplicates in the model:
- Some users have events with different domain_sessionid values at the same time (we think because of a race condition between tabs)
- Latency between device and collector results in exactly the same derived_tstamp for these events
Together, these mean that a user has two sessions with exactly the same start_tstamp but different domain_sessionid values, and that shared start_tstamp happens to be the earliest one for the user.
This produces duplicates in the users table when we join on start_tstamp:
data-models/web/v1/redshift/sql-runner/sql/standard/04-users/01-main/06-users.sql, line 86 in a38e76b:
    AND a.start_tstamp = b.start_tstamp
The same issue may exist when we join on end_tstamp for aggregates.
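A minimal sketch of how the duplicate arises, run against an in-memory SQLite database rather than Redshift. The table names `sessions` and `user_aggregates` and the ID values are hypothetical, simplified stand-ins and not the model's actual tables; only the join condition on start_tstamp mirrors the line referenced above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the model's tables.
    CREATE TABLE sessions (domain_userid TEXT, domain_sessionid TEXT, start_tstamp TEXT);
    CREATE TABLE user_aggregates (domain_userid TEXT, start_tstamp TEXT);

    -- Two sessions for one user that share an identical start_tstamp.
    INSERT INTO sessions VALUES
        ('user-1', 'session-a', '2021-06-25 16:51:55.551'),
        ('user-1', 'session-b', '2021-06-25 16:51:55.551');

    -- One row per user holding that user's earliest start_tstamp.
    INSERT INTO user_aggregates
        SELECT domain_userid, MIN(start_tstamp) FROM sessions GROUP BY domain_userid;
""")

# Joining back on start_tstamp matches both sessions, so the user appears twice.
rows = conn.execute("""
    SELECT a.domain_userid, a.domain_sessionid
    FROM sessions AS a
    INNER JOIN user_aggregates AS b
        ON a.domain_userid = b.domain_userid
        AND a.start_tstamp = b.start_tstamp
    ORDER BY a.domain_sessionid
""").fetchall()

print(rows)
```

With the sample data, the join returns one row per tied session, which is the duplication described above.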
This seems very rare, but we should introduce a way to break the tie when derived_tstamp values are exactly equal.
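One possible tie-break, sketched here as an assumption rather than the agreed fix: rank each user's sessions by start_tstamp and then by domain_sessionid, and keep only the first. The choice of domain_sessionid as the secondary sort key is arbitrary; any column that is unique per session would make the result deterministic. The `sessions` table and its values are hypothetical, and the query runs in SQLite (3.25 or later for window functions) purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified sessions table.
    CREATE TABLE sessions (domain_userid TEXT, domain_sessionid TEXT, start_tstamp TEXT);
    INSERT INTO sessions VALUES
        ('user-1', 'session-b', '2021-06-25 16:51:55.551'),
        ('user-1', 'session-a', '2021-06-25 16:51:55.551'),
        ('user-1', 'session-c', '2021-06-25 17:30:00.000');
""")

# ROW_NUMBER assigns exactly one rank 1 per user, even when start_tstamp ties,
# because domain_sessionid is used as a secondary (assumed) ordering key.
first_sessions = conn.execute("""
    SELECT domain_userid, domain_sessionid, start_tstamp
    FROM (
        SELECT
            s.*,
            ROW_NUMBER() OVER (
                PARTITION BY domain_userid
                ORDER BY start_tstamp, domain_sessionid
            ) AS session_rank
        FROM sessions AS s
    )
    WHERE session_rank = 1
""").fetchall()

print(first_sessions)
```

Joining on the selected domain_sessionid instead of on start_tstamp would then yield a single row per user. The same pattern could apply to the end_tstamp join with the ordering reversed.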
event 1:
"collector_tstamp": "2021-06-25 16:51:55.559 UTC",
"dvce_sent_tstamp": "2021-06-25 16:51:54.919 UTC",
"dvce_created_tstamp": "2021-06-25 16:51:54.911 UTC",
"derived_tstamp": "2021-06-25 16:51:55.551 UTC",
event 2:
"collector_tstamp": "2021-06-25 16:51:55.557 UTC",
"dvce_sent_tstamp": "2021-06-25 16:51:55.077 UTC",
"dvce_created_tstamp": "2021-06-25 16:51:55.071 UTC",
"derived_tstamp": "2021-06-25 16:51:55.551 UTC",
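The collision can be checked by hand. The sketch below assumes the usual Snowplow calculation, derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp), which is consistent with the values in both events; the helper name `derived_tstamp` is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f UTC"

def derived_tstamp(event):
    """Assumed formula: collector_tstamp minus the device send latency."""
    collector = datetime.strptime(event["collector_tstamp"], FMT)
    sent = datetime.strptime(event["dvce_sent_tstamp"], FMT)
    created = datetime.strptime(event["dvce_created_tstamp"], FMT)
    return collector - (sent - created)

event_1 = {
    "collector_tstamp": "2021-06-25 16:51:55.559 UTC",
    "dvce_sent_tstamp": "2021-06-25 16:51:54.919 UTC",
    "dvce_created_tstamp": "2021-06-25 16:51:54.911 UTC",
}
event_2 = {
    "collector_tstamp": "2021-06-25 16:51:55.557 UTC",
    "dvce_sent_tstamp": "2021-06-25 16:51:55.077 UTC",
    "dvce_created_tstamp": "2021-06-25 16:51:55.071 UTC",
}

d1 = derived_tstamp(event_1)  # 55.559 - 0.008 s
d2 = derived_tstamp(event_2)  # 55.557 - 0.006 s
print(d1, d2, d1 == d2)
```

The events were created 160 ms apart on the device, but the 2 ms difference in collector arrival and the 2 ms difference in send latency cancel out, so both resolve to 16:51:55.551.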
(first reported on ZD ticket 27522)