Skip to content

All models: Fix logic in users_sessions_this_run to account for sparse data #118

@emielver

Description

@emielver

Issue

In users_sessions_this_run we filter start_tstamp from the sessions on lower_limit and upper_limit from user_limits. By joining on start_tstamp we attempt to pull info from the first session per user:

  FROM {{.output_schema}}.sessions{{.entropy}} a
  INNER JOIN {{.scratch_schema}}.users_userids_this_run{{.entropy}} b
  ON a.domain_userid = b.domain_userid

  WHERE a.start_tstamp >= (SELECT lower_limit FROM {{.scratch_schema}}.users_limits{{.entropy}})
  AND   a.start_tstamp <= (SELECT upper_limit FROM {{.scratch_schema}}.users_limits{{.entropy}})

The lower and upper limits are created by taking the MIN and MAX of start_tstamp from users_userids_this_run, where the start_tstamp is the first timestamp for each user based on all of their sessions (taken from {mobile_}sessions --> {mobile_}sessions_userid_manifest_staged -> users_userids_this_run).

In cases when we have very few users, the upper_limit can have a lower value than the start_tstamp of sessions that we need to process. As a result, the users_sessions_this_run table is artificially truncated, which leads to incorrect information being calculated downstream in users_aggs, users_lasts and users.

Proposed Fix 1 (breaking changes)

  • In {mobile_}sessions_userid_manifest_this_run add a new end_tstamp TIMESTAMP column and fill it as MAX(start_tstamp) AS max_tstamp
  • In {mobile_}users_manifest table, add a new end_tstamp TIMESTAMP column and fill it as {{start_date}}::TIMESTAMP
  • In {mobile_}users_limits change the definition of upper_limit to MAX(end_tstamp) AS upper_limit

Proposed Fix 2 (no breaking changes)

  • In users_sessions_this_run remove start_tstamp filter related to the upper_limit:
    AND a.start_tstamp <= (SELECT upper_limit FROM {{.scratch_schema}}.{mobile_}users_limits{{.entropy}})

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions