Skip to content

Overlapping samples when following next link in API #1260

@Mr0grog

Description

@Mr0grog

While cleaning up some React stuff in the UI project today (edgi-govdata-archiving/web-monitoring-ui#1082), I noticed that we sometimes wind up getting overlapping from the /api/pages/{id}/sampled route. Following the next link in the API response leads to getting an end date for the next request that overlaps with the start date for the current request.

This is pretty easy to see as an error when running the React app in dev mode, since it will log errors if/when we try to put duplicate entries in the version select dropdown. For example, I’m currently seeing it on this URL, although I imagine would be likely to happen on any page that we are frequently capturing: http://localhost:3001/page/a21ce536-44d8-4529-b904-2643c98ed62a/71f8caac-e3b0-4e63-8daf-d091b4efeecc..99bcfe44-219a-4a95-a9fd-0118e59d526a

In this case, getting:

https://api.monitoring.envirodatagov.org/api/v0/pages/a21ce536-44d8-4529-b904-2643c98ed62a/versions/sampled?capture_time=2023-01-15..2023-07-17

Responds with a links.next property of:

https://api.monitoring.envirodatagov.org/api/v0/pages/a21ce536-44d8-4529-b904-2643c98ed62a/versions/sampled?capture_time=2022-07-16..2023-01-15

You can see that both of these cover 2023-01-15, and since there is a capture on that date (8a2b6900-aede-4382-a153-add8532412c1), it shows up in both responses and therefore in our sample set twice.

Skimming the controller code, I suspect the problem is that parse_sample_range adds 1 day to the range (I think to make it inclusive of the entire day):

time_range[1] = time_range[1] + 1.day

…but the sampled route also adds 1 day to the range:

if next_version
next_to = next_version.capture_time.to_date + 1.day
links[:next] = api_v0_page_versions_sampled_url(
page,
params: { capture_time: "#{(next_to - SAMPLE_DAYS_DEFAULT.days).iso8601}..#{next_to.iso8601}" }
)
end

…so they are probably both trying to account for the same inclusiveness problem. When they both happen, it leads to an overlap.

We need to also make sure the logic is happening correctly for links.prev!

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    Status

    Prioritized

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions