Skip to content

Conversation

@lcoram
Copy link
Collaborator

@lcoram lcoram commented Nov 5, 2025

  • Fix timeseries fromtime #80
    Dealt with by choosing the most restrictive if the timeseries has overlapp with obs_pgm. However there is much discussion about if this is the right way to handle this...?
  • Deal with the bug where we have from/to times for timeseries that do not overlap at all with the range from obspgm (something I called "twisting") Bug: Totime set to be before fromtime #174
    Or do we want to do this manually since once they are "fixed" then we don't need this code?
  • Actually get the max/min obstime of the underlying data for the timeseries and use that where it makes sense
  • deactivated = false (do not set to true!)
    Or remove this completely from the update query?
    Also maybe need to add query that reactivates stuff that was deactivated, or maybe we have to do that manually...?
    I have renamed some of the functions, since we are not really deactivating anything...

TODO: do we need to manually set everything that was deactivated=true to false?

@lcoram lcoram changed the title add structures to get the obspgm and station fromtimes Set fromtime based on obspgm Nov 5, 2025
@lcoram lcoram force-pushed the obs_pgm_from_times branch from 38f1102 to f48116d Compare November 5, 2025 20:40
@lcoram lcoram changed the title Set fromtime based on obspgm Set fromtime , modify how we set totime for deactivated (obs_pgm reliance) Nov 5, 2025
@lcoram lcoram changed the title Set fromtime , modify how we set totime for deactivated (obs_pgm reliance) Set fromtime, modify how we set totime for deactivated (obs_pgm reliance) Nov 5, 2025
@lcoram lcoram force-pushed the obs_pgm_from_times branch from 77a59ad to 29917db Compare November 6, 2025 12:34
@lcoram lcoram marked this pull request as ready for review November 6, 2025 12:37
@lcoram lcoram requested a review from intarga November 6, 2025 12:37
@lcoram lcoram force-pushed the obs_pgm_from_times branch from 0034303 to d3e7be4 Compare November 10, 2025 08:27
let _totime = station_totime.get(&label.key.station_id).copied();
// NOTE: we are choosing to essentially close off this timeseries, since we believe
// it is mislabelled. Obs_pgm is essentially saying it should not exist.
Ok((label.id, fromtime, fromtime))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmlefeuvre-met @intarga
Think this will do what we discussed yesterday, essentially closing off the timeseries.

Copy link
Collaborator

@pmlefeuvre-met pmlefeuvre-met Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we comment let _totime = station_totime.get(&label.key.station_id).copied(); as it is eventually not used, it confused me until I saw Ok((label.id, fromtime, fromtime))

Maybe just some semantics, but should the |--station--| be replaced by |--timeseries--| because for me station is another level of information. You can get a fromtime and totime for a station that indicates when a station was opened and closed (from the station table in stinfosys), whereas here we mean a timeseries that is defined by a primarykey/label.met and is defined by stationid, typeid, paramid, sensor, level?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we comment let _totime = station_totime.get(&label.key.station_id).copied(); as it is eventually not used, it confused me until I saw Ok((label.id, fromtime, fromtime))

Maybe just some semantics, but should the |--station--| be replaced by |--timeseries--| because for me station is another level of information. You can get a fromtime and totime for a station that indicates when a station was opened and closed, whereas here we mean a timeseries that is defined by a primarykey/label.met and is defined by stationid, typeid, paramid, sensor, level?

-> Relevant for the entire file, is it me who mix things up or station had different definitions in this file @lcoram

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it only handles the case where obs_pgm entirely before station, but not the other way around?

In any case the nest of if conditions is confusing, could maybe be written much more straightforwardly using Timerange::overlap?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so after discussion with @pmlefeuvre-met I managed to understand that I had been confused about what station_from/totime were, and how they should be used. Now have modified the sql for getting the list of labels so we also actually get the from/to from the DB. This is then used for the first check (to see if its outside the range completely), and after that it defaults to using either obs_pgm if it exists, or station.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have refactored, hope it is easier to understand and now actually correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use Timerange::overlap?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look at it, sorry... Was just trying to make it make sense first. Get the logic right.

Copy link
Member

@intarga intarga Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you understood what I meant. Should it not be possible to get rid of the whole if chain using Timerange::overlap? Since the from/totime you're trying to get out at the end is just the overlap of the ones you're feeding in? Then you just have to handle the None case, unless I missed something?

@lcoram lcoram marked this pull request as draft November 13, 2025 14:01
@lcoram lcoram force-pushed the obs_pgm_from_times branch 2 times, most recently from c580a57 to 65e94c2 Compare November 14, 2025 08:23
@lcoram lcoram marked this pull request as ready for review November 14, 2025 08:24
@lcoram lcoram force-pushed the obs_pgm_from_times branch from 65e94c2 to 50506d3 Compare November 14, 2025 20:17
@lcoram lcoram force-pushed the obs_pgm_from_times branch from e90c05a to 2483f41 Compare November 20, 2025 09:10
@pmlefeuvre-met
Copy link
Collaborator

pmlefeuvre-met commented Nov 24, 2025 via email

// TODO: set to time to closed based on timeresolution of the timeseries
// for now have to leave it open unless very long time ago???
let ten_years_duration = Duration::days(365 * 10);
let ten_years_ago = Utc::now() - ten_years_duration;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmlefeuvre-met @jo-asplin-met-no
It seems that for frost it would be better if we could cut these closer (I mean even 10 years is an improvement over just leaving them open if obs_pgm hasn't closed them)... Right now frost thinks a lot of time series should have data that do not. We talked about cutting these based on time resolution, which we don't technically have yet... :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place to solve this problem. If a timeseries is left open that should be closed, that should be fixed in obs_pgm / stinfosys / eventually content managers. If you want to poke whoever is responsible for closing things in there, you can do a query to make a list for them of things they should probably address.

Copy link
Member

@intarga intarga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do with some documentation related to this. Some questions that would be good to cover:

  • What is obs_pgm, and why is it named that?
  • Why are we using obs_pgm?
  • Who is responsible for what's in obs_pgm?

chrono.workspace = true
chronoutil.workspace = true
util = { path = "../util" }
lard_egress = { path = "../egress" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this about? Just to get access the the Timerange type? It should be moved to util, to avoid creating a dependency between ingestion and egress

// TODO: set to time to closed based on timeresolution of the timeseries
// for now have to leave it open unless very long time ago???
let ten_years_duration = Duration::days(365 * 10);
let ten_years_ago = Utc::now() - ten_years_duration;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place to solve this problem. If a timeseries is left open that should be closed, that should be fixed in obs_pgm / stinfosys / eventually content managers. If you want to poke whoever is responsible for closing things in there, you can do a query to make a list for them of things they should probably address.

pub async fn fetch_deactivated(
obs_pgm_totime: &HashMap<MetTimeseriesKey, DateTime<Utc>>,
station_totime: &HashMap<i32, DateTime<Utc>>,
pub async fn fetch_from_to_for_update(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion to replace this function:

pub fn calc_from_tos(
    obs_pgm_ranges: &HashMap<MetTimeseriesKey, OpenTimerange>,
    station_ranges: &HashMap<i32, OpenTimerange>,
    data_ranges: HashMap<i64, OpenTimerange>,
    labels: Vec<MetLabel>,
) -> Vec<(i64, OpenTimerange)> {
    labels.iter().filter_map(|label| {
        // default is open on both sides
        let obs_pgm = obs_pgm_ranges.get(&label.key).unwrap_or_default();
        let station = station_ranges.get(&label.key.station_id).unwrap_or_default();
        // we `?` this one because if it's None, the ts doesn't exist and we can't update it
        let data = data_ranges.get(&label.id)?;

        let overlap = obs_pgm.overlap(station).map(|x| x.overlap(data)).flatten();

        // if neither of these have a to_time, we should'nt close the ts because it might still
        // receive new data
        let should_be_closed = obs_pgm.totime.is_some() || station.totime.is_some();

        let out = match (overlap, should_be_closed) {
            (Some(overlap), true) => overlap,
            (Some(overlap), false) => OpenTimerange { from: overlap.from, to: None },
            (None, true) => OpenTimerange {from: data.to, to: data.to},
            (None, false) => OpenTimerange {from: obs_pgm.from.max(station.from).max(data.from), to: None},
        };

        Some((label.id, out))
    }).collect()
}

Note that we don't need this to be async or return a result. I think those were originally there because Manuel was querying stinfosys from within this function

The match cases with overlap == None are the "twisted" cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment on should_be_closed is maybe a bit confusing because it's the negative (shouldn't) while the name is positive (shouldn't). Perhaps you can come up with a better comment? Or we can just invert the boolean to let shouldnt_be_closed = obs_pgm.totime.is_none() && station.totime.is_none() and switch the trues and falses in the match?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, the only thing I'm not sure about is that originaly PiM told me to code it so that obs_pgm ranges have precedence over station if the timeseries/station exists in both. Something like

// Prefer obs_pgm if available
let stinfo_range = obs_pgm_ranges
     .get(&label.key)
     .or(station_ranges.get(&label.key.station_id))
     .unwrap_or_default();

But yeah, it's probably worth it to add a comment explaining why, if we want to keep it that way.

Comment on lines +209 to +210
from: Some(fromtime.and_utc()),
to: Some(totime.and_utc()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these should be Some? That would mean from/totimes in stinfosys are never NULL. If that's true we can probably use ClosedTimeranges since the ts_fromto is also always closed, and simplify the previous function even more.

Same applies to the station from/to below

Copy link
Collaborator

@Lun4m Lun4m Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because the original code only acted on open timeseries in LARD that were supposed to be closed, so in the queries to stinfosys we only select the timeseries that have non-NULL totime

Comment on lines -54 to 63

pub struct DeactivatedTimeseries {
#[derive(Debug)]
pub struct TSupdateTimeseries {
/// Timeseries to be updated
pub tsid: i64,
/// Fromtime value found in the metadata source
pub fromtime: DateTime<Utc>,
/// Totime value found in the metadata source
pub totime: DateTime<Utc>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested by my function suggestion above, I don't think we need this type, (i64, OpenTimerange) is fine IMO

Ok(ts_from_to)
}

pub async fn set_from_to_obs_pgm(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to update_from_to since it's not just based on obs_pgm?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants