-
Notifications
You must be signed in to change notification settings - Fork 1
Set fromtime, modify how we set totime for deactivated (obs_pgm reliance) #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
38f1102 to
f48116d
Compare
77a59ad to
29917db
Compare
0034303 to
d3e7be4
Compare
ingestion/src/util/stinfosys.rs
Outdated
| let _totime = station_totime.get(&label.key.station_id).copied(); | ||
| // NOTE: we are choosing to essentially close off this timeseries, since we believe | ||
| // it is mislabelled. Obs_pgm is essentially saying it should not exist. | ||
| Ok((label.id, fromtime, fromtime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmlefeuvre-met @intarga
Think this will do what we discussed yesterday, essentially closing off the timeseries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we comment let _totime = station_totime.get(&label.key.station_id).copied(); as it is eventually not used, it confused me until I saw Ok((label.id, fromtime, fromtime))
Maybe just some semantics, but should the |--station--| be replaced by |--timeseries--| because for me station is another level of information. You can get a fromtime and totime for a station that indicates when a station was opened and closed (from the station table in stinfosys), whereas here we mean a timeseries that is defined by a primarykey/label.met and is defined by stationid, typeid, paramid, sensor, level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we comment
let _totime = station_totime.get(&label.key.station_id).copied();as it is eventually not used, it confused me until I sawOk((label.id, fromtime, fromtime))Maybe just some semantics, but should the
|--station--|be replaced by|--timeseries--|because for me station is another level of information. You can get a fromtime and totime for a station that indicates when a station was opened and closed, whereas here we mean a timeseries that is defined by a primarykey/label.met and is defined by stationid, typeid, paramid, sensor, level?
-> Relevant for the entire file, is it me who mix things up or station had different definitions in this file @lcoram
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it only handles the case where obs_pgm entirely before station, but not the other way around?
In any case the nest of if conditions is confusing, could maybe be written much more straightforwardly using Timerange::overlap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so after discussion with @pmlefeuvre-met I managed to understand that I had been confused about what station_from/totime were, and how they should be used. Now have modified the sql for getting the list of labels so we also actually get the from/to from the DB. This is then used for the first check (to see if its outside the range completely), and after that it defaults to using either obs_pgm if it exists, or station.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have refactored, hope it is easier to understand and now actually correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to use Timerange::overlap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will look at it, sorry... Was just trying to make it make sense first. Get the logic right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you understood what I meant. Should it not be possible to get rid of the whole if chain using Timerange::overlap? Since the from/totime you're trying to get out at the end is just the overlap of the ones you're feeding in? Then you just have to handle the None case, unless I missed something?
c580a57 to
65e94c2
Compare
…unt along with obspgm and station times
65e94c2 to
50506d3
Compare
e90c05a to
2483f41
Compare
|
What is KLOBS? Kvalobs data? Empty data?
…On Mon, 24 Nov 2025, 10:18 Louise Oram, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In integration_tests/tests/end_to_end.rs
<#182 (comment)>:
> Some(totime),
- // timeseries on station 20001 is not
+ // no max/min obstime found for KLOBS data?
Any ideas why the max/min obstime isn't working for KLOBS?
—
Reply to this email directly, view it on GitHub
<#182 (review)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/BDCA5ZGZARSTZLX3NMEB2WL36LENPAVCNFSM6AAAAACLIE6QNCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTIOJZGE4DQMZUGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
…for finding max/min obstime
ingestion/src/util/stinfosys.rs
Outdated
| // TODO: set to time to closed based on timeresolution of the timeseries | ||
| // for now have to leave it open unless very long time ago??? | ||
| let ten_years_duration = Duration::days(365 * 10); | ||
| let ten_years_ago = Utc::now() - ten_years_duration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmlefeuvre-met @jo-asplin-met-no
It seems that for frost it would be better if we could cut these closer (I mean even 10 years is an improvement over just leaving them open if obs_pgm hasn't closed them)... Right now frost thinks a lot of time series should have data that do not. We talked about cutting these based on time resolution, which we don't technically have yet... :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place to solve this problem. If a timeseries is left open that should be closed, that should be fixed in obs_pgm / stinfosys / eventually content managers. If you want to poke whoever is responsible for closing things in there, you can do a query to make a list for them of things they should probably address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could do with some documentation related to this. Some questions that would be good to cover:
- What is obs_pgm, and why is it named that?
- Why are we using obs_pgm?
- Who is responsible for what's in obs_pgm?
| chrono.workspace = true | ||
| chronoutil.workspace = true | ||
| util = { path = "../util" } | ||
| lard_egress = { path = "../egress" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this about? Just to get access the the Timerange type? It should be moved to util, to avoid creating a dependency between ingestion and egress
ingestion/src/util/stinfosys.rs
Outdated
| // TODO: set to time to closed based on timeresolution of the timeseries | ||
| // for now have to leave it open unless very long time ago??? | ||
| let ten_years_duration = Duration::days(365 * 10); | ||
| let ten_years_ago = Utc::now() - ten_years_duration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place to solve this problem. If a timeseries is left open that should be closed, that should be fixed in obs_pgm / stinfosys / eventually content managers. If you want to poke whoever is responsible for closing things in there, you can do a query to make a list for them of things they should probably address.
ingestion/src/util/stinfosys.rs
Outdated
| pub async fn fetch_deactivated( | ||
| obs_pgm_totime: &HashMap<MetTimeseriesKey, DateTime<Utc>>, | ||
| station_totime: &HashMap<i32, DateTime<Utc>>, | ||
| pub async fn fetch_from_to_for_update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion to replace this function:
pub fn calc_from_tos(
obs_pgm_ranges: &HashMap<MetTimeseriesKey, OpenTimerange>,
station_ranges: &HashMap<i32, OpenTimerange>,
data_ranges: HashMap<i64, OpenTimerange>,
labels: Vec<MetLabel>,
) -> Vec<(i64, OpenTimerange)> {
labels.iter().filter_map(|label| {
// default is open on both sides
let obs_pgm = obs_pgm_ranges.get(&label.key).unwrap_or_default();
let station = station_ranges.get(&label.key.station_id).unwrap_or_default();
// we `?` this one because if it's None, the ts doesn't exist and we can't update it
let data = data_ranges.get(&label.id)?;
let overlap = obs_pgm.overlap(station).map(|x| x.overlap(data)).flatten();
// if neither of these have a to_time, we should'nt close the ts because it might still
// receive new data
let should_be_closed = obs_pgm.totime.is_some() || station.totime.is_some();
let out = match (overlap, should_be_closed) {
(Some(overlap), true) => overlap,
(Some(overlap), false) => OpenTimerange { from: overlap.from, to: None },
(None, true) => OpenTimerange {from: data.to, to: data.to},
(None, false) => OpenTimerange {from: obs_pgm.from.max(station.from).max(data.from), to: None},
};
Some((label.id, out))
}).collect()
}Note that we don't need this to be async or return a result. I think those were originally there because Manuel was querying stinfosys from within this function
The match cases with overlap == None are the "twisted" cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment on should_be_closed is maybe a bit confusing because it's the negative (shouldn't) while the name is positive (shouldn't). Perhaps you can come up with a better comment? Or we can just invert the boolean to let shouldnt_be_closed = obs_pgm.totime.is_none() && station.totime.is_none() and switch the trues and falses in the match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, the only thing I'm not sure about is that originaly PiM told me to code it so that obs_pgm ranges have precedence over station if the timeseries/station exists in both. Something like
// Prefer obs_pgm if available
let stinfo_range = obs_pgm_ranges
.get(&label.key)
.or(station_ranges.get(&label.key.station_id))
.unwrap_or_default();But yeah, it's probably worth it to add a comment explaining why, if we want to keep it that way.
| from: Some(fromtime.and_utc()), | ||
| to: Some(totime.and_utc()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these should be Some? That would mean from/totimes in stinfosys are never NULL. If that's true we can probably use ClosedTimeranges since the ts_fromto is also always closed, and simplify the previous function even more.
Same applies to the station from/to below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because the original code only acted on open timeseries in LARD that were supposed to be closed, so in the queries to stinfosys we only select the timeseries that have non-NULL totime
|
|
||
| pub struct DeactivatedTimeseries { | ||
| #[derive(Debug)] | ||
| pub struct TSupdateTimeseries { | ||
| /// Timeseries to be updated | ||
| pub tsid: i64, | ||
| /// Fromtime value found in the metadata source | ||
| pub fromtime: DateTime<Utc>, | ||
| /// Totime value found in the metadata source | ||
| pub totime: DateTime<Utc>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested by my function suggestion above, I don't think we need this type, (i64, OpenTimerange) is fine IMO
| Ok(ts_from_to) | ||
| } | ||
|
|
||
| pub async fn set_from_to_obs_pgm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to update_from_to since it's not just based on obs_pgm?
Dealt with by choosing the most restrictive if the timeseries has overlapp with obs_pgm. However there is much discussion about if this is the right way to handle this...?
Or do we want to do this manually since once they are "fixed" then we don't need this code?
Or remove this completely from the update query?
Also maybe need to add query that reactivates stuff that was deactivated, or maybe we have to do that manually...?
I have renamed some of the functions, since we are not really deactivating anything...
TODO: do we need to manually set everything that was deactivated=true to false?