-
Notifications
You must be signed in to change notification settings - Fork 25
Keeping track of fetched packages and commits to avoid duplicates and make fetching remotes faster #424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The fact that the dups tests fails is only because of differences in blank lines. It would be nice if we could create a clean version without additional blank lines thatn we don't have to worry about that. |
|
if there are cases where duplicates remain, I prefer to keep the coding as is to be honest |
|
haha i hear you. This issue is quite challenging and can be annoying. |
|
consecutive empty lines are now removed with 4b31d10 |
|
nice. I am experimenting with a solution to keep track of the already fetched packages using environments (Claude Sonnet 3.5 suggestion) btw, but not there yet |
we can thank chatgpt for it 😂 |
|
Yeah, we can also thank Github Copilot (Claude Sonnet 3.5) for this idea with environments. Also the seurat-wrappers example worked fine (no duplicates). So if you have time, have a look, I haven't used this environment approach, it's a little awkward, but also quite cool because don't know how to carry the already fetched packages through the functions. But don't think this is ready to merge, needs some more testing. |
f667e38 to
252d6be
Compare
|
hm, but it changed the commit hash, and now it's not the closest anymore, have to further investigate @b-rodrigues btw the ubuntu-checks are still not showing if they fail. I know have other commit hash, so there are 2 tests failing in devtools_test (ubuntu-latest), but it says it ran succesfully. I think it would be helpful if you could fix that. |
|
just further thinking out loud: So mlre3xtralearners with the given commit hash Not sure there is a right decision here. We are probably again at the point where the user has to decide, right @b-rodrigues. But this is the reason why the tests fail and why on the other hand building the shell now works. I think it would more sense to skip fetching the package again (so i think we need another environment that keeps track of that) because this is faster and then we should end up with all dates being close to the date of the given commit, although this might also require manual intervention. @b-rodrigues agree? |
|
I haven't had the chance to look at this if depth but I generally don't like when functions mess with the global environment, isn't it possible to keep the track in a variable within the scope of the function ? |
|
Yeah I know that this is not super clean, but we can remove that variables again from the global environment. I don't think it's that easy. That's because of this recursive call offetchgits within fetchgit. So then we are in fetchgit with e.g. |
|
wouldn't adding a variable |
|
I don't think so because when fetchgits is called it's a new environment then and |
|
why not cache the list of seen packages in a /tmp/? |
this is now better handled by post processing and takes time
080867a to
fb1f1aa
Compare
|
git commits were quite a mess, not it's all rebased okay i try to find a solution to avoid global environments. I tried several solutions, but none of them worked so far. I used s3 objects, however also tricky to really keep values across functions, that would at last mean that i have to carry this object through all functions (from fetchgits to resolve_package_commit are quite some). I also tried your solution, which seemed to work fine. But i then realized that for each loop it creates a new temporary file .. right now experimenting with environments , but not global environments, because so far i have found this easiest to handle. |
|
Hehe, okay in the end I used your solution and saved it to a temporary file (avoids creating a new file by using fixed file name, not that complicated). THe dedup tests fails because of some format stuff and because deup actually has the "wrong" commit (meaning of a date 2022-08-27 as explinaed above) |
|
I've run
|
I'm not sure yet if it's not linked but in any case I'll have to fix it first then |
ok so I’ve added an action to run R CMD Check on a runner without Nix, and indeed, there is something wrong the response from the server. I thiiiink I know what’s going on: https://github.com/ropensci/rix/actions/runs/13562423321/job/37908068713 |
|
Ah okay, and what's going on? |
|
I realized |
|
I want to deal with this issue first: #430 I think that the bug was introduced with the changes from the last PR |
|
Could you resolve conflicts? |
|
Will try to do this during work on GitHub directly, if it does not work I will do it this evening. |
|
oh man this really sucks online. I resolved all conflcit on github, but then "commiting" does not work, I then tried to copy manually each file and commiting manually, which is also not great. have to do this this evening in VSCode. |
580f422 to
870c699
Compare
|
ready to merge ? |
|
give me 5 more mintues to manually check. It was again quite confusing because i didn't realize that the resolving conflicts removed the ..., so ignore_remotes_cache was not passed, and this produced some weird error. |
|
Yeah I think everything is good! Long journey! 👍 |
|
or let me write this caching a little in the docs |
|
yes that'd be great I'll wait then |
|
Have a look if you are okay with the explaination. Otherwise ready to merge finally. |
| However, if you disable this feature, then `{rix}` will fetch package `C` twice, once for the commit of the 1st December 2024 as before | ||
| (closest date that is earlier than the commit date of package `A`) and once for a date | ||
| before the 1st November 2024 (closest date that is earlier than the commit date of package `B`). | ||
| You should be aware of this because if you are unlucky the default option `ignore_remotes_cache = FALSE` might result in a solution that does not work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean it might not work because the commit that will be picked might be for a version of C incompatible with either A or B?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We observed this with he mlr3 example.
I mean this can always happen with GitHub packages. I just wanted the user to be aware of the fact that there a multiple solution of which commit rix should pick for package c in sich a situation and that some of them might work and others not. And that the one rix chooses is the one that is closest to package a, but not necessarily the one that works.
Maybe you find better words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote it a little, but also happy to take suggestions
|
Many thanks! this is a great addition. I was a bit worried at first since it introduced quite a lot of code, but it's for the better. |
|
Welcome. I saw that you edited the links for the tests workflows. However, you also need to change the branch, not only the username. |
@b-rodrigues My inital plan was to improve my existing code to remove duplicates so that your post-processing function does even has to do anything
However, I then realized that this was 1) pretty complex 2) quite inefficient because I call
get_remoteto get the remote dependencies for each remote dependencies. This mean for our example withmlr3extralearnersGithubAPIs call for each of the 6 remote dependencies.That's why i decided to remove my previous added code.
So with the main branch this takes ~30 on my machine
With the new branch the same code takes 23s
There are also no duplicates.
This also works fine with this code (my inital try to remove duplicates): #385
However, this leads to a double entry of seurat-data: #388
So to be explicit:
has
seurat-datatwice.Do you know why? Only one thing I realized is that the
refof the seurat-data is different. So Maybe this should only check for theAnd second thing is that if packages are removed there are lots of blank lines (now the mlr3extralearners example again).
default.nix.log