-
Notifications
You must be signed in to change notification settings - Fork 1k
Interruption problems in frollapply
#7428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Instead of calling invokeRestart("abort") in the interrupt handler,
return from it. This continues the dispatch of the interrupt and lets an
outer handler catch it:
tryCatch(
frollapply(1:1e6, 1, \(.) { Sys.sleep(ret <- sum(.)); ret}),
interrupt = \(e) 'interrupted'
)
^C[1] "interrupted"
With invokeRestart("abort"), the interrupt cannot be handled further.
While handling an interrupt, ask mccollect() to wait for the child process to exit (with a warning) in order to avoid producing zombies. Otherwise a process that is too slow to react to SIGTERM will remain a zombie until the parent process exits.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7428 +/- ##
==========================================
- Coverage 99.13% 99.13% -0.01%
==========================================
Files 85 85
Lines 16618 16617 -1
==========================================
- Hits 16474 16473 -1
Misses 144 144 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Could the zombie processes also be the reason that the codecov github job gets stuck in #7288? |
Generated via commit d1f99b8 Download link for the artifact containing the test results: ↓ atime-results.zip
|
|
Apparently there are some other packages having the same problem |
|
Can't say no with 100% certainty, but I don't see how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The previous code with wait=FALSE looks kind of wrong since timeout=0.
Still would like a NEWS item, maybe as addendum to the whole froll* NEWS
|
News should not be needed as this is change to code added in dev. It could eventually explain current interrupt behavior but I think better to have it in manual than news. |
|
Need to double check whether interrupting the first |

There were two problems with the way
frollapplyhandled interruptions:invokeRestart("abort")in the interrupt handler prevented any further error or calling handlers from running (but not exit handlers liketryCatch(finally=...)oron.exit(); those kept working). I think that it's better to return from the calling handler, letting R continue the dispatch to other possible handlers:frollapplysometimes left zombie processes. The only way to reap them was to unload theparallelparallel (or quit R). I think that the parent process was too quick to callmccollect()after sending theSIGINT, so the child process did not have enough time to process theSIGINT, quit, and be ready to be collected bywaitpid(). I think that the zombie processes can be avoided by forcingmccollect()to wait, but that opens opportunities for further problems.mccollect()ing a terminated process results in a warning: do we needsuppressWarnings()? If the child processes are really hung (e.g. due to the rolling function causing a deadlock), a second interrupt seems to interruptmccollectdespite thesuspendInterrupts()(???) and unblock the parent process: