Interruption problems in `frollapply` #7428

aitap · 2025-11-13T20:32:12Z

There were two problems with the way frollapply handled interruptions:

invokeRestart("abort") in the interrupt handler prevented any further error or calling handlers from running (but not exit handlers like tryCatch(finally=...) or on.exit(); those kept working). I think that it's better to return from the calling handler, letting R continue the dispatch to other possible handlers:

tryCatch(
  frollapply(1:1e6, 1, \(.) { Sys.sleep(ret <- sum(.)); ret}),
  interrupt = \(e) 'interrupted'
)
^C[1] "interrupted"
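
For illustration, here is a minimal sketch of the difference between the two approaches. It is not the frollapply code itself: the condition class `demo_interrupt` and the restart `demo_abort` are made up for this example and stand in for a real interrupt and the top-level "abort" restart, so that the snippet can run non-interactively.

```r
# Hypothetical condition class standing in for a real interrupt.
demo_cond <- structure(
  list(message = "demo interrupt", call = NULL),
  class = c("demo_interrupt", "condition")
)

# Case 1: the calling handler returns normally.
# Dispatch continues outward, so the tryCatch() handler gets to run.
returned <- tryCatch(
  withCallingHandlers(
    { signalCondition(demo_cond); "body finished" },
    demo_interrupt = function(e) NULL  # do any cleanup here, then just return
  ),
  demo_interrupt = function(e) "outer handler ran"
)

# Case 2: the calling handler invokes a restart instead.
# Control jumps straight to withRestarts(); the tryCatch() handler is skipped.
# "demo_abort" is a hypothetical restart used in place of "abort".
restarted <- withRestarts(
  tryCatch(
    withCallingHandlers(
      { signalCondition(demo_cond); "body finished" },
      demo_interrupt = function(e) invokeRestart("demo_abort")
    ),
    demo_interrupt = function(e) "outer handler ran"
  ),
  demo_abort = function() "restart taken"
)

print(returned)
print(restarted)
```

In the first case the outer handler sees the condition; in the second it never does, which mirrors what happened with invokeRestart("abort") in the interrupt handler.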

While testing on an older laptop with an i5-2520M CPU, I noticed that interrupting parallel frollapply sometimes left zombie processes. The only way to reap them was to unload the parallel package (or quit R). I think that the parent process was too quick to call mccollect() after sending the SIGINT, so the child process did not have enough time to process the SIGINT, quit, and be ready to be collected by waitpid().

I think that the zombie processes can be avoided by forcing mccollect() to wait, but that opens opportunities for further problems. mccollect()ing a terminated process results in a warning: do we need suppressWarnings()? If the child processes are really hung (e.g. due to the rolling function causing a deadlock), a second interrupt seems to interrupt mccollect() despite the suspendInterrupts() (???) and unblock the parent process:

writeLines('void foo(void) { for(;;); }', 'foo.c')
tools::Rcmd('SHLIB foo.c')
foo <- mcparallel(.C('foo')) # child process is now hung
tryCatch(
 withCallingHandlers(
  Sys.sleep(10), # some interruptible process
  interrupt = \(e) suspendInterrupts( mccollect(foo) ) # handling the interrupt will hang
 ),
 interrupt = \(e) message('interrupted') # try to handle interrupting as well
)
^C^Cinterrupted # <-- unblocks after a second interrupt
tools::pskill(foo$pid) # stop the hung process
mccollect(foo) # clean up the zombie

Instead of calling invokeRestart("abort") in the interrupt handler, return from it. This continues the dispatch of the interrupt and lets an outer handler catch it:

tryCatch(
  frollapply(1:1e6, 1, \(.) { Sys.sleep(ret <- sum(.)); ret}),
  interrupt = \(e) 'interrupted'
)
^C[1] "interrupted"

With invokeRestart("abort"), the interrupt cannot be handled further.

While handling an interrupt, ask mccollect() to wait for the child process to exit (with a warning) in order to avoid producing zombies. Otherwise a process that is too slow to react to SIGTERM will remain a zombie until the parent process exits.

codecov · 2025-11-13T20:47:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.13%. Comparing base (df7fa80) to head (d1f99b8).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7428      +/-   ##
==========================================
- Coverage   99.13%   99.13%   -0.01%     
==========================================
  Files          85       85              
  Lines       16618    16617       -1     
==========================================
- Hits        16474    16473       -1     
  Misses        144      144

ben-schwen · 2025-11-13T20:50:28Z

Could the zombie processes also be the reason that the codecov github job gets stuck in #7288?

github-actions · 2025-11-13T20:50:43Z

HEAD=frollapply-interrupt stopped early for DT[by,verbose=TRUE] improved in #6296
HEAD=frollapply-interrupt slower P<0.001 for memrecycle regression fixed in #5463

Generated via commit d1f99b8

Download link for the artifact containing the test results: ↓ atime-results.zip

Task	Duration
R setup and installing dependencies	4 minutes and 57 seconds
Installing different package versions	10 minutes and 12 seconds
Running and plotting the test cases	2 minutes and 41 seconds

ben-schwen · 2025-11-13T21:13:20Z

Apparently there are some other packages having the same problem

https://github.com/cran/bettermc/blob/34da1f6180067ae6970d42c3297442b5e0621784/R/mclapply.R#L723-L739

aitap · 2025-11-13T21:16:12Z

Can't say no with 100% certainty, but I don't see how R CMD check could be producing them: the current issue is only for when the user interrupts R. So far, the difference between hang and no hang is covr:::save_trace running as part of parallel::mcexit, which looks quite harmless.

ben-schwen

LGTM. The previous code with wait=FALSE looks kind of wrong since timeout=0.

Still would like a NEWS item, maybe as addendum to the whole froll* NEWS

jangorecki · 2025-11-18T12:24:05Z

A NEWS item should not be needed, as this is a change to code that was added in dev. It could eventually explain the current interrupt behavior, but I think it is better to have that in the manual than in NEWS.

aitap · 2025-11-20T20:48:49Z

Need to double check whether interrupting the first mccollect call may leave some PIDs already waited for and reused for a different process. We don't want to terminate an unrelated process because of that.

aitap added 2 commits November 13, 2025 22:55

aitap requested a review from MichaelChirico as a code owner November 13, 2025 20:32

jangorecki added this to the 1.18.0 milestone Nov 15, 2025

jangorecki added the froll label Nov 15, 2025

jangorecki approved these changes Nov 18, 2025

ben-schwen approved these changes Nov 18, 2025
