@bradcray
Member

The way I learned the SUMMA algorithm was essentially "Locales broadcast their blocks of data to all the other locales in their row/col." But Chapel doesn't really have a broadcast option (nor would we necessarily want to use one here if there were), so I'd implemented this by having all locales do a remote read of the block in question.

Engin pointed out (some time ago) that this could be rewritten to avoid that bottleneck by skewing each locale by its row/col ID, so that each block copy is a 1:1 communication rather than a sqrt(numLocales):1 communication. This implements that transformation.
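
To make the communication pattern concrete, here is a small Python simulation (not the Chapel code in this PR) that counts how many locales read the same block in a given step. It assumes a square `p x p` locale grid and one plausible skew, `k = (i + j + step) % p`; the exact skew used in the PR may differ, and the names `max_fan_in` and `covers_all_blocks` are made up for this illustration.

```python
from collections import Counter

def max_fan_in(p, skewed):
    """Largest number of locales reading the same block in any single step.

    Locale (i, j) on a p x p grid needs A's block from locale (i, k) and
    B's block from locale (k, j) for every k in 0..p-1.
    The skew (i + j + step) % p is an assumed form, chosen for illustration.
    """
    worst = 0
    for step in range(p):
        readers = Counter()
        for i in range(p):
            for j in range(p):
                # Unskewed: every locale uses the same k in a given step.
                # Skewed: each locale offsets k by its row and column ID.
                k = (i + j + step) % p if skewed else step
                readers[("A", i, k)] += 1  # read of A's block owned by (i, k)
                readers[("B", k, j)] += 1  # read of B's block owned by (k, j)
        worst = max(worst, max(readers.values()))
    return worst

def covers_all_blocks(p):
    """Check that the skewed schedule still visits every k once per locale."""
    return all(
        sorted((i + j + step) % p for step in range(p)) == list(range(p))
        for i in range(p)
        for j in range(p)
    )
```

With `p = 4` (16 locales), `max_fan_in(4, skewed=False)` reports `p` readers per block, the sqrt(numLocales):1 case, while `max_fan_in(4, skewed=True)` reports one reader per block. `covers_all_blocks` confirms that each locale still accumulates all `p` partial products, just in a rotated order.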

---
Signed-off-by: Brad Chamberlain <[email protected]>