= HELIX-RC: An Architecture-Compiler Co-Design for Automatic Parallelization of Irregular Programs =
Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy Jones, Gu-Yeon Wei, David Brooks
''Proc. International Symposium on Computer Architecture (ISCA), June, 2014''
Data dependences in sequential programs limit parallelization because extracted threads cannot run independently. Although thread-level speculation can avoid the need for precise dependence analysis, the communication overhead required to synchronize actual dependences counteracts the benefits of parallelization. To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together decouple communication from thread execution. Simulations of this approach, applied to a processor with 16 Intel Atom-like cores, show an average performance speedup of 6.85x on six SPEC CINT2000 benchmarks.
[ [[media:ISCA2014_Paper.pdf|Paper]] ]