HELIX - User contributions [en] (Atom feed, MediaWiki 1.16.5, last updated 2024-03-29T05:26:16Z)
https://helix.eecs.harvard.edu/index.php?title=Special:Contributions/Simone&feed=atom&limit=50&target=Simone&year=&month=
https://helix.eecs.harvard.edu/index.php/CC2016CC20162017-01-10T22:35:26Z<p>Simone: Created page with "__NOTITLE__ = Performance Implications of Transient Loop-Carried Data Dependences in Automatically Parallelized Loops = Niall Murphy, Timothy Jones, Robert Mullins, Simone Campa..."</p>
<hr />
<div>__NOTITLE__<br />
= Performance Implications of Transient Loop-Carried Data Dependences in Automatically Parallelized Loops =<br />
<br />
Niall Murphy, Timothy Jones, Robert Mullins, Simone Campanoni<br />
<br />
<br><br />
''Proc. International Conference on Compiler Construction (CC), March, 2016''<br />
<br />
<br><br />
Recent approaches to automatic parallelization have taken advantage of the low-latency on-chip interconnect provided in modern multicore processors, demonstrating significant speedups, even for complex workloads.<br />
Although these techniques can already extract significant thread-level parallelism from application loops, we are interested in quantifying and exploiting any additional performance that remains on the table.<br />
<br />
This paper confirms the existence of significant extra thread-level parallelism within loops parallelized by the HELIX compiler.<br />
However, improving static data dependence analysis alone cannot unlock this additional performance, because the remaining loop-carried dependences hold only on a small subset of loop iterations.<br />
We therefore develop three approaches to take advantage of the transient nature of these data dependences through speculation, via transactional memory support.<br />
Results show that coupling state-of-the-art data dependence analysis with fine-grained speculation achieves most of the available speedup and may help close the gap towards the limit of HELIX-style thread-level parallelism.<br />
<br />
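One way to picture the idea (a hypothetical sketch, not the paper's compiler support, and it assumes GCC's -fgnu-tm transactional memory can be combined with OpenMP in the toolchain at hand): parallelize a loop whose loop-carried dependence is exercised on only a few iterations and wrap the rare shared update in a transaction, so conflicting iterations abort and retry instead of being serialized.<br />
<pre>
/* Hypothetical sketch: a loop whose loop-carried dependence (updating shared
 * state) is transient, i.e. it fires on a small subset of iterations.
 * Independent work runs in parallel; the rare update runs inside a
 * transaction so conflicting threads roll back and retry.
 * Build (GCC with libitm): gcc -O2 -fopenmp -fgnu-tm sketch.c */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

static double out[N];
static long rare_count = 0;   /* shared state behind the transient dependence */

int main(void) {
    double *in = malloc(N * sizeof *in);
    for (long i = 0; i < N; i++)
        in[i] = (i % 1024 == 0) ? -1.0 : (double)i;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) {
        out[i] = in[i] * in[i];        /* independent on every iteration */
        if (in[i] < 0.0) {             /* dependence actually taken only rarely */
            __transaction_atomic {     /* speculate; conflicts roll back */
                rare_count++;
            }
        }
    }
    printf("rare iterations: %ld\n", rare_count);
    free(in);
    return 0;
}
</pre>
The transaction only preserves atomicity of the rare update, not the order in which those updates happen; the fine-grained speculation evaluated in the paper detects and re-executes mis-speculated iterations inside HELIX, which this toy does not attempt.<br />
<br />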
[ [[media:CC2016_Paper.pdf|Paper]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/PublicationsPublications2017-01-10T22:32:47Z<p>Simone: </p>
<hr />
<div>* [[CC2016|Performance Implications of Transient Loop-Carried Data Dependences in Automatically Parallelized Loops]], CC 2016<br />
<br />
* [[CGO2015|HELIX-UP: Relaxing Program Semantics to Unleash Parallelization]], CGO 2015<br />
<br />
* [[PRISM2015|Unified Cache: A Case for Low-Latency Communication]], PRISM 2015<br />
<br />
* [[CPC2015|Limits of Static Dependence Analysis for Automatic Parallelization]], CPC 2015<br />
<br />
* [[ISCA2014|HELIX-RC: An Architecture-Compiler Co-Design for Automatic Parallelization of Irregular Programs]], ISCA 2014<br />
<br />
* [[PRISM2014|Breaking Cyclic-Multithreading Parallelization with XML Parsing]], PRISM 2014<br />
<br />
* [[DAC2012|The HELIX Project: Overview and Directions]], DAC 2012<br />
<br />
* [[IEEEMICRO2012|Making the Extraction of Thread-Level Parallelism Mainstream]], IEEE Micro 2012<br />
<br />
* [[CGO2012|HELIX: Automatic Parallelization of Irregular Programs for Chip Multiprocessing]], CGO 2012</div>Simonehttps://helix.eecs.harvard.edu/index.php/PRISM2015PRISM20152015-06-08T18:50:44Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= Unified Cache: A Case for Low-Latency Communication =<br />
<br />
Khalid Al-Hawaj, Simone Campanoni, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''International Workshop on Parallelism in Mobile Platforms (PRISM), June, 2015''<br />
<br />
<br><br />
Increasing computational demand on mobile devices calls for energy-friendly solutions for accelerating single programs. In the multicore era, thread level parallelism (TLP) can accelerate single-threaded programs without requiring power-hungry cores. HELIX-RC, a recently proposed co-design between the HELIX parallelizing compiler and its target architecture, shows that substantial TLP can be extracted from loops with small bodies by optimizing core-to-core communication. Previously, the effectiveness of the HELIX-RC approach has been demonstrated through simulation. In this paper, we evaluate a HELIX-RC-like solution on a real platform.<br />
We have developed a simplified version of the HELIX-RC architecture that we call unified cache, and we have implemented it on an FPGA board. Our design augments a multicore platform with a simplified ring cache—the architectural component of the HELIX-RC co-design. With the aid of microbenchmarks, our FPGA prototype confirms the HELIX-RC findings.<br />
After describing both the ring cache and the parallel code generated by the HELIX compiler, we sketch the design of the unified cache and we evaluate its implementation on a Xilinx VC707 FPGA board.<br />
<br />
[ [[media:PRISM2015_Paper.pdf|Paper]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/CGO2015CGO20152015-06-08T18:44:42Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= HELIX-UP: Relaxing Program Semantics to Unleash Parallelization =<br />
<br />
Simone Campanoni, Glenn Holloway, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''Proc. International Symposium on Code Generation and Optimization (CGO), February, 2015''<br />
<br />
<br><br />
Automatic generation of parallel code for general-purpose commodity processors is a challenging computational problem.<br />
Nevertheless, there is a lot of latent thread-level parallelism in the way sequential programs are actually used.<br />
To convert latent parallelism into performance gains, users may be willing to compromise on the quality of a program's results.<br />
We have developed a parallelizing compiler and runtime that substantially improve scalability by allowing parallelized code to briefly sidestep strict adherence to language semantics at run time.<br />
In addition to boosting performance, our approach limits the sensitivity of parallelized code to the parameters of target CPUs (such as core-to-core communication latency) and the accuracy of data dependence analysis.<br />
<br />
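As a rough illustration of the trade-off only (not of HELIX-UP's mechanism, which works inside the compiler and its runtime), the hypothetical sketch below breaks a loop-carried dependence outright and accepts a slightly different result in exchange for scalability.<br />
<pre>
/* Hypothetical sketch of trading result quality for parallelism.
 * The sequential loop carries a dependence through 'ema'; the relaxed
 * version drops that dependence across chunk boundaries and merges the
 * per-thread partial results, so its output differs slightly.
 * Build: gcc -O2 -fopenmp sketch.c */
#include <stdio.h>

#define N 4000000

static double x[N];

int main(void) {
    for (long i = 0; i < N; i++)
        x[i] = (i % 97) / 96.0;                 /* synthetic input signal */

    /* Strict sequential semantics: one moving average over the whole stream. */
    double ema_seq = 0.0;
    for (long i = 0; i < N; i++)
        ema_seq = 0.999 * ema_seq + 0.001 * x[i];

    /* Relaxed parallel version: each thread smooths only its own chunk. */
    double ema_sum = 0.0;
    long nchunks = 0;
    #pragma omp parallel reduction(+:ema_sum,nchunks)
    {
        double local = 0.0;
        #pragma omp for schedule(static)
        for (long i = 0; i < N; i++)
            local = 0.999 * local + 0.001 * x[i];
        ema_sum += local;
        nchunks++;
    }
    printf("sequential %.6f vs relaxed %.6f\n", ema_seq, ema_sum / nchunks);
    return 0;
}
</pre>
Whether a distortion like this is acceptable, and how large it may grow, is exactly the user-facing knob the paper argues for.<br />
<br />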
[ [[media:CGO2015_Paper.pdf|Paper]] ] [ [[media:CGO2015_Slides.pptx|Slides]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/CPC2015CPC20152014-12-27T21:24:36Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= Limits of Static Dependence Analysis for Automatic Parallelization =<br />
<br />
Niall Murphy, Timothy Jones, Simone Campanoni, Robert Mullins<br />
<br />
<br><br />
''International Workshop on Compilers for Parallel Computing (CPC), January, 2015''<br />
<br />
<br><br />
Automatic parallelization is an increasingly important technique for accelerating sequential applications on multicore processors.<br />
This approach relies on having a very accurate static dependence analysis to identify independent sections of code. <br />
Previously it has been assumed that improving this analysis would also improve the performance of parallelized code. <br />
In this paper we use novel profiling techniques to see how much room there is for improvement of the static analysis. <br />
By feeding this knowledge back into the compiler we simulate a perfectly accurate dependence analysis. <br />
Although we find that the compiler does indeed overestimate the number of data dependences, this extra knowledge does not help the compiler to achieve better performance. <br />
We conclude that other avenues, such as speculation, must be explored to surpass current automatic parallelization efforts.<br />
<br />
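The kind of profiling the paper relies on can be pictured with the hypothetical, single-array toy below (the real profiler instruments compiled programs): it remembers which iteration last wrote each location and counts how often a later iteration actually reads it, i.e. how often a statically reported loop-carried dependence really manifests at run time.<br />
<pre>
/* Hypothetical sketch of loop-carried dependence profiling: run the loop
 * sequentially, remember the iteration that last wrote each location, and
 * count how often a later iteration really reads it (a loop-carried RAW). */
#include <stdio.h>

#define N 4096

int main(void) {
    static int a[N];
    static long last_writer[N];              /* iteration that last wrote a[j] */
    long carried = 0, iters = 0;

    for (long j = 0; j < N; j++) last_writer[j] = -1;

    for (long i = 1; i < N; i++, iters++) {
        long src = (i % 64 == 0) ? i - 1 : i;    /* rare read of the previous element */
        if (last_writer[src] != -1 && last_writer[src] < i)
            carried++;                           /* the dependence crossed iterations */
        a[i] = a[src] + 1;
        last_writer[i] = i;
    }
    printf("loop-carried reads on %ld of %ld iterations (%.2f%%)\n",
           carried, iters, 100.0 * carried / iters);
    return 0;
}
</pre>
Feeding such counts back into the compiler, as the paper does, shows how a dependence can be statically real yet dynamically rare, which is what motivates the speculation suggested in the conclusion.<br />
<br />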
[ [[media:CPC2015_Paper.pdf|Paper]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/ISCA2014ISCA20142014-06-20T20:09:08Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= HELIX-RC: An Architecture-Compiler Co-Design for Automatic Parallelization of Irregular Programs =<br />
<br />
Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy Jones, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''Proc. International Symposium on Computer Architecture (ISCA), June, 2014''<br />
<br />
<br><br />
Data dependences in sequential programs limit parallelization because extracted threads cannot run independently. Although thread-level speculation can avoid the need for precise dependence analysis, communication overheads required to synchronize actual dependences counteract the benefits of parallelization. To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85x performance speedup for six SPEC CINT2000 benchmarks.<br />
<br />
[ [[media:ISCA2014_Paper.pdf|Paper]] ] [ [[media:ISCA2014_Slides.pdf|Slides]] ] [ [[media:ISCA2014_SlidesFF.pdf|Fast Forward]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/PRISM2014PRISM20142014-06-15T05:00:53Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= Breaking Cyclic-Multithreading Parallelization with XML Parsing =<br />
<br />
Simone Campanoni, Svilen Kanev, Kevin Brownell, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''Proc. International Workshop on Parallelism in Mobile Platforms (PRISM), June, 2014''<br />
<br />
<br><br />
HELIX-RC, a modern re-evaluation of the cyclic-multithreading (CMT) compiler technique, extracts threads from sequential code automatically. As a CMT approach, HELIX-RC gains performance by running iterations of the same loop on different cores in a multicore. It successfully boosts performance for several SPEC CINT benchmarks previously considered unparallelizable. However, this paper shows there are workloads with different characteristics, which even idealized CMT cannot parallelize.<br />
We identify how to overcome an inherent limitation of CMT for these workloads. CMT techniques only run iterations of a single loop in parallel at any given time. We propose exploiting parallelism not only within a single loop, but also among multiple loops. We call this execution model Multiple CMT (MCMT), and show that it is crucial for auto-parallelizing a broader class of workloads.<br />
To highlight the need for MCMT, we target a workload that is naturally hard for CMT -- parsing XML-structured data. We show that even idealized CMT fails on XML parsing. Instead, MCMT extracts speedups up to 3.9x on 4 cores.<br />
<br />
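Leaving out everything that makes the real problem hard (iteration distribution within each loop and any data flowing between loops), the hypothetical sketch below contrasts the two execution models in the simplest possible terms: CMT parallelizes iterations of one loop, whereas the MCMT idea is to also overlap distinct loops.<br />
<pre>
/* Hypothetical sketch of the MCMT intuition only: instead of (or in addition
 * to) spreading iterations of a single loop across cores, run loops from
 * different program phases at the same time.
 * Build: gcc -O2 -fopenmp sketch.c */
#include <stdio.h>

#define N 1000000

static double a[N], b[N];

int main(void) {
    double sum_a = 0.0, sum_b = 0.0;

    #pragma omp parallel sections
    {
        #pragma omp section
        for (long i = 0; i < N; i++) {      /* loop 1, e.g. tokenizing input */
            a[i] = i * 0.5;
            sum_a += a[i];
        }
        #pragma omp section
        for (long i = 0; i < N; i++) {      /* loop 2, e.g. processing records */
            b[i] = i * 2.0;
            sum_b += b[i];
        }
    }
    printf("%.1f %.1f\n", sum_a, sum_b);    /* visible after the implicit barrier */
    return 0;
}
</pre>
A real MCMT system still has to distribute iterations within each loop and synchronize whatever data flows between loops, as in the XML-parsing workload studied here; the toy above only shows the loop-level overlap.<br />
<br />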
[ [[media:PRISM2014_Paper.pdf|Paper]] ] [ [[media:PRISM2014_Slides.pdf|Slides]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/TeamTeam2014-05-01T16:58:43Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= '''Team'''=<br />
<br />
----<br />
'''Simone Campanoni'''<br />
<br />
[[File:Simone_Campanoni.jpeg|200px]]<br />
<br />
[http://www.eecs.harvard.edu/~xan Simone Campanoni] is currently working under [http://www.eecs.harvard.edu/~dbrooks Prof. David Brooks] in conjunction with [http://www.eecs.harvard.edu/~guyeon Prof. Gu-yeon Wei] at [http://www.harvard.edu Harvard University].<br />
His work focuses on the boundary between hardware and software, relying on dynamic compilation, run-time optimization, and virtual execution environments to investigate opportunities for auto-parallelization.<br />
He received his Ph.D. degree with honours from [http://www.polimi.it/en/english-version/ Politecnico di Milano] in 2009 with [http://www.dei.polimi.it/personale/docentidettaglio.php?id_docente=67&idlang=eng Prof. Stefano Crespi Reghizzi] as advisor.<br />
Simone is the author of [http://ildjit.sourceforge.net ILDJIT], a parallel dynamic compiler demonstrating principles from his thesis work.<br />
Simone started the [http://helix.eecs.harvard.edu HELIX] research project in January 2010 at the beginning of his post-doc.<br />
<br />
----<br />
'''Timothy M. Jones'''<br />
<br />
[[File:Timothy_Jones.jpeg|200px]]<br />
<br />
[http://www.cl.cam.ac.uk/~tmj32 Timothy Jones] is a post-doctoral researcher at the [http://www.cam.ac.uk University of Cambridge] [http://www.cl.cam.ac.uk Computer Laboratory], where he works within the [http://www.cl.cam.ac.uk/research/comparch/ Computer Architecture Group]. Between September 2008 and August 2013 he held a Research Fellowship from the UK's [http://www.raeng.org.uk Royal Academy of Engineering] and [http://www.epsrc.ac.uk EPSRC] to investigate compiler-directed power saving in multicore processors. As part of his Fellowship he worked with [http://www.eecs.harvard.edu/~dbrooks David Brooks] and his research team at [http://www.harvard.edu Harvard] for the whole of 2010 and now works part-time with [http://www.arm.com ARM].<br />
<br />
----<br />
'''Kevin Brownell'''<br />
<br />
[[File:Kevin_Brownell.jpeg|200px]]<br />
<br />
Kevin is a PhD student in Engineering Sciences at Harvard University. His research has focused on a variability-aware post-fabrication technique called 'voltage interpolation'. Among his research interests are on-chip networks, GPGPU-style architectures, and 3D graphics. He is advised by Professors David Brooks and Gu-Yeon Wei.<br />
<br />
----<br />
'''Svilen Kanev'''<br />
<br />
[[File:Svilen_Kanev.jpeg|200px]]<br />
<br />
[http://www.eecs.harvard.edu/~skanev Svilen Kanev] is a new PhD student in Computer Science at Harvard, advised by [http://www.eecs.harvard.edu/~dbrooks Prof. David Brooks] and [http://www.eecs.harvard.edu/~guyeon Prof. Gu-Yeon Wei]. He works at the intersection of computer systems and architecture; his main research interests lie in hardware-software approaches to reliability and in the modeling and design of small cores. He is the somewhat proud maintainer of the [http://xiosim.org XIOSim] simulation suite. He received his BA in Computer Science from Harvard, working in the same research group.<br />
<br />
----<br />
'''Niall Murphy'''<br />
<br />
[[File:Niall_Murphy.jpeg|200px]]<br />
<br />
Niall Murphy is a PhD student at the University of Cambridge under the supervision of Dr Robert Mullins. His research investigates automatic parallelization of irregular programs using a combination of compile-time analysis and speculative execution. This involves mixing HELIX static parallelization with speculative execution based on software transactional memory. The aim is to increase the amount of parallelism that can be exploited by relaxing the need for the compiler to generate provably correct code, while relying on a runtime system to preserve correctness.<br />
<br />
----<br />
'''Glenn Holloway'''<br />
<br />
[[File:Glenn_Holloway.jpg|200px]]<br />
<br />
----<br />
'''Prof. Robert Mullins'''<br />
<br />
[[File:Robert_Mullins.jpeg|200px]]<br />
<br />
Robert Mullins is a Senior Lecturer in the Computer Laboratory at the University of Cambridge. His research and teaching focuses on computer architecture and VLSI design. He has a particular interest in on-chip interconnection networks, chip-multiprocessors and novel parallel processing fabrics.<br />
He is a Fellow of St. John's College, where he is Director of Studies for Computer Science. He was a co-founder of the Raspberry Pi Foundation, a UK charity that promotes the study of computer science and electronics at the school level.<br />
<br />
----<br />
'''Prof. Gu-Yeon Wei'''<br />
<br />
[[File:Wei_GuYeon.jpeg|200px]]<br />
<br />
[http://www.eecs.harvard.edu/~guyeon Prof. Gu-Yeon Wei]'s research group focuses on various aspects of high-speed, low-power digital and mixed-signal VLSI circuits. Significant advances in modern CMOS technology have enabled highly complex machines capable of executing extremely high levels of computation with performance doubling every few years. However, this performance comes at the cost of higher power dissipation. At the other end of the spectrum, portable electronic devices demand low-power, energy-efficient operation. Wei's research group investigates the interactions between VLSI circuits, computer architecture, and software layers to enhance energy efficiency in future computing systems. <br />
<br />
----<br />
'''Prof. David Brooks'''<br />
<br />
[[File:David_Brooks.jpeg|200px]]<br />
<br />
[http://www.eecs.harvard.edu/~dbrooks My] research focuses on the interaction between the architecture and software of computer systems and underlying hardware implementation challenges. These challenges include power, reliability, and variability issues across embedded and high-performance computing systems. A basic tenet of my research is that architecture design must be cognizant of these implementation issues, and that multi-layer solutions spanning circuits, architecture, and software can provide significant advantages. Addressing technology-scaling issues in a multi-layer fashion requires an understanding of the impact at the silicon level, and we have completed several prototype chip designs to meet these goals.<br />
<br />
== Sponsor ==<br />
<br />
[[File:Microsoft_logo.jpeg|400px]]</div>Simonehttps://helix.eecs.harvard.edu/index.php/CGO2012CGO20122014-04-21T13:23:20Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= HELIX: Automatic Parallelization of Irregular Programs for Chip Multiprocessing =<br />
<br />
Simone Campanoni, Timothy Jones, Glenn Holloway, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''Proc. Code Generation and Optimization (CGO), March, 2012''<br />
<br />
<br><br />
We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads. We show that the inter-thread communication costs forced by loop-carried data dependences can be mitigated by code optimization, by using an effective heuristic for selecting loops to parallelize, and by using helper threads to prefetch synchronization signals. We have implemented HELIX as part of an optimizing compiler framework that automatically selects and parallelizes loops from general sequential programs. The framework uses an analytical model of loop speedups, combined with profile data, to choose loops to parallelize. On a six-core Intel Core i7-980X, HELIX achieves speedups averaging 2.25, with a maximum of 4.12, for thirteen C benchmarks from SPEC CPU2000.<br />
<br />
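The execution model that HELIX builds automatically can be pictured with the hand-written, hypothetical OpenMP sketch below: iterations are handed out round-robin to threads, the bulk of each iteration runs in parallel, and the small sequential segment that carries the loop-carried dependence is forced to execute in iteration order. HELIX itself emits explicit signal/wait synchronization between cores rather than using OpenMP; that signalling cost is what the code optimizations, the loop-selection heuristic, and the prefetching helper threads described above work to hide.<br />
<pre>
/* Hypothetical sketch of the cyclic-multithreading execution model:
 * successive iterations go to successive threads; independent work overlaps,
 * while the sequential segment runs one iteration at a time, in order.
 * Build: gcc -O2 -fopenmp sketch.c -lm */
#include <stdio.h>
#include <math.h>

#define N 100000

static double out[N];

int main(void) {
    double checksum = 0.0;                   /* loop-carried: must be updated in order */

    #pragma omp parallel for ordered schedule(static, 1)
    for (long i = 0; i < N; i++) {
        double v = sin((double)i) * cos((double)i);   /* independent work, runs in parallel */
        out[i] = v;
        #pragma omp ordered
        {
            /* sequential segment: executes in iteration order */
            checksum = checksum * 0.999 + v;
        }
    }
    printf("checksum = %f\n", checksum);
    return 0;
}
</pre>
<br />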
[ [[media:CGO2012_HELIX.pdf|Paper]] ] [ [[media:CGO2012_Slides.pdf|Slides]] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/IEEEMICRO2012IEEEMICRO20122014-04-21T13:22:56Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= Making the Extraction of Thread-Level Parallelism Mainstream =<br />
<br />
Simone Campanoni, Timothy Jones, Glenn Holloway, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''IEEE Micro Special Issue on Parallelization of Sequential Code, 2012''<br />
<br />
<br><br />
Improving system performance increasingly depends on exploiting microprocessor parallelism, yet mainstream compilers still do not parallelize code automatically.<br />
Promising parallelization approaches have either required manual programmer assistance, depended on special hardware features, or risked slowing down programs they should have sped up.<br />
HELIX is one such approach that automatically parallelizes general-purpose programs without requiring any special hardware.<br />
In this paper we show that in practice HELIX always avoids slowing down compiled programs, making it a suitable candidate for mainstream compilers.<br />
We also show experimentally that HELIX outperforms the most similar historical technique that has been implemented in production compilers.<br />
<br />
[ [[media:IEEEMICRO2012_Paper.pdf|Paper]] ] [ [http://www.computer.org/csdl/mags/mi/2012/04/mmi2012040008-abs.html On-line version] ]</div>Simonehttps://helix.eecs.harvard.edu/index.php/DAC2012DAC20122014-04-21T13:22:42Z<p>Simone: </p>
<hr />
<div>__NOTITLE__<br />
= The HELIX Project: Overview and Directions =<br />
<br />
Simone Campanoni, Timothy Jones, Glenn Holloway, Gu-Yeon Wei, David Brooks<br />
<br />
<br><br />
''Proc. Design Automation Conference (DAC), June, 2012''<br />
<br />
<br><br />
Parallelism has become the primary way to maximize processor performance and power efficiency.<br />
But because creating parallel programs by hand is difficult and prone to error, there is an urgent need for automatic ways of transforming conventional programs to exploit modern multicore systems.<br />
The HELIX compiler transformation is one such technique that has proven effective at parallelizing individual sequential programs automatically for a real six-core processor.<br />
We describe that transformation in the context of the broader HELIX research project, which aims to optimize the throughput of a multicore processor by coordinated changes in its architecture, its compiler, and its operating system.<br />
The goal is to make automatic parallelization mainstream in multiprogramming settings through ''adaptive'' algorithms for extracting and tuning thread-level parallelism.<br />
<br />
[ [[media:DAC2012_Paper.pdf|Paper]] ] [ [[media:DAC2012_Slides.pdf|Slides]] ]</div>Simone