PRISM2015


Unified Cache: A Case for Low-Latency Communication

Khalid Al-Hawaj, Simone Campanoni, Gu-Yeon Wei, David Brooks


International Workshop on Parallelism in Mobile Platforms (PRISM), June, 2015


Increasing computational demand on mobile devices calls for energy-efficient ways to accelerate individual programs. In the multicore era, thread-level parallelism (TLP) can accelerate single-threaded programs without requiring power-hungry cores. HELIX-RC, a recently proposed co-design of the HELIX parallelizing compiler and its target architecture, shows that substantial TLP can be extracted from loops with small bodies by optimizing core-to-core communication. The effectiveness of the HELIX-RC approach has previously been demonstrated through simulation. In this paper, we evaluate a HELIX-RC-like solution on a real platform. We have developed a simplified version of the HELIX-RC architecture, which we call the unified cache, and implemented it on an FPGA board. Our design augments a multicore platform with a simplified ring cache, the architectural component of the HELIX-RC co-design. Using microbenchmarks, our FPGA prototype confirms the HELIX-RC findings. After describing both the ring cache and the parallel code generated by the HELIX compiler, we sketch the design of the unified cache and evaluate its implementation on a Xilinx VC707 FPGA board.
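To illustrate why core-to-core communication latency matters for loops with small bodies, here is a minimal software sketch of the HELIX-style execution pattern, assuming the commonly described scheme in which loop iterations are assigned to cores round-robin and each iteration's sequential segment must run in iteration order, gated by a wait/signal handed from one core to the next. The function name `helix_style_loop`, the per-core semaphores, and the example loop body are hypothetical illustrations, not the HELIX compiler's output or the unified cache's interface; Python semaphores merely stand in for the signals that a ring cache would forward in hardware.

```python
import threading


def helix_style_loop(values, num_cores=4):
    """Hypothetical sketch of a HELIX-style parallel loop (not the real compiler output).

    Iteration i runs on "core" (thread) i % num_cores. Each iteration has a
    parallel part (squaring its input, no shared state) and a sequential
    segment (updating a running sum) that must execute in iteration order.
    """
    # One semaphore per core: a single token circulates around the ring of
    # cores, so sequential segments execute strictly in iteration order.
    tokens = [threading.Semaphore(0) for _ in range(num_cores)]
    tokens[0].release()  # core 0 may enter the first sequential segment

    running = [0]               # shared state, touched only in sequential segments
    prefix = [0] * len(values)  # running sum after each iteration
    order = []                  # order in which sequential segments executed

    def core(c):
        for i in range(c, len(values), num_cores):
            x = values[i] * values[i]   # parallel part: independent per iteration
            tokens[c].acquire()         # wait: previous iteration's segment is done
            running[0] += x             # sequential segment (loop-carried dependence)
            prefix[i] = running[0]
            order.append(i)
            tokens[(c + 1) % num_cores].release()  # signal the next core in the ring

    threads = [threading.Thread(target=core, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return prefix, order


if __name__ == "__main__":
    print(helix_style_loop([1, 2, 3, 4, 5], num_cores=3))
```

With a small loop body, every iteration pays one wait/signal hand-off, so the latency of that hand-off quickly dominates; this is the cost a ring cache aims to shrink. Because of the Python global interpreter lock, this sketch shows the ordering pattern only and yields no real speedup.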

[ Paper ]
