Breaking Cyclic-Multithreading Parallelization with XML Parsing
Simone Campanoni, Svilen Kanev, Kevin Brownell, Gu-Yeon Wei, David Brooks
Proc. International Workshop on Parallelism in Mobile Platforms (PRISM), June, 2014
HELIX-RC, a modern re-evaluation of the cyclic-multithreading (CMT) compiler technique, extracts threads from sequential code automatically. As a CMT approach, HELIX-RC gains performance by running iterations of the same loop on different cores in a multicore. It successfully boosts performance for several SPEC CINT benchmarks previously considered unparallelizable. However, this paper shows there are workloads with different characteristics, which even idealized CMT cannot parallelize. We identify how to overcome an inherent limitation of CMT for these workloads. CMT techniques only run iterations of a single loop in parallel at any given time. We propose exploiting parallelism not only within a single loop, but also among multiple loops. We call this execution model Multiple CMT (MCMT), and show that it is crucial for auto-parallelizing a broader class of workloads. To highlight the need for MCMT, we target a workload that is naturally hard for CMT -- parsing XML-structured data. We show that even idealized CMT fails on XML parsing. Instead, MCMT extracts speedups up to 3.9x on 4 cores.