Scalable Energy-Efficient Parallel Sorting on a Fine-Grained Many-Core Processor Array
Aaron Stillmaker *†
Brent Bohnenstiehl *
Lucas Stillmaker *
Bevan M. Baas *
†Department of Electrical and Computer Engineering
California State University, Fresno
*VLSI Computation Laboratory
Department of Electrical and Computer Engineering
University of California, Davis
Three parallel sorting applications and two list output protocols for the first phase of an external sort execute on a fine-grained many-core processor array that acts as a co-processor and contains no algorithm-specific hardware. Results are generated using a cycle-accurate model based on measured data from a fabricated many-core chip, and simulated for a variety of processor array sizes. The data show that the most energy-efficient first-phase many-core sort requires over 65× lower energy than the GNU C++ standard library sort (std::sort) performed on an Intel laptop-class processor and over 105× lower energy than a radix sort running on an Nvidia GPU. In addition, the highest-throughput first-phase many-core sort is over 9.8× faster than std::sort and over 14× faster than the radix sort. Both phases of a 10 GB external sort require a 6.2× lower energy×time (energy-delay) product than std::sort and an over 13× lower energy×time product than the radix sort.
PDF (1.0 MB), (c) Copyright 2020, Elsevier.
Aaron Stillmaker, Brent Bohnenstiehl, Lucas Stillmaker, and Bevan Baas, "Scalable energy-efficient parallel sorting on a fine-grained many-core processor array," Journal of Parallel and Distributed Computing, vol. 138, pp. 32-47, April 2020.
title = "Scalable energy-efficient parallel sorting on a fine-grained many-core processor array",
author = "Aaron Stillmaker and Brent Bohnenstiehl and Lucas Stillmaker and Bevan Baas",
journal = "Journal of Parallel and Distributed Computing",
volume = "138",
pages = "32 - 47",
year = "2020",
doi = "https://doi.org/10.1016/j.jpdc.2019.12.011"