Linear Algebra Accelerators for Low-Power, High-Performance Multi-Core Computing
With semiconductor technology scaling reaching physical limits, overcoming power limitations is one of the major obstacles on the path to increased performance. It is well accepted that specialization and heterogeneity at the hardware level can be key to achieving orders-of-magnitude improvements in both power consumption and performance. However, full-custom hardware design is expensive in development effort, time and cost. The question is whether multi-core processors can be designed to approach the efficiency of custom hardware while retaining enough flexibility to run a broad class of applications.
WNCG Prof. Andreas Gerstlauer and his students, in collaboration with UT Austin computer science Prof. Robert A. Van de Geijn, are studying these questions for several domains, including linear algebra computations, which lie at the core of many high-performance computing as well as embedded signal processing and big data applications. By co-designing algorithms and architectures for a dedicated Linear Algebra Processor (LAP), the team has previously shown that a prototypical LAP in 45 nm technology is expected to sustain 600 double-precision GFLOPS in less than 25 W, with enough flexibility to support the full range of Basic Linear Algebra Subprograms (BLAS). This is orders of magnitude more energy efficient, measured in energy per operation, than existing CPUs or GPUs.
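To put the quoted figures in terms of the energy-per-operation metric, the following back-of-the-envelope sketch derives the implied energy cost of a single double-precision floating-point operation from the projected 600 GFLOPS and 25 W numbers above. The calculation is purely illustrative; the inputs are the article's projections, not measured silicon.

```python
# Sketch: energy per operation implied by the projected LAP figures above.
flops_per_second = 600e9   # 600 double-precision GFLOPS (projected, sustained)
power_watts = 25.0         # projected power envelope (upper bound)

energy_per_flop = power_watts / flops_per_second  # joules per FLOP
print(f"~{energy_per_flop * 1e12:.1f} pJ per double-precision FLOP")
# Prints roughly 41.7 pJ/FLOP, which, per the article, is orders of magnitude
# below the energy per operation of existing CPUs or GPUs.
```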
In recent work, the UT Austin research team has shown that, with minimal modifications to the LAP base architecture, similar efficiencies are achievable across a wider range of applications, including complete matrix factorizations as well as Fast Fourier Transforms (FFTs). Ongoing work investigates the system integration of one or more LAPs into larger, heterogeneous multi-core host architectures, including associated programming models as well as optimized mapping and compilation of parallelized applications onto such platforms. Furthermore, the researchers are investigating LAP implementation and prototyping on FPGAs and ASICs.
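One reason complete factorizations map well onto a BLAS-oriented accelerator is that blocked factorization algorithms spend nearly all of their floating-point work in BLAS-3 block updates. The NumPy sketch below of a blocked Cholesky factorization is an illustrative example of this structure, not the team's implementation: the trailing-matrix update in the inner loop is a GEMM/SYRK-like operation of exactly the kind such an accelerator is built to execute.

```python
import numpy as np

def blocked_cholesky(A, nb=4):
    """Right-looking blocked Cholesky (lower triangular), for illustration.

    Most floating-point work lands in the TRSM-like panel solve and the
    GEMM/SYRK-like trailing-matrix update, i.e., BLAS-3 kernels.
    """
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        b = min(nb, n - k)
        # Factor the small diagonal block (unblocked Cholesky).
        A[k:k+b, k:k+b] = np.linalg.cholesky(A[k:k+b, k:k+b])
        L11 = A[k:k+b, k:k+b]
        if k + b < n:
            # Panel solve: L21 = A21 * inv(L11)^T  (TRSM-like).
            A[k+b:, k:k+b] = np.linalg.solve(L11, A[k+b:, k:k+b].T).T
            L21 = A[k+b:, k:k+b]
            # Trailing update: A22 -= L21 * L21^T  (SYRK/GEMM-like);
            # this is where the bulk of the FLOPs are spent.
            A[k+b:, k+b:] -= L21 @ L21.T
    return np.tril(A)

# Usage: factor a random symmetric positive-definite matrix and verify.
rng = np.random.default_rng(0)
M = rng.standard_normal((12, 12))
A = M @ M.T + 12 * np.eye(12)
L = blocked_cholesky(A, nb=4)
assert np.allclose(L @ L.T, A)
```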
This research is funded by the National Science Foundation.
Paper 1: A Highly Efficient Multi-Core Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores
Paper 2: Algorithm, Architecture and Floating-Point Unit Codesign of a Matrix Factorization Accelerator
Paper 3: Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures