Optimizing Tensor Decomposition on HPC Systems

Abstract

We discuss the challenges and approaches to optimizing various tensor decomposition algorithms on HPC systesm. We show that data locality is the dominant bottleneck - both in shared- and distributed-memory settings - and optimizing this leads to significant speedup over state-of-the-art.

Date
Event
Colloquium
Location
Seoul, Korea