Optimizing Tensor Decomposition on HPC Systems - Challenges and Approaches

Abstract

We discuss the challenges and approaches to optimizing various tensor decomposition algorithms on HPC systesm. We show that data locality is the dominant bottleneck - both in shared- and distributed-memory settings - and optimizing this leads to significant speedup over state-of-the-art.

Date
Event
Colloquium at Intel
Location
Santa Clara, CA