Blocking Optimization for Sparse MTTKRP

Jee Whan Choi

Abstract

The matricized tensor times Khatri-Rao product (MTTKRP) kernel is the key performance bottleneck in sparse CP-ALS, where it can account for up to 95% of the total execution time. We first use a simple roofline-based performance model to show that the kernel is severely memory-bound, even on systems with large caches, and then apply several blocking techniques to achieve significant speedups. In particular, our rank blocking technique, combined with traditional multi-dimensional blocking, achieves high speedups in both shared-memory and distributed-memory settings on real-world datasets.
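For readers unfamiliar with the kernel, here is a minimal Python sketch of mode-0 MTTKRP on a 3-way sparse tensor stored in coordinate (COO) format. It uses one simple, assumed form of rank blocking: the R columns of the factor matrices are processed in blocks of `rank_block` columns, and the nonzeros are streamed once per block. The function name, the COO layout, and this particular blocking scheme are illustrative assumptions, not the implementation presented in the talk.

```python
from typing import List, Tuple

# Hypothetical layout: one nonzero of a 3-way tensor as (i, j, k, value).
Nonzero = Tuple[int, int, int, float]
Matrix = List[List[float]]


def mttkrp_mode0_rank_blocked(
    nnz: List[Nonzero],
    num_rows: int,
    B: Matrix,
    C: Matrix,
    rank_block: int,
) -> Matrix:
    """Mode-0 MTTKRP: M[i][r] = sum over nonzeros (i, j, k, v) of v * B[j][r] * C[k][r].

    Illustrative rank blocking (an assumption, not the talk's exact scheme):
    columns r are split into blocks of `rank_block`, and all nonzeros are
    streamed once per block, so each pass touches only a narrow slice of
    every factor row.
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for r0 in range(0, rank, rank_block):
        r1 = min(r0 + rank_block, rank)
        for i, j, k, v in nnz:
            b_row, c_row, m_row = B[j], C[k], M[i]
            # Accumulate this nonzero's contribution for the current column block only.
            for r in range(r0, r1):
                m_row[r] += v * b_row[r] * c_row[r]
    return M


# Small example: I = 2, J = 2, K = 2, R = 3.
nnz = [(0, 0, 1, 2.0), (1, 1, 0, 3.0), (0, 1, 1, 1.0)]
B = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
C = [[1.0, 0.0, 2.0], [1.0, 1.0, 1.0]]
print(mttkrp_mode0_rank_blocked(nnz, 2, B, C, rank_block=2))
# → [[6.0, 9.0, 12.0], [12.0, 0.0, 36.0]]
```

The block size changes only the order of memory accesses, not the result. Narrower blocks shrink the working set of factor-matrix data per pass at the cost of re-reading the nonzeros once per block, which is the trade-off a memory-bound kernel has to balance.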

Date

Jan 26, 2019

9:30 AM — 10:20 AM

Event

Invited Workshop on Compiler Techniques for Sparse Tensor Algebra

Location

Cambridge, MA