HPC

Learning Efficient Sparse Encoding for High-Performance Tensor Computations

We present ReLATE, a deep reinforcement learning framework for automatically constructing an efficient sparse tensor format.

Matrix-free Methods for Summation-by-Parts Finite Difference Operators on GPUs

We present a custom geometric multigrid-preconditioned conjugate gradient method that uses summation-by-parts (SBP)-preserving interpolations and a custom matrix-free GPU kernel, achieving up to a 5× speedup over solvers from PETSc and AmgX.

Linearized Tensor Format for Performance-Portable Sparse Tensor Computation

We demonstrate the efficiency and performance portability of encoding sparse tensors in a linearized format.

Tensor Decomposition for Topic Modeling in AI

We discuss using high-performance tensor decomposition for topic modeling and malware detection.

Optimizing Tensor Decomposition on HPC Systems - Challenges and Approaches

We discuss our experience in optimizing the CP and Tucker decomposition algorithms for sparse datasets on a distributed system.

Tensor Decomposition for Malware Detection

We share our experience in using tensor decomposition to detect malware.

On Optimizing Distributed Non-negative Tucker Decomposition

We discuss our experience in optimizing the non-negative Tucker decomposition for sparse datasets on a distributed system.