Jee Whan Choi
Richard Vuduc
Model-Driven Sparse CP Decomposition for High-Order Tensors
Analyzing the Energy Efficiency of the Fast Multipole Method Using a DVFS-Aware Energy Model
A CPU:GPU hybrid implementation and model-driven scheduling of the fast multipole method
Algorithmic time, energy, and power on candidate HPC compute building blocks
How much (execution) time and energy does my algorithm cost?
A brief history and introduction to GPGPU
A roofline model of energy
Performance analysis and tuning for general purpose graphics processing units (GPGPU)
Towards a communication optimal fast multipole method and its implications for exascale
Courses in high-performance computing for scientists and engineers
Modeling and Analysis for Performance and Power
Model-driven autotuning of sparse matrix-vector multiply on GPUs
On the limits of GPU acceleration
Sparse matrix vector multiplication on multicore and accelerator systems