Publications

(2023). ×Grid: A Location-oriented Topology Design for LEO Satellites. LEO-NET’23: Proceedings of the 1st ACM Workshop on LEO Networking and Communication.

(2023). Power Constrained Autotuning using Graph Neural Networks. 37th IEEE International Parallel and Distributed Processing Symposium (IPDPS’23).

(2021). High Performance Streaming Tensor Decomposition. 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS’21).

(2018). High-performance Dense Tucker Decomposition on GPU Clusters. The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’18).

(2018). Blocking Optimization Techniques for Sparse Tensor Computation. 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS’18).

(2017). Model-Driven Sparse CP Decomposition for High-Order Tensors. 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS’17).

(2016). Analyzing the Energy Efficiency of the Fast Multipole Method Using a DVFS-Aware Energy Model. 30th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2014). Algorithmic time, energy, and power on candidate HPC compute building blocks. 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS’14).

(2013). A roofline model of energy. 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS’13).

(2013). A brief history and introduction to GPGPU. Modern Accelerator Technologies for Geographic Information Science.

(2012). Modeling and Analysis for Performance and Power. IEEE 26th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW).

(2012). Courses in high-performance computing for scientists and engineers. IEEE 26th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW).

(2010). On the limits of GPU acceleration. Proceedings of the 2nd USENIX conference on Hot topics in parallelism.

(2010). Model-driven autotuning of sparse matrix-vector multiply on GPUs. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10).

(2008). Bypassing BigBackground: An efficient hybrid background modeling algorithm for embedded video surveillance. Second ACM/IEEE International Conference on Distributed Smart Cameras, 2008 (ICDSC 2008).
