High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach

Publication
2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)