GPU Computing Gems Emerald Edition

GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book offers the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, video and image processing. This book is intended to help those who are facing the challenge of programming systems to effectively use GPUs to achieve efficiency and performance goals. It offers developers a window into diverse application areas, and the opportunity to gain insights from others' algorithm work that they may apply to their own projects. Readers will learn from the leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely-adopted massively parallel programming solution. The insights and ideas as well as practical hands-on skills in the book can be immediately put to use. Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. 
For useful source code discussed throughout the book, the editors invite readers to the following website: …

- Covers the breadth of industry from scientific simulation and electronic design automation to audio/video processing, medical imaging, computer vision, and more
- Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely-adopted massively parallel programming solution
- Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use

Product Details

ISBN-13:	9780123849892
Publisher:	Morgan Kaufmann Publishers
Publication date:	01/13/2011
Series:	Applications of GPU Computing Series
Sold by:	Barnes & Noble
Format:	eBook
Pages:	886
File size:	21 MB
Note:	This product may take a few minutes to download.

About the Author

Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are in the area of architecture, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of the Parallel Computing Institute and director of the IMPACT research group (www.impact.crhc.illinois.edu). He is a co-founder and CTO of MulticoreWare. For his contributions in research and teaching, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.

Read an Excerpt

GPU Computing Gems Emerald Edition

By Wen-mei W. Hwu

Morgan Kaufmann

Copyright © 2011 NVIDIA Corporation and Wen-mei W. Hwu
All rights reserved.
ISBN: 978-0-12-384989-2

Chapter One

GPU-Accelerated Computation and Interactive Display of Molecular Orbitals

John E. Stone, David J. Hardy, Jan Saam, Kirby L. Vandivort, Klaus Schulten

In this chapter, we present several graphics processing unit (GPU) algorithms for evaluating molecular orbitals on three-dimensional lattices, as are commonly used in molecular visualization. For each kernel, we describe necessary design trade-offs, applicability to various problem sizes, and performance on different generations of GPU hardware. We then demonstrate the appropriate and effective use of fast on-chip GPU memory subsystems for access to key data structures, show several GPU kernel optimization principles, and explore the application of advanced techniques such as dynamic kernel generation and just-in-time (JIT) kernel compilation.

1.1 INTRODUCTION, PROBLEM STATEMENT, AND CONTEXT

The GPU kernels described here form the basis for the high-performance molecular orbital display algorithms in VMD, a popular molecular visualization and analysis tool. VMD (Visual Molecular Dynamics) is a software system designed for displaying, animating, and analyzing large biomolecular systems. More than 33,000 users have registered and downloaded the most recent VMD software, version 1.8.7. Due to its versatility and user-extensibility, VMD is also capable of displaying other large datasets, such as sequence data, results of quantum chemistry calculations, and volumetric data. While VMD is designed to run on a diverse range of hardware — laptops, desktops, clusters, and supercomputers — it is primarily used as a scientific workstation application for interactive 3-D visualization and analysis. For computations that run too long for interactive use, VMD can also be used in a batch mode to render movies for later use. A motivation for using GPU acceleration in VMD is to make slow batch-mode jobs fast enough for interactive use, thereby drastically improving the productivity of scientific investigations. With CUDA-enabled GPUs widely available in desktop PCs, such acceleration can have a broad impact on the VMD user community. To date, multiple aspects of VMD have been accelerated with the NVIDIA Compute Unified Device Architecture (CUDA), including electrostatic potential calculation, ion placement, molecular orbital calculation and display, and imaging of gas migration pathways in proteins.

Visualization of molecular orbitals (MOs) is a helpful step in analyzing the results of quantum chemistry calculations. The key challenge involved in the display of molecular orbitals is the rapid evaluation of these functions on a three-dimensional lattice; the resulting data can then be used for plotting isocontours or isosurfaces for visualization as shown in Fig. 1.1, and for other types of analyses. Most existing software packages that render MOs perform calculations on the CPU and have not been heavily optimized. Thus, they require runtimes of tens to hundreds of seconds depending on the complexity of the molecular system and spatial resolution of the MO discretization and subsequent surface plots.

With sufficient performance (two orders of magnitude faster than traditional CPU algorithms), a fast real-space lattice computation enables interactive display of even very large electronic structures and makes it possible to smoothly animate trajectories of orbital dynamics. Prior to the use of the GPU, this could be accomplished only through extensive batch-mode precalculation and preloading of time-varying lattice data into memory, making it impractical for everyday interactive visualization tasks. Efficient single-GPU algorithms are capable of evaluating molecular orbital lattices up to 186 times faster than a single CPU core (see Table 1.1), enabling MOs to be rapidly computed and animated on the fly for the first time. A multi-GPU version of our algorithm has been benchmarked at up to 419 times the performance of a single CPU core (see Table 1.2).

1.2 CORE METHOD

Since our target application is visualization focused, we are concerned with achieving interactive rendering performance while maintaining sufficient accuracy. The CUDA programming language enables GPU hardware features — inaccessible in existing programmable shading languages — to be exploited for higher performance, and it enables the use of multiple GPUs to accelerate computation further. Another advantage of using CUDA is that the results can be used for nonvisualization purposes.

Our approach combines several performance enhancement strategies. First, we use the host CPU to carefully organize input data and coefficients, eliminating redundancies and enforcing a sorted ordering that benefits subsequent GPU memory traversal patterns. The evaluation of molecular orbitals on a 3-D lattice is performed on one or more GPUs; the 3-D lattice is decomposed into 2-D planar slices, each of which is assigned to a GPU and computed. The workload is dynamically scheduled across the pool of GPUs to balance load on GPUs of varying capability. Depending on the specific attributes of the problem, one of three hand-coded GPU kernels is algorithmically selected to optimize performance. The three kernels are designed to use different combinations of GPU memory systems to yield peak memory bandwidth and arithmetic throughput depending on whether the input data can fit into constant memory, shared memory, or L1/L2 cache (in the case of recently released NVIDIA "Fermi" GPUs). One useful optimization involves the use of zero-copy memory access techniques based on the CUDA mapped host memory feature to eliminate latency associated with calls to cudaMemcpy(). Another optimization involves dynamically generating a problem-specific GPU kernel "on the fly" using just-in-time (JIT) compilation techniques, thereby eliminating various sources of overhead that exist in the three general precoded kernels.
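
The dynamic scheduling of 2-D slices across a pool of devices can be illustrated with a small host-side sketch. This is not the VMD implementation: each Python thread merely stands in for one GPU, and `compute_slice`, `schedule_slices`, and the slice dimensions are hypothetical names and values chosen for illustration. The point it shows is that workers pull slice indices from a shared queue, so a faster device naturally ends up processing more slices.

```python
import queue
import threading


def compute_slice(z_index, width, height):
    """Stand-in for a GPU kernel launch: fill one 2-D slice with a dummy value."""
    return [[float(z_index)] * width for _ in range(height)]


def schedule_slices(num_slices, num_workers, width=4, height=4):
    """Dynamically distribute 2-D lattice slices over a pool of workers.

    Each worker thread models one GPU (an assumption of this sketch).
    Workers take the next slice index from a shared queue until it is empty.
    """
    work = queue.Queue()
    for z in range(num_slices):
        work.put(z)

    lattice = [None] * num_slices                    # one result slot per slice
    processed_by = [[] for _ in range(num_workers)]  # bookkeeping per worker

    def worker(worker_id):
        while True:
            try:
                z = work.get_nowait()
            except queue.Empty:
                return                               # no work left
            lattice[z] = compute_slice(z, width, height)
            processed_by[worker_id].append(z)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lattice, processed_by


lattice, processed_by = schedule_slices(num_slices=16, num_workers=3)
```

Which worker handles which slice varies from run to run; only the guarantee that every slice is computed exactly once is fixed.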

1.3 ALGORITHMS, IMPLEMENTATIONS, AND EVALUATIONS

A molecular orbital (MO) represents a statistical state in which an electron can be found in a molecule, where the MO's spatial distribution is correlated with the associated electron's probability density. Visualization of MOs is an important task for understanding the chemistry of molecular systems. MOs appeal to the chemist's intuition, and inspection of the MOs aids in explaining chemical reactivities. Some popular software tools with these capabilities include MacMolPlt, Molden, Molekel, and VMD.

The calculations required for visualizing MOs are computationally demanding, and existing quantum chemistry visualization programs are fast enough to interactively compute MOs only for small molecules on a relatively coarse lattice. At the time of this writing, only VMD and MacMolPlt support multicore CPUs, and only VMD uses GPUs to accelerate MO computations. A great opportunity exists to improve upon the capabilities of existing tools in terms of interactivity, visual display quality, and scalability to larger and more complex molecular systems.

1.3.1 Mathematical Background

In this section we provide a short introduction to MOs, basis sets, and their underlying equations. Interested readers are directed to seek further details from computational chemistry texts and review articles. Quantum chemistry packages solve the electronic Schrödinger equation HΨ = EΨ for a given system. Molecular orbitals are the solutions produced by these packages. MOs are the eigenfunctions Ψ_v for expression of the molecular wavefunction Ψ, with H the Hamiltonian operator and E the system energy. The wavefunction determines molecular properties; for instance, the one-electron density is ρ(r) = |Ψ(r)|². The visualization of the molecular orbitals resulting from quantum chemistry calculations requires evaluating the wavefunction on a 3-D lattice so that isovalue surfaces can be computed and displayed. With minor modifications, the algorithms and approaches we present for evaluating the wavefunction can be adapted to compute other molecular properties such as charge density, the molecular electrostatic potential, or multipole moments.

Each MO Ψ_v can be expressed as a linear combination over a set of K basis functions Φ_k,

\[ \Psi_v = \sum_{k=1}^{K} c_{vk}\, \Phi_k \tag{1.1} \]

where c_vk are coefficients contained in the quantum chemistry calculation output files, and used as input for our algorithms. The basis functions used by the vast majority of quantum chemical calculations are atom-centered functions that approximate the solution of the Schrödinger equation for a single hydrogen atom with one electron, so-called atomic orbitals. For increased computational efficiency, Gaussian type orbitals (GTOs) are used to model the basis functions, rather than the exact solutions for the hydrogen atom:

\[ \Phi^{\mathrm{GTO}}_{ijk}(\mathbf{R}, \zeta) = N_{\zeta ijk}\, x^{i} y^{j} z^{k}\, e^{-\zeta R^{2}} \tag{1.2} \]

The exponential factor ζ is defined by the basis set; i, j, and k are used to modulate the functional shape; and N_ζijk is a normalization factor that follows from the basis set definition. The distance from a basis function's center (nucleus) to a point in space is represented by the vector R = {x, y, z} of length R = |R|.

The exponential term in Eq. 1.2 determines the radial decay of the function. Composite basis functions known as contracted GTOs (CGTOs) are composed of a linear combination of P individual GTO primitives in order to accurately describe the radial behavior of atomic orbitals.

\[ \Phi^{\mathrm{CGTO}}_{ijk}(\mathbf{R}, \{c_p\}, \{\zeta_p\}) = \sum_{p=1}^{P} c_p\, \Phi^{\mathrm{GTO}}_{ijk}(\mathbf{R}, \zeta_p) \tag{1.3} \]

The set of contraction coefficients {c_p} and associated exponents {ζ_p} defining the CGTO are contained in the quantum chemistry simulation output.
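
As a rough illustration of how a primitive GTO and its contraction are evaluated at a point, the sketch below assumes the functional form described above (a normalization factor times x^i y^j z^k times a Gaussian in R). The function names `gto` and `cgto` and all numeric values are hypothetical; the normalization factor is passed in as a plain argument and defaults to 1.

```python
import math


def gto(x, y, z, zeta, i, j, k, norm=1.0):
    """Primitive GTO: norm * x^i * y^j * z^k * exp(-zeta * R^2).

    (x, y, z) are the components of R, the vector from the basis
    function's center to the evaluation point.
    """
    r2 = x * x + y * y + z * z
    return norm * (x ** i) * (y ** j) * (z ** k) * math.exp(-zeta * r2)


def cgto(x, y, z, coeffs, zetas, i, j, k, norms=None):
    """Contracted GTO: sum over primitives of c_p * GTO(R, zeta_p)."""
    if norms is None:
        norms = [1.0] * len(coeffs)
    return sum(
        c_p * gto(x, y, z, zeta_p, i, j, k, n_p)
        for c_p, zeta_p, n_p in zip(coeffs, zetas, norms)
    )


# Illustrative (made-up) contraction of three s-type primitives.
coeffs = [0.15, 0.53, 0.44]
zetas = [3.4, 0.62, 0.17]
value = cgto(0.5, 0.0, 0.0, coeffs, zetas, 0, 0, 0)
```

Each primitive costs one exponential, which is why the number of primitives per contraction matters for the overall cost of filling a lattice.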

CGTOs are classified into different shells based on the sum l = i + j + k of the exponents of the x, y, and z factors. The shells are designated by letters s, p, d, f, and g for l = 0, 1, 2, 3, 4, respectively, where we explicitly list here the most common shell types but note that higher-numbered shells are occasionally used. The set of indices for a shell is also referred to as the angular momenta of that shell. We establish an alternative indexing of the angular momenta based on the shell number l and a systematic indexing m over the possible number of sums l = i + j + k, where M_l = (l + 1)(l + 2)/2 counts the number of combinations and m = 0, ..., M_l - 1 references the set {(i, j, k): i + j + k = l}.
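
The alternative (l, m) indexing can be made concrete by enumerating the index triples of a shell. The sketch below is illustrative only: the function names are hypothetical, and the particular ordering of the triples is an assumption, since the text does not fix one. The count per shell, (l + 1)(l + 2)/2, follows from counting the non-negative integer solutions of i + j + k = l.

```python
def angular_momenta(l):
    """Return every (i, j, k) with i + j + k == l in a fixed, assumed order.

    The position of a triple in the returned list plays the role of
    the index m = 0, ..., M_l - 1.
    """
    return [
        (i, j, l - i - j)
        for i in range(l, -1, -1)
        for j in range(l - i, -1, -1)
    ]


def shell_size(l):
    """Number of angular momenta M_l in shell l."""
    return (l + 1) * (l + 2) // 2


# Shell letters s, p, d, f, g correspond to l = 0, 1, 2, 3, 4.
sizes = {name: shell_size(l) for l, name in enumerate("spdfg")}
p_shell = angular_momenta(1)
```

With this ordering, the p shell yields the x, y, and z factors in that order.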

The linear combination defining the MO Ψ_v must also sum contributions from each of the N atoms of the molecule and the L_n shells of each atom n. The entire expression, now described in terms of the data output from a QM package, for an MO wavefunction evaluated at a point r in space then becomes

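\[ \Psi_v(\mathbf{r}) = \sum_{k=1}^{K} c_{vk}\, \Phi_k = \sum_{n=1}^{N} \sum_{l=0}^{L_n - 1} \sum_{m=0}^{M_l - 1} c_{vnlm}\, \Phi^{\mathrm{CGTO}}_{nlm}(\mathbf{R}_n, \{c\}, \{\zeta\}) \tag{1.4} \]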

where we have replaced c_vk by c_vnlm, with the vectors R_n = r - r_n connecting the position r_n of the nucleus of atom n to the desired spatial coordinate r. We have dropped the subscript p from the set of contraction coefficients {c} and exponents {ζ} with the understanding that each CGTO requires an additional summation over the primitives, as expressed in Eq. 1.3.

The normalization factor N_ζijk in Eq. 1.2 can be factored into a first part η_ζl that depends on both the exponent ζ and shell type l = i + j + k and a second part η_ijk (=η_lm in terms of our alternative indexing) that depends only on the angular momentum,

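\[ N_{\zeta ijk} = \eta_{\zeta l} \cdot \eta_{ijk} = \left( \frac{2\zeta}{\pi} \right)^{3/4} \sqrt{(8\zeta)^{l}} \cdot \sqrt{\frac{i!\, j!\, k!}{(2i)!\, (2j)!\, (2k)!}} \tag{1.5} \]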
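
The two-part normalization can be checked numerically. The sketch below assumes the standard normalization of a Cartesian Gaussian, with the radial part (2ζ/π)^(3/4) · sqrt((8ζ)^l) and the angular part sqrt(i! j! k! / ((2i)! (2j)! (2k)!)). The function names are hypothetical. `self_overlap` uses the closed-form Gaussian moment integral to confirm that the squared, normalized GTO integrates to 1 over all space.

```python
import math


def eta_radial(zeta, l):
    """Part of the normalization that depends on the exponent and shell type."""
    return (2.0 * zeta / math.pi) ** 0.75 * math.sqrt((8.0 * zeta) ** l)


def eta_angular(i, j, k):
    """Part of the normalization that depends only on the angular momentum."""
    f = math.factorial
    return math.sqrt(f(i) * f(j) * f(k) / (f(2 * i) * f(2 * j) * f(2 * k)))


def gto_norm(zeta, i, j, k):
    """Full normalization factor as the product of the two parts."""
    return eta_radial(zeta, i + j + k) * eta_angular(i, j, k)


def self_overlap(zeta, i, j, k):
    """Analytic integral of the squared normalized GTO over all space."""
    def axis_integral(n):
        # Integral of x^(2n) * exp(-2 * zeta * x^2) over the real line.
        return math.gamma(n + 0.5) / (2.0 * zeta) ** (n + 0.5)

    return (
        gto_norm(zeta, i, j, k) ** 2
        * axis_integral(i) * axis_integral(j) * axis_integral(k)
    )


check = self_overlap(0.7, 2, 1, 0)  # should be 1 up to rounding error
```

Because the angular part does not depend on ζ, it can be computed once per angular momentum and reused across all primitives of a shell.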

The separation of the normalization factor in Eq. 1.5 allows us to factor the summation over the primitives from the summation over the array of wavefunction coefficients. Combining Eqs. 1.2–1.4 and rearranging terms gives

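\[ \Psi_v(\mathbf{r}) = \sum_{n=1}^{N} \sum_{l=0}^{L_n - 1} \left( \sum_{m=0}^{M_l - 1} c_{vnlm}\, \eta_{lm}\, x^{i} y^{j} z^{k} \right) \left( \sum_{p=1}^{P} c_p\, \eta_{\zeta_p l}\, e^{-\zeta_p R_n^{2}} \right) \tag{1.6} \]

Here (i, j, k) is the index triple referenced by m within shell l, and x, y, z are the components of R_n.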
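
The benefit of separating the sum over primitives from the sum over wavefunction coefficients can be sketched on the CPU. The code below is an illustration under stated assumptions, not the chapter's kernel: it assumes the standard Cartesian Gaussian normalization, a hypothetical dictionary layout for atoms and shells, and made-up coefficients. `mo_naive` evaluates the nested sum over atoms, shells, angular momenta, and primitives directly; `mo_factored` computes the primitive (radial) sum once per shell and multiplies it by the angular sum.

```python
import math


def angular_momenta(l):
    """All (i, j, k) with i + j + k == l, in an assumed fixed order."""
    return [(i, j, l - i - j) for i in range(l, -1, -1) for j in range(l - i, -1, -1)]


def eta_radial(zeta, l):
    return (2.0 * zeta / math.pi) ** 0.75 * math.sqrt((8.0 * zeta) ** l)


def eta_angular(i, j, k):
    f = math.factorial
    return math.sqrt(f(i) * f(j) * f(k) / (f(2 * i) * f(2 * j) * f(2 * k)))


def displacement(r, center):
    """Components of R_n = r - r_n and its squared length."""
    x, y, z = (r[a] - center[a] for a in range(3))
    return x, y, z, x * x + y * y + z * z


def mo_naive(r, atoms):
    """Direct nested sum: one exponential per (m, p) pair."""
    total = 0.0
    for atom in atoms:
        x, y, z, r2 = displacement(r, atom["center"])
        for shell in atom["shells"]:
            l = shell["l"]
            for m, (i, j, k) in enumerate(angular_momenta(l)):
                for c_p, zeta_p in shell["primitives"]:
                    norm = eta_radial(zeta_p, l) * eta_angular(i, j, k)
                    total += (shell["coeffs"][m] * c_p * norm
                              * x ** i * y ** j * z ** k * math.exp(-zeta_p * r2))
    return total


def mo_factored(r, atoms):
    """Factored sum: one exponential per primitive, reused for every m."""
    total = 0.0
    for atom in atoms:
        x, y, z, r2 = displacement(r, atom["center"])
        for shell in atom["shells"]:
            l = shell["l"]
            radial = sum(c_p * eta_radial(zeta_p, l) * math.exp(-zeta_p * r2)
                         for c_p, zeta_p in shell["primitives"])
            angular = sum(shell["coeffs"][m] * eta_angular(i, j, k)
                          * x ** i * y ** j * z ** k
                          for m, (i, j, k) in enumerate(angular_momenta(l)))
            total += angular * radial
    return total


# Hypothetical two-atom input; primitives are (c_p, zeta_p) pairs.
atoms = [
    {"center": (0.0, 0.0, 0.0), "shells": [
        {"l": 0, "primitives": [(0.4, 1.2), (0.6, 0.3)], "coeffs": [0.8]},
        {"l": 1, "primitives": [(1.0, 0.5)], "coeffs": [0.1, -0.2, 0.3]},
    ]},
    {"center": (1.1, 0.0, 0.0), "shells": [
        {"l": 2, "primitives": [(0.7, 0.9), (0.3, 0.2)],
         "coeffs": [0.05, 0.0, -0.1, 0.2, 0.0, 0.15]},
    ]},
]
point = (0.3, -0.4, 0.5)
naive_value = mo_naive(point, atoms)
factored_value = mo_factored(point, atoms)
```

For a shell with M_l angular momenta and P primitives, the factored form evaluates P exponentials instead of M_l × P, which matters because the exponential is the most expensive operation in the inner loop.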

(Continues...)

Excerpted from GPU Computing Gems Emerald Edition by Wen-mei W. Hwu. Copyright © 2011 by NVIDIA Corporation and Wen-mei W. Hwu. Excerpted by permission of Morgan Kaufmann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

1. Scientific Simulation
2. Life Sciences
3. Statistical Modeling
4. Emerging Data-Intensive Applications
5. Electronic Design Automation
6. Ray Tracing and Rendering
7. Computer Vision
8. Video and Image Processing
9. Signal and Audio Processing
10. Medical Imaging

What People are Saying About This

From the Publisher

Practical parallel computing techniques to enhance scientific research, straight from the leading minds in GPGPU

Editorial Reviews

Praise for GPU Computing Gems: Emerald Edition:
"GPU computing is becoming an outstanding field in high performance computing. Due to its easiness, the CUDA approach enables programmers to take advantage of GPU-acceleration very quickly… My research in complex science as well as applications in high frequency trading benefited significantly from GPU computing." —Dr. Tobias Preis, ETH Zurich, Switzerland

"This book is an important reference for everyone working on GPU/CUDA, and contains definitive work in a selection of fields. The patterns of CUDA parallelization it describes can often be adapted to applications in other fields." —Dr. Ming Ouyang, Assistant Professor – Director Visualization and Intensive Graphics Lab, University of Louisville

"Diving into the world of GPU computing has never been more important these days. GPU Computing Gems: Emerald Edition takes you through the looking glass into this fascinating world." —Martin Eisemann, Computer Graphics Lab, TU Braunschweig

"…an outstanding collection of vignettes of how to program GPUs for a breathtaking range of applications." —Dr. Amitabh Varshney, Director, Institute for Advanced Computer Studies, University of Maryland

"The book features a useful index that might help readers mine the gems in search of a solution to a specific algorithmic problem. The index is accompanied by online resources containing source code samples—and further information—for some of the chapters. A second volume with another 30 chapters of GPGPU application reports, somewhat more focused on generic algorithms and programming techniques, is currently in the pipeline and scheduled to appear as the "Jade Edition" sometime this month." —Computing in Science and Engineering

"The book is an excellent selection of important papers describing various applications of GPUs. As such, I believe it would be a valuable addition to the bookshelf of any researcher in modeling and simulation…This is not a substitute for a more detailed text on massively parallel programming...Instead, it is a nice practical addition to that text." —Computing Reviews, August 2012

"...the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk." -Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010
