This week’s assignment is to review one of the papers that our classmates chose earlier in the semester. I have chosen to take a look at General Purpose Molecular Dynamics Simulations Fully Implemented on Graphics Processing Units [doi: 10.1016/j.jcp.2008.01.047]. Yong Hwan Kim originally chose to discuss this paper for the fourth weekly assignment on October 1st.

Parallelizing molecular dynamics is trivial in principle, since the force calculations are independent for each atom and each type of force. The challenge is in passing data. This is where the GPU can come in: a large number of short floating-point operations is a perfect candidate for GPU processing.
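
To make the independence concrete, here is a minimal sketch in plain Python. It assumes a Lennard-Jones pair potential with a cutoff, which I picked only for illustration; the function names `lj_force_on_atom` and `compute_forces` and all parameter values are my own and are not taken from the paper. The point is that the force on each atom depends only on the shared, read-only list of positions, so every atom could be handed to its own thread.

```python
def lj_force_on_atom(i, positions, epsilon=1.0, sigma=1.0, r_cut=3.0):
    """Net Lennard-Jones force on atom i from all other atoms within r_cut.

    Hypothetical helper for illustration; potential and parameters are assumed.
    """
    xi, yi, zi = positions[i]
    fx = fy = fz = 0.0
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue
        dx, dy, dz = xi - xj, yi - yj, zi - zj
        r2 = dx * dx + dy * dy + dz * dz
        if r2 > r_cut * r_cut:
            continue  # outside the cutoff: no contribution
        sr6 = (sigma * sigma / r2) ** 3
        # F_vec = 24 * eps * (2 * (s/r)^12 - (s/r)^6) / r^2 * (r_i - r_j)
        scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
        fx += scale * dx
        fy += scale * dy
        fz += scale * dz
    return (fx, fy, fz)


def compute_forces(positions):
    """Compute the force on every atom.

    Each iteration only reads `positions` and produces its own result,
    so the iterations could run in any order or all at once.
    """
    return [lj_force_on_atom(i, positions) for i in range(len(positions))]


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
    for idx, force in enumerate(compute_forces(atoms)):
        print(idx, force)
```

Note that this version computes each pair interaction twice (once per atom) rather than exploiting Newton's third law. That costs extra arithmetic but means no two atoms ever write to the same memory location, which is the kind of trade-off that tends to suit a GPU.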

Previously, the authors had tried to implement parallelized versions on clusters and found that communication overhead dominated the performance for large systems. They show that a GPU implementation running on a single card can equal the throughput of a small to medium-sized cluster.

One challenge with the GPU implementation is determining the correct size for the processing chunks. The smallest processing unit requires 32 threads, each executing the same instruction sequence. Another challenge is memory access. Since the GPU has a much smaller amount of memory than the system itself, access latency can dominate the simulation’s performance.
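
The chunk-size question can be illustrated with a bit of arithmetic. The sketch below assumes one thread per atom and a chunk (block) size that is a whole multiple of the 32-thread unit mentioned above; both the one-thread-per-atom mapping and the name `launch_config` are my own assumptions for illustration, not details from the paper. It counts how many chunks are needed and how many threads in the last chunk have no atom to work on.

```python
WARP_SIZE = 32  # threads in the smallest processing unit, per the text


def launch_config(n_atoms, threads_per_block=64):
    """Return (number of blocks, idle threads) for one thread per atom.

    Hypothetical helper: the block size must be a positive multiple of
    WARP_SIZE so that no 32-thread unit is split across blocks.
    """
    if threads_per_block <= 0 or threads_per_block % WARP_SIZE != 0:
        raise ValueError("threads_per_block must be a positive multiple of 32")
    if n_atoms < 0:
        raise ValueError("n_atoms must be non-negative")
    # Ceiling division: enough blocks to cover every atom.
    n_blocks = (n_atoms + threads_per_block - 1) // threads_per_block
    idle_threads = n_blocks * threads_per_block - n_atoms
    return n_blocks, idle_threads


if __name__ == "__main__":
    for block_size in (32, 64, 128, 256):
        blocks, idle = launch_config(10_000, block_size)
        print(f"block size {block_size}: {blocks} blocks, {idle} idle threads")
```

Idle threads are only one part of the picture. In practice the best size also depends on how much fast memory and how many registers each chunk needs, so it is usually found by benchmarking rather than by formula.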

This paper was originally published in May 2008. In the past 4.5 years, many new technologies and software platforms have become available. It would be interesting to see whether the performance of GPUs relative to CPUs has decreased, improved, or remained the same.