Published on: May 29, 2017
Which vehicle would you rather own: a car or a bus? If you typically transport only yourself and a few others, a car is the better choice. You can easily get everyone across town in one trip, and a car is less expensive to purchase and operate than a bus. But what if you had 100 people to transport? All of a sudden, making 20 trips across town becomes quite expensive, and the bus becomes the more cost-effective option.
The same reasoning applies to DEM hardware selection: the kind of CPU and/or GPU devices that will work best for your simulations depends upon your problem size and your budget. And with Rocky 4 now supporting processing across two or more GPUs, the kind of hardware you choose is more important (and potentially confusing!) than ever.
In this post, we’ll take a closer look at the multi-GPU processing now available in Rocky 4, and will provide you with some guidance for making the best possible hardware decisions.
Rocky 4 Multi-GPU processing: how it works
Large-scale DEM simulations with millions of particles consume huge amounts of hardware memory. CPU memory can be quite expensive, and simulation performance can vary drastically. In addition, a single CPU or GPU (Graphics Processing Unit) has a limited amount of memory, and that limit caps the number of particles it can handle.
The multi-GPU solver in Rocky 4, however, overcomes this memory restriction by efficiently distributing and managing the combined memory of two or more graphics cards on a single motherboard. For example, a commercial-scale high-shear wet granulation device used in the pharmaceutical industry was modeled in Rocky 4, and with the multi-GPU solver it was possible to simulate over 10 million particles for the very first time (Figure 1). Particle counts this high were not possible previously but are now a reality thanks to the new multi-GPU capabilities in Rocky 4.
Figure 1: Rocky DEM simulation snapshot of the granulator showing 10 million particles colored by their translational velocity
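To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes that particle data is split across the cards with no duplication and that each particle needs a fixed number of bytes. The function name, the bytes-per-particle value, and the usable-memory fraction are all hypothetical placeholders, not figures from Rocky; the real footprint depends on particle shape, contacts, and solver internals.

```python
def max_particles(gpu_memory_gib, bytes_per_particle, usable_fraction=0.8):
    """Estimate how many particles fit in the combined memory of several GPUs.

    gpu_memory_gib     -- list with the memory of each card, in GiB
    bytes_per_particle -- hypothetical per-particle footprint (placeholder)
    usable_fraction    -- assumed share of memory available for particle data
    """
    total_bytes = sum(gpu_memory_gib) * 1024**3 * usable_fraction
    return int(total_bytes // bytes_per_particle)


# Placeholder inputs: 8 GiB cards and 1 KiB per particle (not measured values).
one_card = max_particles([8], bytes_per_particle=1024, usable_fraction=1.0)
two_cards = max_particles([8, 8], bytes_per_particle=1024, usable_fraction=1.0)
print(one_card, two_cards)
```

Under these simplified assumptions the particle ceiling grows in proportion to the total memory installed, which is the effect the multi-GPU solver relies on.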
How to choose the right hardware
Selecting the type of hardware you need largely comes down to cost versus benefit. For example, a speed-up/cost comparison across various hardware is shown below for a commercial-scale tablet coater simulated with 242,000 tablets, each modeled as a polyhedral particle shape with 220 vertices (Figure 2). The comparison is based on the time each hardware option takes to compute one second of the simulation. From this case, it is evident that GPU gaming cards, like the GTX980s or Titan Zs, offer more speed-up for their cost than computing cards like the P100s. Even a single GPU card offers more value than an 8-core CPU.
Figure 2: Relative speed-up of GPU/CPU simulations and speed-up/price.
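The two quantities in Figure 2 can be reproduced for your own candidate hardware with a few lines of Python. This is a minimal sketch: the hardware names, run times, and prices below are made-up placeholders for illustration, not the benchmark results shown in the figure.

```python
def speedup(baseline_hours, candidate_hours):
    """Relative speed-up of a candidate versus the baseline hardware."""
    return baseline_hours / candidate_hours


def speedup_per_price(baseline_hours, candidate_hours, price):
    """Speed-up obtained per unit of currency spent on the hardware."""
    return speedup(baseline_hours, candidate_hours) / price


# Hypothetical inputs: hours to compute one second of simulation, and price.
options = {
    "cpu_8_core": {"hours": 10.0, "price": 2000.0},
    "gaming_gpu": {"hours": 2.0, "price": 1000.0},
    "compute_gpu": {"hours": 1.0, "price": 6000.0},
}

baseline = options["cpu_8_core"]["hours"]

# Rank the options from best to worst value for money.
ranking = sorted(
    options,
    key=lambda name: speedup_per_price(
        baseline, options[name]["hours"], options[name]["price"]
    ),
    reverse=True,
)

for name in ranking:
    hours, price = options[name]["hours"], options[name]["price"]
    print(name, speedup(baseline, hours), speedup_per_price(baseline, hours, price))
```

With these placeholder numbers the fastest card is not the best value, which mirrors the gaming-card versus computing-card observation above. Substitute your own measured run times and quoted prices before drawing conclusions.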
In addition, comparing how the simulation scales with the number of GPUs shows much better scaling for large particle counts than for small ones (Figure 3). While all cases benefit from the addition of at least one GPU, the higher the number of particles simulated, the greater the benefit of adding more than one GPU device.
Figure 3: Speed-up by number of GPUs and number of particles simulated for the commercial-scale tablet coater benchmark
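One common way to quantify this kind of scaling is parallel efficiency: the speed-up over a single GPU divided by the number of GPUs, where 1.0 means perfect scaling. The sketch below uses hypothetical timings chosen only to illustrate the calculation; they are not taken from the benchmark in Figure 3.

```python
def parallel_efficiency(time_one_gpu, time_n_gpus, n_gpus):
    """Speed-up over a single GPU divided by the number of GPUs used."""
    return (time_one_gpu / time_n_gpus) / n_gpus


# Hypothetical wall-clock times (hours) on 1 GPU and on 4 GPUs.
small_case = parallel_efficiency(time_one_gpu=10.0, time_n_gpus=6.25, n_gpus=4)
large_case = parallel_efficiency(time_one_gpu=100.0, time_n_gpus=31.25, n_gpus=4)

print(f"small particle count: {small_case:.2f}")
print(f"large particle count: {large_case:.2f}")
```

An efficiency well below 1.0 for a small case suggests that the extra cards are mostly idle or waiting on each other, so the added hardware cost buys little.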
In summary, a good way to think about GPU vs. CPU performance is the car vs. bus question: just as a bus is better at transporting many people across town, a multi-GPU solver (two or more GPUs) is better at simulating many millions of particles and offers more value for money than a CPU alone. However, multi-GPU may provide little benefit for small problems (fewer than 1 million particles), so a CPU alone could be the best bang for your buck in that case, just as a car is a better choice for transporting only a few people.
Try it out
If you are curious to know more about how your simulation would perform with our new Rocky 4 multi-GPU solver, contact us to perform a benchmark of your own.