More GPUs = faster processing with Rocky

Updated on: March 3, 2023

In the past, DEM simulations were restricted to relatively small problems that used, for example, only thousands of larger particles that were mostly spherical in shape.

Continual improvements in both DEM codes and computational power have enabled closer-to-reality particle simulations. Users today can expect to simulate problems using the real particle shape and the actual particle size distribution (PSD), creating DEM simulations with many millions of particles.

However, these gains in simulation accuracy have come at the cost of increased computational load, in both processing time and memory requirements. Within Rocky, this load can be offset considerably by GPU processing, which lets users obtain results in a more practical time frame.

The benefits of GPU

The addition of GPU processing has helped make DEM a practical tool for engineering design. For example, the speed-up experienced by processing a simulation with even an inexpensive gaming GPU is remarkable when compared to a standard 8-core CPU machine working alone.

Since release 4 of Rocky, users have been able to use multi-GPU processing, which makes it feasible to tackle large-scale or complicated problems that were previously out of reach because of memory limitations. Combining the memory of multiple GPU cards overcomes these limitations, and aggregating their computing power yields a substantial performance increase.

From an investment perspective, there are many benefits to multi-GPU processing. The hardware cost of running cases with several million particles on multiple GPUs is much lower than that of an equivalent CPU-based machine. GPUs also consume less energy, and GPU-based machines are easier to upgrade, either by adding more cards or by buying newer ones.

Moreover, in a world where we push multi-physics simulations ever farther, Rocky GPU and multi-GPU processing enable you to free up all your CPUs for coupled simulations, avoiding competition for hardware.

Performance benchmark

To better illustrate the gains in processing speed that are possible for common applications, a performance benchmark of a rotating drum (Figure 1) was developed. Multiple runs using different criteria were evaluated as explained below.

Figure 1 – Rotating drum benchmark case for spheres (left) and polyhedrons (right).

Criterion 1: Particle shape

Two different particle shapes were evaluated at the same equivalent size (Figure 2):

  • Spheres
  • Polyhedrons (shaped from 16 triangles)

The drum was lengthened as the number of particles increased, so that the material cross section stayed consistent across the various runs.

Figure 2 – Sphere (left) and 16-triangle polyhedron (right) particle shapes used in the benchmark case.

Criterion 2: Processing type

Four different processing combinations were evaluated:

  • CPU: Intel Xeon Gold 6230 @ 2.10GHz on 8 cores
  • 1 GPU: NVIDIA Titan V
  • 2 GPUs: NVIDIA Titan V
  • 4 GPUs: NVIDIA Titan V

Criterion 3: Performance measurement

Two measurements were taken at steady state to evaluate performance:

  • Simulation Pace, which is the amount of hardware processing time (wall-clock duration) required to advance the simulation by one second of simulated time. In general, a lower simulation pace indicates faster processing.
  • GPU Memory Usage, which is the amount of memory being used on the GPU while processing the simulation. In general, a lower memory usage allows for more particles to be processed, and/or more calculations to be performed.
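
The Simulation Pace definition above can be made concrete with a small calculation. The following is a minimal Python sketch, not part of Rocky: the function names and the sample timings are hypothetical, and the speed-up is assumed to be the ratio of a reference pace (for example, the 8-core CPU) to the pace of the hardware being compared.

```python
def simulation_pace(wall_clock_seconds: float, simulated_seconds: float) -> float:
    """Wall-clock seconds needed to advance the simulation by one simulated second."""
    if simulated_seconds <= 0:
        raise ValueError("simulated_seconds must be positive")
    return wall_clock_seconds / simulated_seconds


def speed_up(reference_pace: float, candidate_pace: float) -> float:
    """How many times faster the candidate is than the reference (assumed definition)."""
    if candidate_pace <= 0:
        raise ValueError("candidate_pace must be positive")
    return reference_pace / candidate_pace


# Hypothetical timings, chosen only to illustrate the arithmetic.
cpu_pace = simulation_pace(wall_clock_seconds=3600.0, simulated_seconds=2.0)
gpu_pace = simulation_pace(wall_clock_seconds=72.0, simulated_seconds=2.0)
print(f"CPU pace: {cpu_pace:.0f} s per simulated second")
print(f"GPU pace: {gpu_pace:.0f} s per simulated second")
print(f"Speed-up: {speed_up(cpu_pace, gpu_pace):.0f}x")
```

Because pace is time per simulated second, the lower value wins, which is why the speed-up divides the reference pace by the candidate pace rather than the other way around.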

Check out this infographic to see how the GPU and multi-GPU processing capabilities available in Rocky can help you speed up your particle simulations regardless of the size of your business.

Benchmark results for Rocky 2023 R1

Figure 3 shows the 2023 R1 simulation pace for both particle shapes. The CPU values are plotted on a secondary axis because they are much higher than the GPU values. The CPU pace follows a linear trend in all cases. The GPU pace behaves differently: it goes through a transition before reaching a linear region. It is important to know where this region begins so that you select the best GPU resources for each case.

Figure 3 – Simulation Pace obtained using spheres (left) and polyhedrons (right).


Relevant conclusions on simulation performance (Figure 4):

Figure 4 – GPU speed up based upon Simulation Pace (compared with CPU – 8 cores) achieved using spheres (left) and polyhedrons (right).

  • Results show a significant performance gain with multi-GPU versus CPU simulations: up to 85 times faster for polyhedrons and 50 times faster for spheres when comparing 4 GPUs with an 8-core CPU.
  • Maximum GPU gain is achieved with approximately 250K or more particles per GPU, for both sphere and polyhedron shape types.
  • Scalability is preserved when the number of particles is increased.
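
As a rough planning aid, the 250K-particles-per-GPU observation above can be turned into a quick check of how many GPUs a case is likely to keep busy. This is a minimal Python sketch of that rule of thumb, not a Rocky feature: the function name and the assumption that particles are split evenly across GPUs are illustrative, and the threshold is only the approximate value reported for this benchmark.

```python
# Approximate threshold from this benchmark; other cases may behave differently.
PARTICLES_PER_GPU_FOR_MAX_GAIN = 250_000


def max_useful_gpus(n_particles: int, available_gpu_counts=(1, 2, 4)) -> int:
    """Largest GPU count that still leaves about 250K particles per GPU.

    Assumes an even split of particles across GPUs (an illustrative assumption).
    Falls back to the smallest available count for small cases.
    """
    counts = sorted(available_gpu_counts)
    useful = [
        g for g in counts
        if n_particles / g >= PARTICLES_PER_GPU_FOR_MAX_GAIN
    ]
    return useful[-1] if useful else counts[0]


for n in (100_000, 600_000, 1_000_000):
    print(f"{n:>9,} particles -> up to {max_useful_gpus(n)} GPU(s)")
```

A case below the threshold can still run on several GPUs; the sketch only flags where the benchmark suggests the extra cards would not reach their maximum gain.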

The following plots show performance improvement for spheres and polyhedrons for different numbers of particles using different numbers of GPUs (1x, 2x, and 4x).

For speed tests and hardware recommendations, see Rocky with Multi-GPU: Which Hardware is Best for You?

Relevant conclusions on GPU memory consumption (Figure 5):

Figure 5 – GPU memory consumption using spheres (left) and polyhedrons (right).

  • Memory consumption per million particles is approximately 2GB for spheres and 3.5GB for polyhedrons.
    Note: This ratio is just a general guideline and can vary with case behavior, setup, and enabled models.
  • There is an initial memory consumption of about 700MB per GPU for general kernel allocation and simulation management. This is on top of the aforementioned particle memory allocation.
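
The two guideline figures above can be combined into a back-of-the-envelope memory estimate. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, particle memory is assumed to divide evenly across GPUs, and the constants are the approximate values from this benchmark, which can vary with case behavior, setup, and enabled models.

```python
# Approximate guideline values from this benchmark (GB per million particles).
GB_PER_MILLION_PARTICLES = {"sphere": 2.0, "polyhedron": 3.5}
# Approximate fixed overhead per GPU (about 700MB).
BASE_OVERHEAD_GB = 0.7


def estimate_memory_per_gpu_gb(n_particles: int, shape: str, n_gpus: int = 1) -> float:
    """Rough per-GPU memory estimate in GB.

    Assumes particle memory is shared evenly among the GPUs (an assumption,
    not a documented Rocky behavior) and adds the fixed overhead on each GPU.
    """
    if shape not in GB_PER_MILLION_PARTICLES:
        raise ValueError(f"unknown shape: {shape!r}")
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    particle_gb = GB_PER_MILLION_PARTICLES[shape] * n_particles / 1_000_000
    return BASE_OVERHEAD_GB + particle_gb / n_gpus


estimate = estimate_memory_per_gpu_gb(4_000_000, "polyhedron", n_gpus=4)
print(f"Estimated memory per GPU: {estimate:.1f} GB")
```

Comparing the estimate with the memory available on each card gives a first indication of whether a case fits, or whether more GPUs are needed to hold it.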

See also: GPU Buying Guide and FAQs

Lucilla Almeida

CAE Specialist at ESSS, D.Sc.

Lucilla holds a BE (Chemical) undergraduate degree, an M.Sc. in Chemical Engineering, and a Ph.D. in Nuclear Engineering from the Federal University of Rio de Janeiro. She joined ESSS in 2008 and spent 5 years applying CFD to common engineering problems in the Oil and Gas industry, dealing with turbulent and multiphase flow simulations. Since 2013, she has been an Application Engineer for the Rocky DEM Business Unit, supporting users, working on consultancy projects, and validating models implemented for the CFD-DEM coupling.
