Showing posts with label OpenACC. Show all posts



In a chat recently, I heard that computational fluid dynamics (CFD) can’t take advantage of GPUs. That seemed doubtful to me, so I looked it up. It turns out there has been some recent work showing that GPUs can greatly accelerate CFD workloads.

This press release on OpenACC’s website describes how a private company, AeroDynamic Solutions, Inc. (ADSCFD), used OpenACC to add GPU support to its proprietary CFD solver, Code LEO, with very good speedups:

By using OpenACC to GPU-accelerate their commercial flow solver, ADSCFD achieved significant value. They realized dramatically improved performance across multiple use cases with speed-ups ranging from 20 to 300 times, reductions in cost to solution of up to 70%, and access to analyses that were once deemed infeasible to instead being achieved within a typical design cycle.
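For anyone who hasn't seen OpenACC before: you keep ordinary loops and annotate them with compiler directives, and an OpenACC-enabled compiler (for example `nvc -acc` from the NVIDIA HPC SDK, or GCC with `-fopenacc`) offloads them to the GPU. The sketch below is not from Code LEO; it's a generic, hypothetical stand-in for the stencil loops common in CFD codes, a Jacobi solver for the 2D Laplace equation. The grid size and the names `init_grid` and `jacobi_solve` are made up for illustration. The `data` directive keeps both grids on the GPU across iterations (`copy` moves `u` in and back out, `create` allocates `unew` on the device only), and `reduction(max:err)` collects the largest per-cell change.

```c
#include <assert.h>
#include <string.h>

#define NX 32
#define NY 32

/* Scalar field on a regular grid; static arrays so their sizes are known
   to the OpenACC data clauses below. */
static double u[NY][NX];
static double unew[NY][NX];

/* Hypothetical setup: top edge held at 1.0, every other cell starts at 0.0. */
static void init_grid(void)
{
    memset(u, 0, sizeof u);
    memset(unew, 0, sizeof unew);
    for (int i = 0; i < NX; i++)
        u[0][i] = 1.0;
}

/* Jacobi iteration for the 2D Laplace equation. A compiler without OpenACC
   enabled ignores the pragmas and runs the loops serially on the CPU. */
static double jacobi_solve(int max_iters, double tol, int *iters_out)
{
    assert(max_iters > 0 && iters_out != NULL);
    double err = tol + 1.0;
    int iter = 0;

    /* Keep both grids resident on the GPU for the whole solve. */
    #pragma acc data copy(u) create(unew)
    {
        while (err > tol && iter < max_iters) {
            err = 0.0;
            /* Each interior cell becomes the average of its four neighbours;
               the largest change is collected with a max reduction. */
            #pragma acc parallel loop collapse(2) reduction(max:err)
            for (int j = 1; j < NY - 1; j++) {
                for (int i = 1; i < NX - 1; i++) {
                    unew[j][i] = 0.25 * (u[j][i + 1] + u[j][i - 1]
                                       + u[j - 1][i] + u[j + 1][i]);
                    double d = unew[j][i] - u[j][i];
                    if (d < 0.0) d = -d;
                    if (d > err) err = d;
                }
            }
            /* Copy the new interior values into u for the next sweep. */
            #pragma acc parallel loop collapse(2)
            for (int j = 1; j < NY - 1; j++)
                for (int i = 1; i < NX - 1; i++)
                    u[j][i] = unew[j][i];
            iter++;
        }
    }
    *iters_out = iter;
    return err;
}
```

Usage would be `init_grid();` followed by `jacobi_solve(100000, 1e-5, &iters);`. Real solvers like the ones in these posts are far more involved, but the appeal of the directive approach is the same: the loop bodies stay plain C, and the same source still builds for CPU-only machines.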

Similar blog posts last year, one from Nvidia and a joint one from ANSYS and Nvidia, show significant speedups (between 12x and 33x) along with substantial power savings.

Nvidia’s blog post shows results from a beta version of ANSYS Fluent and from Simcenter STAR-CCM+.

Figure 2 shows the performance of the first release of Simcenter STAR-CCM+ 2022.1 against commonly available CPU-only servers. For the tested benchmark, an NVIDIA GPU-equipped server delivers results almost 20x faster than over 100 cores of CPU.

The performance of the Ansys Fluent 2022 beta1 server compared to CPU-only servers shows that Intel Xeon, AMD Rome, and AMD Milan had ~1.1x speedups compared to the NVIDIA A100 PCIe 80GB, which had speedups from 5.2x (one GPU) to an impressive 33x (eight GPUs). 
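The two Fluent figures above also give a rough sense of multi-GPU scaling. Assuming both speedups are measured against the same CPU baseline (my assumption; the quote doesn't say so explicitly), going from one to eight GPUs gives 33 / 5.2 ≈ 6.3x, or roughly 79% parallel efficiency. A small helper for that arithmetic, with the hypothetical name `scaling_efficiency`:

```c
#include <assert.h>

/* Parallel efficiency: speedup of n GPUs relative to one GPU, divided by n.
   Both speedups are assumed to share the same baseline. */
static double scaling_efficiency(double speedup_one, double speedup_n, int n)
{
    assert(speedup_one > 0.0 && n > 0);
    return (speedup_n / speedup_one) / (double)n;
}
```

Calling `scaling_efficiency(5.2, 33.0, 8)` gives about 0.79, i.e. eight GPUs deliver a bit under eight times the throughput of one.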

ANSYS’s blog post covers the same result as Nvidia’s, a 33x speedup using eight A100 GPUs. It also compares the cost of two equally fast clusters, one using GPUs and the other using only CPUs:

1 NVIDIA A100 GPU ≈ 272 Intel® Xeon® Gold 6242 Cores

Comparing the older V100 GPUs with Intel® Xeon® Gold 6242, the 6x V100 GPU cluster would cost $71,250 while the equivalent CPU-only cluster would cost $500,000, i.e. about one seventh the price.
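A quick check of the “one seventh” figure using the two cluster prices quoted above; `price_ratio` is just an illustrative helper, not anything from the ANSYS post:

```c
#include <assert.h>

/* How many times more the CPU-only cluster costs than the GPU cluster,
   for two clusters assumed to deliver the same performance. */
static double price_ratio(double cpu_cluster_usd, double gpu_cluster_usd)
{
    assert(cpu_cluster_usd > 0.0 && gpu_cluster_usd > 0.0);
    return cpu_cluster_usd / gpu_cluster_usd;
}
```

`price_ratio(500000.0, 71250.0)` comes out at about 7.02, so the quoted “about one seventh the price” holds up.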