Parallel execution in Abaqus/Standard is available for shared memory computers and computer
clusters for the element operations, the direct sparse solver, and the iterative linear
equation solver. It can also use compute-capable GPGPU hardware on shared memory
computers for the direct sparse solver, the AMS eigensolver,
and the modal frequency response solver.
Abaqus/Standard supports both shared memory computers and computer clusters for parallelization.
Parallelization is invoked using the cpus option in the
abaqus execution procedure. The type of parallelization that is
executed depends on the computer resources configured by the job submission system. The
configured resources are reflected in the environment variable
mp_host_list (see Environment File Settings). If mp_host_list contains a single machine host,
thread-based parallelization is used within that host when more than one processor is
available. If mp_host_list contains multiple hosts,
MPI-based parallelization is used between the hosts. In addition, if each host has more than
one processor, thread-based parallelization is used within each host. This combination is
referred to as hybrid parallelization of MPI and threads.
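For illustration only (the host names and core counts below are assumptions, not values taken
from this guide), an environment file entry of the following form describes two 16-core hosts;
submitting a job with cpus=32 then runs MPI-based parallelization between the hosts and
thread-based parallelization within each host. The exact form of the entry is described in
Environment File Settings.

    # environment file entry listing the hosts and the processors available on each
    mp_host_list = [['host1', 16], ['host2', 16]]

    # run the job on all 32 cores across the two hosts
    abaqus job=job1 cpus=32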
Thread-Based Parallelization
You can execute Abaqus/Standard in thread mode within one node of a compute cluster. This approach takes advantage of the
shared memory available to the threads that are running on different processors.
In most cases, Abaqus/Standard provides full support for thread-based parallelization. However, this
parallelization method is not supported for the previous implementation of the linear dynamic
analysis procedures, which is invoked by the parameter setting
SIM=NO
on the eigenvalue extraction analysis procedure. This branch of the code is still required
in several cases where the new high-performance implementation of the linear dynamic
analysis procedures is not available.
Two thread-based parallelization options are available: the
hybrid direct sparse solver and the pure thread-based direct sparse solver. The pure
thread-based direct sparse solver provides better performance than the hybrid direct sparse
solver for many workflows, particularly for models dominated by structural elements. All
models that you can run in the memory of a single compute node should benefit. The pure
thread-based direct sparse solver is the default option. However, there are a number of
cases for which the pure thread-based direct sparse solver is not available. Abaqus/Standard detects these cases automatically and runs thread-based parallelization using the hybrid
direct sparse solver instead. For more information, see Limitations.
Limitations
Abaqus/Standard reverts to the hybrid direct sparse solver implementation if your analysis includes any
of the following cases (for which the pure thread-based direct sparse solver is not
available):
Transient sensitivity analysis (STEP, SENSITIVITY=ADJOINT with DYNAMIC).
GPGPU acceleration.
MPI-based parallelization.
Cases in which all of the solver data cannot be kept in memory.
MPI-Based Parallelization
Abaqus/Standard can also be executed in MPI mode, which uses the message passing interface to communicate
between machine hosts. In most cases, MPI-based parallelization is fully supported in Abaqus/Standard; the exceptions are the following workflows and features:
Cavity radiation analyses where parallel decomposition of the cavity is not allowed
and writing of restart data is requested (Cavity Radiation in Abaqus/Standard).
Heat transfer analyses where average-temperature radiation conditions are specified
(Thermal Loads).
You can further improve performance in hybrid mode, in which MPI-based parallelization is
used between hosts and thread-based parallelization is used within each host.
The threads_per_mpi_process option can be used together with
the cpus option to reconfigure the parallelization. The value of
threads_per_mpi_process should be a divisor of the number of
processors on each host and, consequently, a divisor of the number of cpus if all hosts
have the same number of processors.
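As a sketch under assumed resources (the host configuration below is an illustration, not a
value from this guide), suppose the job submission system allocates two hosts with 20
processors each, so that cpus=40. Setting threads_per_mpi_process=10, which divides both 20
and 40, results in 4 MPI processes with 10 threads each, two MPI processes per host:

    # 2 hosts x 20 cores; 4 MPI processes x 10 threads (2 MPI processes per host)
    abaqus job=job1 cpus=40 threads_per_mpi_process=10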
Parallel Execution of Steady-State Dynamic Analyses
Steady-state dynamic analysis provides the steady-state response (also called frequency
response) of a system due to harmonic excitation at a given frequency. Usually such analysis
is done as a frequency sweep by applying the loading at a series of different frequencies
and calculating the response. Abaqus/Standard supports three types of steady-state dynamic analysis: direct-solution, mode-based, and subspace-based.
Calculation of the frequency response for large finite element systems can be time
consuming. Parallel execution is a practical option for steady-state dynamic analyses with
millions of degrees of freedom, thousands of frequencies, and hundreds of load cases.
In mode-based steady-state dynamic analysis and subspace-based steady-state dynamic
analysis, the response calculation is based on solution of the steady-state dynamic
equations projected onto a subspace of modes. The modes of the undamped system must first be
extracted using the eigenfrequency extraction procedure. The number of modes extracted must
be sufficient to model the dynamic response of the system adequately, which is a matter of
judgment on your part. In the general case of a finite element model with damping, the
projected dynamic system of equations has as many equations as retained modes, and its
coefficient matrix is dense and complex.
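Schematically (the symbols below are generic and are not notation defined in this guide), if m
modes are retained, the projected steady-state equations at excitation frequency ω take the
form

\[
\left( -\omega^{2}\,\tilde{M} \;+\; i\omega\,\tilde{C} \;+\; \tilde{K} \right) q(\omega) \;=\; \tilde{f}(\omega),
\]

where the projected mass, damping, and stiffness matrices are m x m, q(ω) is the vector of
generalized (modal) displacements, and the right-hand side is the projected load. With general
damping the coefficient matrix is dense and complex, so the cost of each frequency point grows
rapidly with the number of modes.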
Parallel Execution of Direct-Solution Steady-State Dynamic Analyses
Direct-solution steady-state dynamic analysis is the most accurate and the most
computationally intensive of the steady-state dynamic analysis types. In a direct-solution
steady-state analysis, the frequency response is calculated directly in terms of the
physical degrees of freedom of the model using the direct sparse solver.
For large-scale direct-solution steady-state dynamic analyses, we recommend execution on
computer clusters using hybrid MPI-based and thread-based
parallelization and GPGPU acceleration if graphics processing units are available. We
recommend using one GPGPU per MPI domain;
specifying more GPGPUs per MPI domain may not scale well.
Example: Parallel Execution of Direct-Solution Steady-State Dynamic Analyses
In this example, you execute a direct-solution steady-state dynamic analysis with 100
frequencies on a cluster of multicore machine hosts. You split the frequencies into three
partitions: 33-33-34. You execute the analysis for each partition on two cluster nodes
(using a total of 80 physical cores) with 8 MPI domains (10 physical cores per MPI domain).
The combined output results are obtained in job1.odb.
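The options for selecting a frequency partition are not named in this section and are
therefore omitted from the sketch below, which shows only the resource-related part of one of
the three submissions. It assumes that the gpus option counts GPGPU devices in total, one per
MPI domain as recommended above; the job name is illustrative:

    # one of the three partition submissions: 8 MPI domains x 10 threads, one GPGPU per domain
    abaqus job=job1 cpus=80 threads_per_mpi_process=10 gpus=8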
Parallel Execution of Mode-Based Steady-State Dynamic Analyses
In a mode-based steady-state dynamic analysis, the steady-state dynamic modal operator is
assembled from the modal stiffness, mass, and damping operators precalculated in the
eigenfrequency extraction procedure. Solving the system of equations with respect to the
generalized displacements can be costly for mode-based steady-state dynamic analysis with
many thousands of modes, thousands of frequencies, and hundreds of load cases.
Abaqus offers a scalable high-performance hybrid
CPU-GPU modal steady-state dynamics solver. Mode-based
steady-state dynamic analysis cannot be executed in MPI
mode. It can be executed in thread mode only within one node of a compute cluster, and it
can take advantage of GPGPU acceleration if graphics processing units are available. To
achieve maximum computational efficiency of a
mode-based steady-state dynamic analysis, run it on a dedicated computer with sufficient
memory and use all available physical cores and GPGPU
devices.
Example: Parallel Execution of Mode-Based Steady-State Dynamic Analyses
In this example, you execute a mode-based steady-state dynamic analysis with 1000
frequencies on a single node of a compute cluster, using all of its available physical
cores and 2 GPGPU devices. You split the frequencies into 10 partitions of 100 frequencies
each and run the steady-state dynamic analyses as single-step restarts from the same
eigenfrequency extraction analysis, job0, which is executed on a
node of the same cluster. The combined output results are obtained in
job1.odb.
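A single restart submission for one of the partitions might look like the following sketch
(the core count shown is an assumption, and the partition-selection options are again
omitted); the oldjob option points the run at the eigenfrequency extraction analysis:

    # one of the ten partition runs; reads the modal subspace from job0
    abaqus job=job1 oldjob=job0 cpus=40 gpus=2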
Parallel Execution of Subspace-Based Steady-State Dynamic Analyses
A subspace-based steady-state dynamic analysis provides an approximate way to include
frequency-dependent effects (such as frequency-dependent damping and viscoelastic effects)
or nonsymmetric stiffness in the model. This solution technique is less accurate than
direct-solution steady-state analysis, in particular if significant material damping or
viscoelasticity with a high loss modulus is present, but it is computationally cheaper
than direct-solution steady-state dynamics. The projection of the dynamic equilibrium
equations onto a subspace of selected modes is performed at every frequency, which makes
subspace-based steady-state dynamic analysis more expensive than mode-based steady-state
dynamics. Only thread-based parallelization is available for subspace-based steady-state
dynamic analysis, including both the operator projection and the modal solution phases.
Splitting Frequency Points Technique for Running Steady-State Dynamic
Analyses
When sufficient computer resources are available, you can significantly reduce the time
required to run steady-state dynamic analyses with many frequencies by splitting the
frequency points into several partitions and executing the analysis for each partition
independently. No changes in the job input file are required.
To execute a steady-state dynamic analysis for a partition, you must specify the total
number of partitions and the current partition number. Executing the analyses for
multiple partitions simultaneously on a computer cluster can be significantly faster
than executing a single analysis for all the frequencies.
Splitting Frequency Points with Restart
You can use the frequency splitting technique only for a single-step analysis. If the
analysis includes preloading steps or an eigenfrequency extraction step that calculates
the modal subspace, you can use the restart capability to run a steady-state
dynamic analysis with frequency splitting.
Combining Results for Frequency Partitions
After the jobs for all partitions complete, you can combine the results into a single
output database (.odb) file. You can combine only output results from
analyses performed with the resultsformat=odb option.
Consistency of Results
Some physical systems (systems that, for example, undergo buckling, material failure, or
delamination) can be highly sensitive to small perturbations. For example, it is well known
that the experimentally measured buckling loads and final configurations of a set of
seemingly identical cylindrical shells can show significant scatter because of small
differences in features such as boundary conditions, loads, and initial geometries. When
simulating such systems, the physical sensitivities seen in an experiment can be manifested
as sensitivities to small numerical differences caused by finite precision effects. Finite
precision effects can lead to small numerical differences when running jobs on different
numbers of processors. Therefore, when simulating physically sensitive systems, you might
see differences in the numerical results (reflecting the differences seen in experiments)
between jobs run on different numbers of processors. To obtain consistent simulation results
from run to run, the number of processors should be constant.