Parallel execution in Abaqus/Standard is available for shared memory computers and computer
clusters for the element operations, the direct sparse solver, and the iterative linear
equation solver. It can also use compute-capable GPGPU hardware on shared memory
computers for the direct sparse solver, the AMS eigensolver,
and the modal frequency response solver.
Abaqus/Standard supports both shared memory computers and computer clusters for parallelization.
Parallelization is invoked using the cpus option in the
abaqus execution procedure. The type of parallelization that is
executed depends on the computer resources configured by the job submission system. The
configured resources are reflected in the environment variable
mp_host_list (see Environment File Settings). If mp_host_list contains a single machine host,
thread-based parallelization is used within that host when more than one processor is
available. If mp_host_list contains multiple hosts,
MPI-based parallelization is used between the hosts. In addition, if each host has more than
one processor, thread-based parallelization is used within each host. This combination is
referred to as hybrid parallelization of MPI and threads.
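For illustration only (the host names and core counts below are assumptions, not values taken
from this guide), an environment file entry of the following form describes two 16-core hosts;
submitting a job with cpus=32 then runs MPI-based parallelization between the hosts and
thread-based parallelization within each host. The exact form of the entry is described in
Environment File Settings.

    # environment file entry listing the hosts and the processors available on each
    mp_host_list = [['host1', 16], ['host2', 16]]

    # run the job on all 32 cores across the two hosts
    abaqus job=job1 cpus=32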
Thread-Based Parallelization
You can execute Abaqus/Standard in thread mode within one node of a compute cluster. This approach takes advantage of the
shared memory available to the threads that are running on different processors.
In most cases, Abaqus/Standard provides full support for thread-based parallelization. However, this
parallelization method is not supported for the previous implementation of the linear dynamic
analysis procedures, which is invoked by the parameter setting
SIM=NO
on the eigenvalue extraction analysis procedure. This branch of the code is still required
in several cases where the new high-performance implementation of the linear dynamic
analysis procedures is not available.
Two thread-based parallelization options are available: the
hybrid direct sparse solver and the pure thread-based direct sparse solver. The pure
thread-based direct sparse solver provides better performance than the hybrid direct sparse
solver for many workflows, particularly for models dominated by structural elements. All
models that you can run in the memory of a single compute node should benefit. The pure
thread-based direct sparse solver is the default option. However, there are a number of
cases for which the pure thread-based direct sparse solver is not available. Abaqus/Standard detects these cases automatically and runs thread-based parallelization using the hybrid
direct sparse solver instead. For more information, see Limitations.
Limitations
Abaqus/Standard reverts to the hybrid direct sparse solver implementation if your analysis includes any
of the following cases (for which the pure thread-based direct sparse solver is not
available):
Transient sensitivity analysis (STEP, SENSITIVITY=ADJOINT with DYNAMIC).
GPGPU acceleration.
MPI-based parallelization.
Cases in which all of the solver data cannot be kept in memory.
MPI-Based Parallelization
Abaqus/Standard can also be executed in MPI mode, which uses the message passing interface to communicate
between machine hosts. In most cases, MPI-based parallelization is fully supported in Abaqus/Standard; the exceptions are the following workflows and features:
Cavity radiation analyses where parallel decomposition of the cavity is not allowed
and writing of restart data is requested (Cavity Radiation in Abaqus/Standard).
Heat transfer analyses where average-temperature radiation conditions are specified
(Thermal Loads).
You can further improve performance in hybrid mode, in which MPI-based parallelization is
used between hosts and thread-based parallelization is used within each host.
The threads_per_mpi_process option can be used together with
the cpus option to reconfigure the parallelization. The value of
threads_per_mpi_process should be a divisor of the number of
processors on each host and, consequently, a divisor of the number of cpus if all hosts
have the same number of processors.
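As a sketch under assumed resources (the host configuration below is an illustration, not a
value from this guide), suppose the job submission system allocates two hosts with 20
processors each, so that cpus=40. Setting threads_per_mpi_process=10, which divides both 20
and 40, results in 4 MPI processes with 10 threads each, two MPI processes per host:

    # 2 hosts x 20 cores; 4 MPI processes x 10 threads (2 MPI processes per host)
    abaqus job=job1 cpus=40 threads_per_mpi_process=10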
Parallel Execution of Steady-State Dynamic Analyses
Steady-state dynamic analysis provides the steady-state response (also called frequency
response) of a system due to harmonic excitation at a given frequency. Usually such analysis
is done as a frequency sweep by applying the loading at a series of different frequencies
and calculating the response. Abaqus/Standard supports three types of steady-state dynamic analysis: direct-solution, mode-based, and subspace-based.
Calculation of the frequency response for large finite element systems can be time
consuming. Parallel execution is a practical option for steady-state dynamic analyses with
millions of degrees of freedom, thousands of frequencies, and hundreds of load cases.
In mode-based steady-state dynamic analysis and subspace-based steady-state dynamic
analysis, the response calculation is based on solution of the steady-state dynamic
equations projected onto a subspace of modes. The modes of the undamped system must first be
extracted using the eigenfrequency extraction procedure. The number of modes extracted must
be sufficient to model the dynamic response of the system adequately, which is a matter of
judgment on your part. In the general case of a finite element model with damping, the
projected dynamic system of equations has as many equations as retained modes, and its
coefficient matrix is dense and complex.
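Schematically (the symbols below are generic and are not notation defined in this guide), if m
modes are retained, the projected steady-state equations at excitation frequency ω take the
form

\[
\left( -\omega^{2}\,\tilde{M} \;+\; i\omega\,\tilde{C} \;+\; \tilde{K} \right) q(\omega) \;=\; \tilde{f}(\omega),
\]

where the projected mass, damping, and stiffness matrices are m x m, q(ω) is the vector of
generalized (modal) displacements, and the right-hand side is the projected load. With general
damping the coefficient matrix is dense and complex, so the cost of each frequency point grows
rapidly with the number of modes.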
Parallel Execution of Direct-Solution Steady-State Dynamic Analyses
Direct-solution steady-state dynamic analysis is the most accurate and the most
computationally intensive of the steady-state dynamic analysis types. In a direct-solution
steady-state analysis, the frequency response is calculated directly in terms of the
physical degrees of freedom of the model using the direct sparse solver.
For large-scale direct-solution steady-state dynamic analyses, we recommend execution on
computer clusters using hybrid MPI-based and thread-based
parallelization and GPGPU acceleration if graphics processing units are available. We
recommend using one GPGPU per MPI domain;
specifying more GPGPUs per MPI domain may not scale well.
Example: Parallel Execution of Direct-Solution Steady-State Dynamic Analyses
In this example, you execute a direct-solution steady-state dynamic analysis with 100
frequencies on a cluster of multicore machine hosts. You split the frequencies into three
partitions: 33-33-34. You execute the analysis for each partition on two cluster nodes
(using a total of 80 physical cores) with 8 MPI domains (10 physical cores per MPI domain).
The combined output results are obtained in job1.odb.
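The options for selecting a frequency partition are not named in this section and are
therefore omitted from the sketch below, which shows only the resource-related part of one of
the three submissions. It assumes that the gpus option counts GPGPU devices in total, one per
MPI domain as recommended above; the job name is illustrative:

    # one of the three partition submissions: 8 MPI domains x 10 threads, one GPGPU per domain
    abaqus job=job1 cpus=80 threads_per_mpi_process=10 gpus=8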
Parallel Execution of Mode-Based Steady-State Dynamic Analyses
In a mode-based steady-state dynamic analysis, the steady-state dynamic modal operator is
assembled from the modal stiffness, mass, and damping operators precalculated in the
eigenfrequency extraction procedure. Solving the system of equations with respect to the
generalized displacements can be costly for mode-based steady-state dynamic analysis with
many thousands of modes, thousands of frequencies, and hundreds of load cases.
Abaqus offers a scalable high-performance hybrid
CPU-GPU modal steady-state dynamics solver. Mode-based
steady-state dynamic analysis cannot be executed in MPI
mode. It can be executed in thread mode only within one node of a compute cluster, and it
can take advantage of GPGPU acceleration if graphics processing units are available. To
achieve maximum computational efficiency of a
mode-based steady-state dynamic analysis, run it on a dedicated computer with sufficient
memory and use all available physical cores and GPGPU
devices.
Example: Parallel Execution of Mode-Based Steady-State Dynamic Analyses
In this example, you execute a mode-based steady-state dynamic analysis with 1000
frequencies on a single node of a compute cluster, using all of its available physical
cores and 2 GPGPU devices. You split the frequencies into 10 partitions of 100 frequencies
each and run the steady-state dynamic analyses as single-step restarts from the same
eigenfrequency extraction analysis, job0, which is executed on a
node of the same cluster. The combined output results are obtained in
job1.odb.
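A single restart submission for one of the partitions might look like the following sketch
(the core count shown is an assumption, and the partition-selection options are again
omitted); the oldjob option points the run at the eigenfrequency extraction analysis:

    # one of the ten partition runs; reads the modal subspace from job0
    abaqus job=job1 oldjob=job0 cpus=40 gpus=2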
Parallel Execution of Subspace-Based Steady-State Dynamic Analyses
A subspace-based steady-state dynamic analysis provides an approximate way to include
frequency-dependent effects (such as frequency-dependent damping and viscoelastic effects)
or nonsymmetric stiffness in the model. This solution technique is less accurate than
direct-solution steady-state analysis, in particular if significant material damping or
viscoelasticity with a high loss modulus is present, but it is computationally cheaper
than direct-solution steady-state dynamics. The projection of the dynamic equilibrium
equations onto a subspace of selected modes is performed at every frequency, which makes
subspace-based steady-state dynamic analysis more expensive than mode-based steady-state
dynamics. Only thread-based parallelization is available for subspace-based steady-state
dynamic analysis, including both the operator projection and the modal solution phases.
Splitting Frequency Points Technique for Running Steady-State Dynamic
Analyses
When sufficient computer resources are available, you can significantly reduce the time
required to run steady-state dynamic analyses with many frequencies by splitting the
frequency points into several partitions and executing the analysis for each partition
independently. No changes in the job input file are required.
To execute a steady-state dynamic analysis for a partition, you must specify the total
number of partitions and the current partition number. Executing the analyses for
multiple partitions simultaneously on a computer cluster can be significantly faster
than executing a single analysis for all the frequencies.
Splitting Frequency Points with Restart
You can use the frequency splitting technique only for a single-step analysis. If the
analysis includes preloading steps or an eigenfrequency extraction step that calculates
the modal subspace, you can use the restart capability to run a steady-state
dynamic analysis with frequency splitting.
Combining Results for Frequency Partitions
After the jobs for all partitions complete, you can combine the results into a single
output database (.odb) file. You can combine only output results from
analyses performed with the resultsformat=odb option.
Consistency of Results
Some physical systems (systems that, for example, undergo buckling, material failure, or
delamination) can be highly sensitive to small perturbations. For example, it is well known
that the experimentally measured buckling loads and final configurations of a set of
seemingly identical cylindrical shells can show significant scatter because of small
differences in features such as boundary conditions, loads, and initial geometries. When
simulating such systems, the physical sensitivities seen in an experiment can be manifested
as sensitivities to small numerical differences caused by finite precision effects. Finite
precision effects can lead to small numerical differences when running jobs on different
numbers of processors. Therefore, when simulating physically sensitive systems, you might
see differences in the numerical results (reflecting the differences seen in experiments)
between jobs run on different numbers of processors. To obtain consistent simulation results
from run to run, the number of processors should be constant.