N body simulation in OpenMP

Hey there!

This is a continuation the open source project I worked on the previous blog.

Here I found few hotspots to optimize the OpenMP code in the same repository.

Code

Following is the part of the code which I improved.

 #pragma omp parallel for
 for( object_i = 0; object_i < OBJECT_COUNT; ++object_i ) {
     Vector3 total_force = { 0.0, 0.0, 0.0 };
     for( int object_j = 0; object_j < OBJECT_COUNT; ++object_j ) {
         if( object_i == object_j ) continue;
         Vector3 displacement = v3_subtract( current_dynamics[object_j].position, current_dynamics[object_i].position );
         double distance_squared = magnitude_squared( displacement );
         double distance = sqrt( distance_squared );
         double force_magnitude = ( G * object_array[object_i].mass * object_array[object_j].mass ) / distance_squared;
         Vector3 force = v3_multiply( (force_magnitude / distance ), displacement );
         total_force = v3_add( total_force, force );
     }
     Vector3 acceleration = v3_divide( total_force, object_array[object_i].mass);
     Vector3 delta_v = v3_multiply( TIME_STEP, acceleration );
     Vector3 delta_position = v3_multiply( TIME_STEP, current_dynamics[object_i].velocity );
     next_dynamics[object_i].velocity = v3_add( current_dynamics[object_i].velocity, delta_v );
     next_dynamics[object_i].position = v3_add( current_dynamics[object_i].position, delta_position );
 }

In the code snippet above, the force is calculated for every object (i) by appling the Newton’s gravitational equation with every other object (j).

Some information on the variables.

current_dynamics: Contains the current position (Vector3) of all objects.

next_dynamics: Where the updated position (Vector3) of the object is written to.

object_array: This contains the mass of all the objects.

Optimization

I applied Newton’s most famous law – Newton’s 3rd law. According to Newton’s 3rd law

To every action there is always opposed an equal reaction

The force on object i because of object j is same as the force on object j because of object i. So there is repeated force calculation going on here in the code. So I change the inner for loop to start from object_i + 1 instead of 0.  Here comes the catch, doing this will unbalance the distribution of the work across threads. The threads which gets lower index values will have to lot of work compared to threads getting higher index values. So I changed the scheduling of the parallel for loop to dynamic scheduling.

Also while calculating the force for object i due to object j I need to add the force to the object j . So I created an additional array to store it. Here I am storing the acceleration instead the force to avoid dividing it by mass again and again.

Also, I had to split the updating of the next_dynamics with the new position into an another for loop because I need to wait for all threads to complete its work of calculating the force.

Here is my code after the improvement.

 #pragma omp parallel for schedule(dynamic)
 for( object_i = 0; object_i < OBJECT_COUNT; ++object_i ) {
     Vector3 total_acc = { 0.0, 0.0, 0.0 };
     for( int object_j = object_i+1; object_j < OBJECT_COUNT; ++object_j ) {
         Vector3 displacement = v3_subtract( current_dynamics[object_j].position, current_dynamics[object_i].position );
         double distance_squared = magnitude_squared( displacement );
         double distance = sqrt( distance_squared );
         double acc_magnitude = ( G * object_array[object_j].mass ) / distance_squared;
         Vector3 accel = v3_multiply( (acc_magnitude / distance ), displacement );
         total_acc = v3_add( total_acc, accel );
         acc[object_j] = v3_add( acc[object_j], v3_multiply( object_array[object_i].mass / object_array[object_j].mass, accel ));
     }
     acc[object_i] = v3_add( acc[object_i], total_acc );
 }
 #pragma omp parallel for
 for( object_i = 0; object_i < OBJECT_COUNT; ++object_i ) {
     Vector3 acceleration   = acc[object_i];
     Vector3 delta_v        = v3_multiply( TIME_STEP, acceleration );
     Vector3 delta_position = v3_multiply( TIME_STEP, current_dynamics[object_i].velocity );
     next_dynamics[object_i].velocity = v3_add( current_dynamics[object_i].velocity, delta_v );
     next_dynamics[object_i].position = v3_add( current_dynamics[object_i].position, delta_position );
 }

Now time to run this code.

real 0m5.564s
user 0m21.012s
sys  0m0.008s

Initially, it was

real 0m7.667s
user 0m28.116s
sys  0m0.000s

So this optimization improved the performance of the code from almost 7.7 secs to 5.5 secs. This code simulates the motion over a given time. Here the time was 1/16 when compared the CUDA.

System information

Processor:  Intel® Core™ i5-4200U CPU @ 1.60GHz × 4
Memory:     3.6 GiB
Graphics:   GeForce GT 740M/PCIe/SSE2
OS Type:    64-bit
Advertisements

2 thoughts on “N body simulation in OpenMP

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s