I found this code on GitHub that solves the N-body problem using the traditional Newtonian gravitational equations. The repository owner, pchapin, has already tried several parallelization approaches: pthreads, OpenMP, MPI, and CUDA.
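For reference, a direct N-body code typically evaluates the all-pairs gravitational acceleration on each body i due to every other body j. Below is the textbook form. Whether the repository adds a softening term or uses a particular integrator is not stated here, so treat this as the general formulation rather than its exact implementation:

```latex
\mathbf{a}_i = G \sum_{j \neq i} \frac{m_j \,(\mathbf{r}_j - \mathbf{r}_i)}{\lVert \mathbf{r}_j - \mathbf{r}_i \rVert^{3}}
```

Evaluating this sum for all N bodies costs O(N^2) work per time step, which is why the problem is a natural target for parallelization.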
While going through the programs and running them with different inputs, I discovered a few spots for improvement in the CUDA code. So I compiled the CUDA code with nvcc and profiled it with nvprof.