Introduction to GPU Computing


In this assignment you will implement a parallel bitonic sorting algorithm on the NVIDIA "GeForce 9800 GX2" graphics card and measure the speedup it offers over a single-CPU implementation.

For a detailed analysis of bitonic sorting, see the chapter on Comparison Networks in Cormen. You may also find these links helpful: 1, 2

The NVIDIA Programming Guide is also a useful resource for learning more about the GPU programming environment.
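As a reference point for the analysis mentioned above, here is a minimal sequential sketch of a bitonic sorting network in plain C. It is not the code supplied with this assignment; the function name bitonic_sort is hypothetical, and the sketch assumes that n is a power of two. Within one (k, j) step every compare-exchange touches a disjoint pair of elements, which is why the innermost loop is the natural candidate for one GPU thread per index.

```c
#include <stddef.h>

/* Sort a[0..n-1] in ascending order with a bitonic network.
 * Assumption: n is a power of two. bitonic_sort is an illustrative name. */
void bitonic_sort(int *a, size_t n)
{
    /* k: length of the bitonic sequences being merged in this stage */
    for (size_t k = 2; k <= n; k <<= 1) {
        /* j: distance between the two elements of each compare-exchange */
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            /* Iterations of this loop are independent of each other,
             * so on a GPU each index i could be handled by its own thread. */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    /* Blocks of length k alternate between ascending and descending. */
                    int ascending = ((i & k) == 0);
                    if ((ascending && a[i] > a[partner]) ||
                        (!ascending && a[i] < a[partner])) {
                        int tmp = a[i];
                        a[i] = a[partner];
                        a[partner] = tmp;
                    }
                }
            }
        }
    }
}
```

The network performs on the order of n log^2 n comparisons, more than quicksort does on average, but the comparison pattern does not depend on the data, which is what makes it a good fit for parallel hardware.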

Getting Started

1: SSH to the CUDA machine in the RISE lab. The IP address for the machine is . If you don't have a RISE lab login, please contact Venkat. You can log into this machine to compile and run your code. You will have to share an account with all other users, so make sure you don't delete any of their files. The username is cuda and the password is cuda123.

A CUDA program consists of two components: a host component, which runs on the CPU, and a device component, which runs on the GPU. The device component is also known as the kernel code. CUDA is essentially C with a few extra keywords that let you interact with the GPU.

The code for this assignment is available here.

You can compile this code using the nvcc compiler, which can be called by invoking the command nv (nv is an alias, so please don't delete your .bashrc files):

nv filename

This creates an executable that runs on both the CPU and the GPU.

To run the code, pass the number of elements to be sorted as the argument. The code above handles only element counts that are powers of 2, with a lower limit of 512 elements and an upper limit of 2^24.

Try compiling and running it, and estimate the speedup it offers compared to quicksort, or any other sorting algorithm you wish to use, running on the CPU.
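The size constraints above can be checked before launching a sort. The following is a small sketch in plain C; the names MIN_ELEMENTS, MAX_ELEMENTS, is_power_of_two and is_valid_count are hypothetical and are not taken from the supplied code.

```c
#include <stddef.h>

/* Limits as stated in this handout; the macro names are illustrative. */
#define MIN_ELEMENTS ((size_t)512)
#define MAX_ELEMENTS ((size_t)1 << 24)   /* 2^24 */

/* A power of two has exactly one bit set, so n & (n - 1) clears it to zero. */
int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Returns 1 if n is a power of two within [512, 2^24], otherwise 0. */
int is_valid_count(size_t n)
{
    return is_power_of_two(n) && n >= MIN_ELEMENTS && n <= MAX_ELEMENTS;
}
```

For example, is_valid_count(1024) returns 1, while is_valid_count(1000) and is_valid_count(256) both return 0.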
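One possible way to obtain the CPU side of the comparison is sketched below in plain C, using the standard library's qsort and clock. The helper names compare_ints, time_cpu_qsort and speedup are hypothetical. Note that the C standard does not require qsort to be implemented as quicksort, and that clock measures processor time at a fairly coarse resolution, so treat the result as an estimate.

```c
#include <stdlib.h>
#include <time.h>

/* Comparison callback for qsort: ascending order of ints, without overflow. */
static int compare_ints(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts a[0..n-1] in place with qsort and returns the CPU time used, in seconds. */
double time_cpu_qsort(int *a, size_t n)
{
    clock_t start = clock();
    qsort(a, n, sizeof a[0], compare_ints);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Speedup is the CPU time divided by the GPU time for the same input. */
double speedup(double cpu_seconds, double gpu_seconds)
{
    return cpu_seconds / gpu_seconds;
}
```

For a fair comparison, sort identical copies of the same input on both sides, and decide up front whether the GPU time includes host-to-device transfers.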


Implement your own version of the bitonic sorting algorithm and extend it to efficiently handle element counts that are not powers of 2.
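A simple starting point for inputs whose length is not a power of 2, shown here as a CPU sketch in plain C, is to pad the input up to the next power of two with a sentinel value that sorts last. This is only one approach under stated assumptions, not the expected solution: it can nearly double the amount of work, so a more efficient scheme is still worth pursuing. The names next_power_of_two, bitonic_sort_pow2 and bitonic_sort_any are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Smallest power of two that is >= n (returns 1 for n == 0). */
size_t next_power_of_two(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Ascending bitonic sort; assumes n is a power of two. */
static void bitonic_sort_pow2(int *a, size_t n)
{
    for (size_t k = 2; k <= n; k <<= 1)
        for (size_t j = k >> 1; j > 0; j >>= 1)
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                int ascending = ((i & k) == 0);
                if (partner > i &&
                    ((ascending && a[i] > a[partner]) ||
                     (!ascending && a[i] < a[partner]))) {
                    int tmp = a[i];
                    a[i] = a[partner];
                    a[partner] = tmp;
                }
            }
}

/* Sorts a[0..n-1] for any n by padding with INT_MAX sentinels.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int bitonic_sort_any(int *a, size_t n)
{
    size_t padded = next_power_of_two(n);
    if (padded == n) {              /* already a power of two: sort in place */
        bitonic_sort_pow2(a, n);
        return 0;
    }
    int *buf = malloc(padded * sizeof *buf);
    if (buf == NULL)
        return -1;
    memcpy(buf, a, n * sizeof *a);
    for (size_t i = n; i < padded; i++)
        buf[i] = INT_MAX;           /* sentinels end up at the tail after sorting */
    bitonic_sort_pow2(buf, padded);
    memcpy(a, buf, n * sizeof *a);  /* the first n entries are the sorted input */
    free(buf);
    return 0;
}
```

Because the sentinels are the largest possible value, the first n entries of the padded result are exactly the sorted input, even when the input itself contains INT_MAX.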


In case of any doubts or errors, feel free to mail me at