Whole genome alignments using MPI-LAGAN
Advances in sequencing technologies have substantially increased the number of fully sequenced genomes. Alignment algorithms play a crucial role in analyzing whole genomes: they identify similar and conserved regions between pairs of genomes, which in turn leads to the annotation of genomes with site-specific properties and functions.
In this work we introduce a parallel algorithm for LAGAN, a widely used whole genome alignment method. We use MPI to develop parallel solutions for two phases of the algorithm that take up a significant portion of the total runtime and also have a high memory requirement. The serial LAGAN program uses CHAOS to quickly determine initial anchors, or seeds, which are extended using a sparse dynamic programming based longest increasing subsequence (LIS) method. Our work parallelizes the CHAOS and LIS phases of the algorithm using a one-dimensional block cyclic partitioning of the computation. This leads to an efficient algorithm that utilizes the processors in a balanced way. We also keep the time spent on communication, that is, the transfer of information across processors, to a minimum.
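As a minimal illustration of the two ideas named above, the following Python sketch shows (a) how a one-dimensional block cyclic layout assigns indices to processors and (b) a serial, unweighted longest increasing subsequence over anchor coordinates. It is not the MPI-LAGAN implementation: the function names, the block size parameter, and the representation of anchors as (x, y) position pairs are assumptions made for illustration, and the chaining ignores anchor scores.

```python
from bisect import bisect_left


def block_cyclic_owner(index, block_size, num_procs):
    """Rank that owns `index` under a 1D block cyclic layout (hypothetical helper)."""
    return (index // block_size) % num_procs


def local_indices(rank, n, block_size, num_procs):
    """All indices in range(n) assigned to `rank`."""
    return [i for i in range(n)
            if block_cyclic_owner(i, block_size, num_procs) == rank]


def longest_increasing_chain(anchors):
    """Longest chain of (x, y) anchors strictly increasing in both coordinates.

    Assumption: anchors are unweighted position pairs; a real aligner
    would typically also take anchor scores into account.
    """
    # Sort by x ascending and y descending, so that anchors sharing the
    # same x can never be chained together.
    pts = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []      # tails[k]: smallest y that ends a chain of length k + 1
    tails_idx = []  # index into pts of the anchor stored in tails[k]
    prev = [None] * len(pts)  # back-pointers for chain reconstruction
    for i, (_, y) in enumerate(pts):
        k = bisect_left(tails, y)
        if k == len(tails):
            tails.append(y)
            tails_idx.append(i)
        else:
            tails[k] = y
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else None
    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]
```

With a block size of 2 and 2 processors, for example, indices 0 to 7 are dealt out in alternating blocks, so each rank receives work from both the beginning and the end of the range, which is what makes the layout attractive for load balance.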
We also report an experimental evaluation of our parallel implementation using pairs of human contigs of varying lengths. We discuss and illustrate the challenges faced in parallelizing a sparse dynamic programming formulation such as the one in this work, and show speedups equivalent to the theoretical speedups for our parallelized phases of the LAGAN algorithm.