De Bruijn graph-based genome assembly has gained popularity as short-read sequencers have become ubiquitous. A core assembly operation is the generation of unitigs, which are sequences corresponding to chains in the graph. Unitigs are used as building blocks for generating longer sequences in many assemblers, and can facilitate graph compression. Chain compaction, the process by which unitigs are generated, remains a critical computational task. In this paper, we present a distributed-memory parallel algorithm for simultaneous compaction of all chains in bi-directed de Bruijn graphs. The key advantages of our algorithm are that it bounds the chain compaction run-time to a number of iterations logarithmic in the length of the longest chain, and that it can differentiate cycles from chains within a number of iterations logarithmic in the length of the longest cycle. Our algorithm scales to thousands of computational cores: it can compact a whole-genome de Bruijn graph from a human sequence read set in 7.3 seconds using 7680 distributed-memory cores, and in 12.9 minutes using 64 shared-memory cores. It is 3.7× and 2.0× faster than the equivalent steps in state-of-the-art tools for distributed- and shared-memory environments, respectively. An implementation of the algorithm is available at https://github.com/ParBLiSS/bruno.
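The logarithmic iteration bound can be illustrated with pointer doubling, a standard technique in which every node repeatedly replaces its pointer with its pointer's pointer. The sketch below is a simplified, single-process illustration and not the implementation in the repository above: it assumes a directed graph given as a hypothetical `succ` dictionary (node to successor, or `None` at a chain end), ignores bi-directed edges and k-mer orientation, and flags cycles with a simple "no progress" rule that is bounded by the longest chain rather than the longest cycle. The name `compact_chains` is chosen for this example only.

```python
def compact_chains(succ):
    """Group nodes into chains by pointer doubling.

    succ maps each node to its successor, or to None at a chain end.
    Returns (chains, cycle_nodes, iterations).
    """
    terminals = {v for v, s in succ.items() if s is None}
    # ptr[v]: farthest node currently known to follow v; dist[v]: hops to it.
    ptr = {v: (v if s is None else s) for v, s in succ.items()}
    dist = {v: (0 if s is None else 1) for v, s in succ.items()}

    # A node is active until its pointer reaches a chain end.
    active = [v for v in succ if ptr[v] not in terminals]
    iterations = 0
    while active:
        # Compute all updates from the old state, then apply them together,
        # mimicking one synchronous parallel round.
        updates = {v: (ptr[ptr[v]], dist[v] + dist[ptr[v]]) for v in active}
        for v, (p, d) in updates.items():
            ptr[v], dist[v] = p, d
        iterations += 1
        still_active = [v for v in active if ptr[v] not in terminals]
        if len(still_active) == len(active):
            # No node finished this round, so only cycle nodes remain.
            active = still_active
            break
        active = still_active

    cycle_nodes = set(active)
    chains = {}
    for v in succ:
        if v not in cycle_nodes:
            chains.setdefault(ptr[v], []).append(v)
    for end, nodes in chains.items():
        # Farthest node first, chain end last.
        nodes.sort(key=lambda n: dist[n], reverse=True)
    return chains, cycle_nodes, iterations


example = {
    "A": "B", "B": "C", "C": "D", "D": None,  # a chain of four nodes
    "X": "Y", "Y": "Z", "Z": "X",             # a cycle of three nodes
    "S": None,                                # an isolated node
}
chains, cycle_nodes, iterations = compact_chains(example)
```

Because each round doubles the distance a pointer spans, a chain of length L finishes in roughly log2(L) rounds. In a distributed setting, each lookup of `ptr[ptr[v]]` would become a message to the process that owns that node, which this sketch does not model.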