CompE/Biocomputing Research Opportunity

Prof. Carl Ebeling is looking for a student to work this summer on a research project that uses massively parallel hardware to tackle a DNA sequencing problem. His description is below. Please contact him at ebeling[at]cs if you're interested.

Imagine someone has sent you a book that you desperately want to read.

But just to make it interesting, they printed it on one long line, like ticker tape, and then chopped it up randomly so that each piece is between 30 and 70 characters long. To help you out, they printed 20 copies of the book this way before chopping them up. The book happens to be a billion characters long, so you have several hundred million pieces, and you are trying to reassemble one copy of the book.
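As a rough check on that piece count, here is a back-of-the-envelope sketch. It assumes piece lengths are spread evenly between 30 and 70 characters (the description only says "randomly"), so the mean piece is about 50 characters long. The function name expected_piece_count is made up for illustration.

```python
def expected_piece_count(book_length, copies, min_len, max_len):
    """Estimate how many pieces result from chopping up `copies` copies.

    Assumption: piece lengths are uniform between min_len and max_len,
    so the mean piece length is simply their midpoint.
    """
    mean_len = (min_len + max_len) / 2
    return copies * book_length / mean_len


# One billion characters, 20 copies, pieces of 30 to 70 characters.
pieces = expected_piece_count(1_000_000_000, 20, 30, 70)
print(f"{pieces:,.0f}")  # prints 400,000,000
```

Under that assumption, 20 billion characters cut into 50-character pieces gives about 400 million pieces.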

That, in essence, is the short-read problem in biology: the book printed on one line is a genome's DNA sequence, which uses just 4 different characters. To reassemble the book, you have to rely on the overlaps between pieces that you get from chopping up many different copies of the book. Biologists can now generate these hundreds of millions of "short reads" quickly and cheaply, but reassembling them into the original sequence takes a huge amount of time on large computer clusters. Our project is to use massive parallelism in hardware (FPGAs) to get a faster, cheaper, and lower-power solution.
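To make the idea of relying on overlaps concrete, here is a toy sketch of greedy overlap merging on a made-up 20-character sequence. It is only an illustration, not the project's method: the function names, the sample reads, and the minimum overlap of 4 are all assumptions, and the all-pairs comparison is far too slow for hundreds of millions of reads. It also ignores sequencing errors and repeated regions, which real data has.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of reads with the largest overlap.

    Toy illustration only: every pair is compared on every pass.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that sit entirely inside another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Four made-up reads cut from "AGCTTAGGCATCCGATTGCA", each sharing
# 4 characters with its neighbor.
reads = ["CATCCGAT", "AGCTTAGG", "CGATTGCA", "TAGGCATC"]
print(greedy_assemble(reads))  # prints ['AGCTTAGGCATCCGATTGCA']
```

The sample works because every 4-character window in the made-up sequence is unique, so the only overlaps found are the true ones. In a real genome, repeated stretches produce misleading overlaps, which is part of what makes the problem so expensive.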

If you'd like to get paid to work on this problem over the summer, please talk to me. It helps if you like both algorithms *and* hardware.

Carl