Title: Effects of read start bias on discovering allele-specific expression in RNA-seq
Advisors: Larry Ruzzo and Sreeram Kannan (EE)
Abstract: Allele-specific expression (ASE) is a biological phenomenon in which one of the two alleles at a particular locus is expressed more than the other. ASE can be discovered computationally from RNA-seq data. If an algorithm can identify regions where ASE differs between two organisms of the same species, ASE could be used as a marker for a cross between them. Technical artefacts of the sequencing process must also be corrected for computationally. Read start bias is one such artefact, and its effect on ASE discovery has not been evaluated. In this work, I identify regions exhibiting ASE and evaluate the effects of read start bias on the ability to discover them. I do this using the AlleleSeq pipeline (Rozowsky et al., Molecular Systems Biology, 2011) with bias correction via SeqBias (Jones et al., Bioinformatics, 2012), applied to RNA-seq data from the marine diatom Thalassiosira pseudonana. It is theorized that some strains of T. pseudonana have undergone a lapse in part of their reproductive cycle, and I evaluate the potential of ASE as a marker for studying this question.
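
As a rough illustration of what "discovering ASE" can mean at a single heterozygous site, the sketch below applies an exact two-sided binomial test to the reads supporting each allele, under the null hypothesis that both alleles are expressed equally. It is a minimal sketch and not the AlleleSeq implementation: the function name `ase_binomial_pvalue` and the example counts are hypothetical, and it assumes that per-allele read counts have already been obtained (and, where relevant, corrected for read start bias) upstream.

```python
from math import comb


def ase_binomial_pvalue(ref_count: int, alt_count: int) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Null hypothesis (an assumption of this sketch): each read is equally
    likely to come from either allele, i.e. counts ~ Binomial(n, 0.5).
    """
    n = ref_count + alt_count
    if n == 0:
        # No reads cover the site, so there is no evidence either way.
        return 1.0
    k = min(ref_count, alt_count)
    # Lower-tail probability P(X <= k) under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2.0 * tail)


# Hypothetical counts at one site: 30 reference reads, 10 alternate reads.
p_value = ase_binomial_pvalue(30, 10)
print(f"p-value for 30 vs 10 reads: {p_value:.4g}")
```

In practice such a test would be run at many sites, so the resulting p-values would need a multiple-testing correction before any site is called as exhibiting ASE.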