Math Bio Seminar: Identifying Peaks and Estimating Parameters for CUT&RUN Sequencing Using Branching Process Model
Author: Lona
Speaker: Debosmita Kundu, Iowa State University (Statistics)
Abstract: CUT&RUN is a new method for detecting protein interactions with DNA that is easier to implement than ChIP-seq and can work with less starting DNA. This method combines antibody-targeted controlled cleavage by micrococcal nuclease with massively parallel DNA sequencing to identify DNA fragments bound to a target protein. Almost all sequencing-based quantification methods, including CUT&RUN, involve DNA amplification by polymerase chain reaction (PCR). For many methods, the amplification is virtually invisible because the complexity of the initial sample and the low rate of final sampling ensure that the vast majority of sequenced fragments are not PCR duplicates. However, amplification is a major factor in CUT&RUN with low amounts of starting DNA. Many sampled fragments are identical, and it is impossible to know whether they are PCR duplicates or repeatedly sampled identical molecules. The CUT&RUN data analysis literature offers conflicting recommendations, ranging from complete removal to complete retention of duplicate DNA fragments. We propose a branching process model for PCR amplification and sampling. I will present our progress in developing a statistical estimation procedure for this model to distinguish the types of duplication and to more accurately detect peaks, the regions of enriched DNA fragments that indicate protein binding events in genomes treated by CUT&RUN.
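For readers unfamiliar with the setup, the following is a minimal illustrative sketch of the duplication problem described in the abstract. It is not the speaker's model or estimation procedure. It assumes a simple Galton-Watson process in which every copy is duplicated with a fixed probability in each PCR cycle, followed by sampling reads without replacement. The function names, the efficiency of 0.8, the 10 cycles, and the toy layout of two starting molecules per fragment identity are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def amplify(n_molecules, cycles, efficiency, rng):
    """Galton-Watson PCR sketch: in every cycle each existing copy is
    duplicated independently with probability `efficiency` (assumed constant)."""
    counts = [1] * n_molecules
    for _ in range(cycles):
        counts = [c + sum(rng.random() < efficiency for _ in range(c)) for c in counts]
    return counts

def sample_reads(counts, n_reads, rng):
    """Draw reads without replacement from the amplified pool.
    Each read is labeled by the index of its original molecule."""
    pool = [mol for mol, c in enumerate(counts) for _ in range(c)]
    return rng.sample(pool, min(n_reads, len(pool)))

def duplicate_breakdown(reads, fragment_of):
    """Count duplicate reads by origin.
    PCR duplicates: extra reads descended from the same original molecule.
    Natural duplicates: extra distinct original molecules sharing a fragment identity."""
    reads_per_fragment = Counter(fragment_of[m] for m in reads)
    molecules_per_fragment = Counter(fragment_of[m] for m in set(reads))
    pcr = sum(reads_per_fragment[f] - molecules_per_fragment[f] for f in reads_per_fragment)
    natural = sum(molecules_per_fragment[f] - 1 for f in reads_per_fragment)
    return pcr, natural

rng = random.Random(0)
# Hypothetical toy input: 20 starting molecules, two per fragment identity.
fragment_of = [m // 2 for m in range(20)]
counts = amplify(20, cycles=10, efficiency=0.8, rng=rng)
reads = sample_reads(counts, n_reads=200, rng=rng)
pcr, natural = duplicate_breakdown(reads, fragment_of)
print("PCR duplicates:", pcr, "natural duplicates:", natural)
```

In the simulation the origin of every read is known, so the two kinds of duplicates can be counted directly. In real sequencing data only the fragment identity is observed, which is why a statistical model is needed to separate them.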