Practical probabilistic and graphical formulations of long-read polyploid haplotype phasing

Published: Nov. 8, 2020, 4:04 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.06.371799v1?rss=1

Authors: Shaw, J., Yu, Y. W.

Abstract: Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition (MSMTP) problem, which is a more flexible graphical metric compared to the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization (UPEM) model, which is a probabilistic generalization of the MEC model. We incorporate both formulations into a long-read based polyploid haplotype phasing method called flopp. We show that flopp compares favorably to state-of-the-art algorithms -- up to 30 times faster with 2 times fewer switch errors on 6x ploidy simulated data.

Copyright belongs to the original authors. Visit the link for more information.
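For readers unfamiliar with the standard MEC model that the abstract uses as its baseline, here is a minimal sketch of how an MEC score is commonly computed: each read is assigned to the candidate haplotype it disagrees with least, and the score is the total number of remaining mismatches. This is not code from flopp, and it does not implement MSMTP or UPEM. The names `Read`, `hamming`, and `mec_score`, and the toy triploid data, are assumptions made up for illustration.

```python
from typing import Dict, List

# Hypothetical representation: a read maps SNP site index -> observed allele.
Read = Dict[int, int]


def hamming(read: Read, haplotype: List[int]) -> int:
    """Count mismatches between a read and a haplotype over the sites the read covers."""
    return sum(1 for site, allele in read.items() if haplotype[site] != allele)


def mec_score(reads: List[Read], haplotypes: List[List[int]]) -> int:
    """Sum, over all reads, of the mismatches to the closest haplotype."""
    return sum(min(hamming(r, h) for h in haplotypes) for r in reads)


# Toy triploid example (made-up data): 4 SNP sites, 3 candidate haplotypes.
haplotypes = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
reads = [
    {0: 0, 1: 0, 2: 1},        # matches haplotype 0 exactly
    {1: 0, 2: 1, 3: 1},        # matches haplotype 0 exactly
    {0: 1, 1: 1, 2: 0},        # matches haplotype 1 exactly
    {2: 0, 3: 0},              # matches haplotype 1 exactly
    {0: 1, 1: 0, 2: 1, 3: 1},  # one mismatch to its closest haplotype
]

print(mec_score(reads, haplotypes))  # 1
```

A phasing method based on MEC searches for the set of haplotypes that minimizes this score. The abstract describes UPEM as a probabilistic generalization of this model and MSMTP as a more flexible graphical alternative to it.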