Abstract
A major challenge with long-read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi read assembly.
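For readers unfamiliar with the contig N50 metric cited in the abstract: it is the length L such that contigs of length at least L together cover at least half of the total assembly length. The following is a minimal Python sketch of that definition only. The function name `n50` and the toy lengths are illustrative assumptions, not code or data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if 2 * running >= total:
            return length


# Toy contig lengths in bp (made-up values, not taken from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs (80 + 70) cover half of 300
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is commonly reported alongside misassembly counts.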
Original language | English |
---|---|
Article number | 28 |
Pages (from-to) | 28 |
Journal | Genome Biology |
Volume | 22 |
Issue number | 1 |
DOIs | |
Publication status | Published - 8 Jan 2021 |
Bibliographical note
Publisher Copyright: © 2021, The Author(s).
Other keywords
- Bioinformatics
- Hybrid error correction
- Human genetics
- Molecular genetics
- Genome
- Genetic variation
- Sequence analysis
- Quality control
- Data processing