Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly

Guillaume Holley*, Doruk Beyter, Helga Ingimundardottir, Peter L. Møller, Snædis Kristmundsdottir, Hannes P. Eggertsson, Bjarni V. Halldorsson

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

A major challenge of long read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel calling accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly.

Original language: English
Article number: 28
Pages (from-to): 28
Journal: Genome Biology
Volume: 22
Issue number: 1
Publication status: Published - 8 Jan 2021

Bibliographical note

Publisher Copyright:
© 2021, The Author(s).

Other keywords

  • Bioinformatics
  • Hybrid error correction
  • Human genetics
  • Molecular genetics
  • Genome
  • Genetic variation
  • Sequence analysis
  • Quality control
  • Data processing
