TY - JOUR
T1 - Whole genome characterization of sequence diversity of 15,220 Icelanders
AU - Jónsson, Hákon
AU - Sulem, Patrick
AU - Kehr, Birte
AU - Kristmundsdóttir, Snædís
AU - Zink, Florian
AU - Hjartarson, Eiríkur
AU - Hardarson, Marteinn
AU - Hjorleifsson, Kristjan
AU - Eggertsson, Hannes
AU - Guðjónsson, Sigurjón Axel
AU - Ward, Lucas D.
AU - Arnadottir, Gudny
AU - Helgason, Einar A.
AU - Helgason, Hannes
AU - Gylfason, Arnaldur
AU - Jónasdóttir, Aðalbjörg
AU - Jónasdóttir, Áslaug
AU - Rafnar, Thorunn
AU - Besenbacher, Soren
AU - Frigge, Michael L.
AU - Stacey, Simon N.
AU - Magnússon, Ólafur T.
AU - Þorsteinsdóttir, Unnur
AU - Másson, Gísli
AU - Kong, Augustine
AU - Halldórsson, Bjarni
AU - Helgason, Agnar
AU - Gudbjartsson, Daniel
AU - Stefansson, Kari
PY - 2017/9/21
Y1 - 2017/9/21
N2 - Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders who we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs, to determine the parent of origin of 42,961 DNMs.
AB - Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders who we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs, to determine the parent of origin of 42,961 DNMs.
KW - DNA sequencing
KW - Genetic variation
KW - Haplotypes
KW - Rare variants
KW - DNA research
KW - Genetics
U2 - 10.1038/sdata.2017.115
DO - 10.1038/sdata.2017.115
M3 - Article
SN - 2052-4463
VL - 4
JO - Scientific Data
JF - Scientific Data
ER -