TY - JOUR
T1 - Diversity in non-repetitive human sequences not found in the reference genome
AU - Kehr, Birte
AU - Helgadottir, Anna
AU - Melsted, Pall
AU - Jonsson, Hakon
AU - Helgason, Hannes
AU - Jonasdottir, Adalbjörg
AU - Jonasdottir, Aslaug
AU - Sigurdsson, Asgeir
AU - Gylfason, Arnaldur
AU - Halldorsson, Gisli H.
AU - Kristmundsdottir, Snaedis
AU - Thorgeirsson, Gudmundur
AU - Olafsson, Isleifur
AU - Holm, Hilma
AU - Thorsteinsdottir, Unnur
AU - Sulem, Patrick
AU - Helgason, Agnar
AU - Gudbjartsson, Daniel F.
AU - Halldorsson, Bjarni V.
AU - Stefansson, Kari
PY - 2017/3/30
Y1 - 2017/3/30
N2 - Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.
AB - Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.
UR - http://www.scopus.com/inward/record.url?scp=85014090196&partnerID=8YFLogxK
U2 - 10.1038/ng.3801
DO - 10.1038/ng.3801
M3 - Article
C2 - 28250455
AN - SCOPUS:85014090196
SN - 1061-4036
VL - 49
SP - 588
EP - 593
JO - Nature Genetics
JF - Nature Genetics
IS - 4
ER -