Diversity in non-repetitive human sequences not found in the reference genome

Birte Kehr, Anna Helgadottir, Pall Melsted, Hakon Jonsson, Hannes Helgason, Adalbjörg Jonasdottir, Aslaug Jonasdottir, Asgeir Sigurdsson, Arnaldur Gylfason, Gisli H. Halldorsson, Snaedis Kristmundsdottir, Gudmundur Thorgeirsson, Isleifur Olafsson, Hilma Holm, Unnur Thorsteinsdottir, Patrick Sulem, Agnar Helgason, Daniel F. Gudbjartsson, Bjarni V. Halldorsson*, Kari Stefansson

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

33 Citations (Scopus)

Abstract

Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.
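
As an illustration of the r² > 0.8 criterion mentioned in the abstract, the following is a minimal sketch of how linkage disequilibrium between two variants might be estimated from unphased genotype dosages (0, 1 or 2 copies of the alternate allele per individual), using the squared Pearson correlation. This is a common approximation and is not taken from the paper; the function names `genotype_r2` and `is_tagged` and the toy genotype vectors are hypothetical.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Assumption: `a` and `b` hold per-individual dosages (0, 1 or 2) for two
    variants, in the same individual order. This approximates LD r^2 from
    unphased data; it is not the paper's own pipeline.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in r^2.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return (cov * cov) / (var_a * var_b)


def is_tagged(a, b, threshold=0.8):
    """True if the two variants exceed the given r^2 threshold."""
    return genotype_r2(a, b) > threshold


# Toy data (hypothetical): an NRNR variant and a nearby GWAS marker.
nrnr_variant = [0, 1, 2, 1, 0]
gwas_marker = [0, 1, 2, 1, 0]
print(is_tagged(nrnr_variant, gwas_marker))
```

Because r² is a squared correlation, two variants whose alternate alleles are perfectly anti-correlated also score as being in complete LD.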

Original language: English
Pages (from-to): 588-593
Number of pages: 6
Journal: Nature Genetics
Volume: 49
Issue number: 4
Publication status: Published - 30 Mar 2017
