8. Introduction to genomics
8.6. Variations in the human genome
their parent genes for microRNA (miRNA) binding, thereby modulating the repression of the functional gene by its cognate miRNA. According to predictions, at least 9% of the pseudogenes present in the human genome are actively transcribed.
Several websites provide information on the genomes of humans and other organisms (e.g. http://genome.ucsc.edu/; http://www.ensembl.org/; http://www.ncbi.nlm.nih.gov/). One important topic not detailed above is variation in the genome. We consider it so important that it has its own subchapter (see below).
Mapping of the human genome did not end with the completion of the HGP. The Genome Reference Consortium was founded with the main task of closing the remaining gaps. These gaps lie in difficult-to-sequence regions, usually repeat-rich ones. When the HGP was completed, about 350 gaps remained in the genome.
These regions are not small; together they represent about 5% of the genome. Filling them is far from easy: 6 years after the initiation of this project, in 2009, only 50 such gaps had been closed.
106 Genetics and genomics
individuals, or there can be more copies of a gene in some people. Usually this causes no large phenotypic differences, but CNVs can play a role in several diseases, such as Crohn's disease, Alzheimer's disease, autism, obesity and AIDS.
Figure 8.3. Structural variants in the genome. Source accessed: 15/02/2013
CNVs can also play a role in transplantation. If, owing to a CNV, a gene is missing in the donor but present in the recipient, graft-versus-host disease can develop in spite of MHC identity, because the donor's immune cells can mount an immune response against the gene product.
Structural differences have been observed even between phenotypically concordant monozygotic twins. This observation challenges the long-standing notion that monozygotic twins are essentially genetically identical, and it also shows that structural variation can arise during somatic development.
The discovery that CNVs are abundant also changed our view of the genomic differences among individuals. The first two papers on the sequence of the human genome stated that the genomic difference between two individuals is 0.1%, i.e.
any two people are 99.9% identical. At the time, this claim received wide media coverage.
These papers attributed the differences mainly to SNPs. Later, the diploid sequences of both Craig Venter and James Watson were published. Analysis of diploid sequences has shown that non-SNP variation, such as CNVs, accounts for much more human genetic variation than single nucleotide diversity does. It is estimated that approximately 0.4% of the genomes of unrelated people typically differ with respect to copy number. When copy number variation is included, human-to-human genetic variation is estimated to be at least 0.5% (99.5% similarity). Table 8.2 shows the number of variations in some sequenced genomes. According to more recent studies, however, the difference between individuals from populations that separated long ago can be as high as 2-3%, and large genomic rearrangements can be responsible for this difference. Populations separated by distance tend to drift apart genetically over time, and roughly 95% of the variability between populations is the result of this random drift. Some differences are due to natural selection (see Chapter 12).
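To make these percentages concrete, the short sketch below converts them into approximate numbers of differing base pairs. It assumes a haploid genome length of about 3.1 × 10^9 bp, a value not given in this section. The names `GENOME_BP` and `differing_bp` are illustrative only, and the 3% entry is the upper end of the 2-3% range quoted above.

```python
# Rough conversion of percentage genome differences into base-pair counts.
# ASSUMPTION: haploid human genome length of about 3.1e9 bp (not stated in the text).
GENOME_BP = 3.1e9


def differing_bp(percent_difference: float, genome_bp: float = GENOME_BP) -> float:
    """Return the approximate number of base pairs that differ."""
    return genome_bp * percent_difference / 100.0


# Percentages quoted in the text above.
estimates = {
    "SNPs only (99.9% identity)": 0.1,
    "copy number alone": 0.4,
    "SNPs + CNVs (99.5% identity)": 0.5,
    "distant populations, upper bound": 3.0,
}

for label, pct in estimates.items():
    # Report each estimate in megabases (Mb).
    print(f"{label:35s} {pct:>4.1f}%  ~{differing_bp(pct) / 1e6:,.1f} Mb")
```

Under this assumption, a 0.1% difference corresponds to roughly 3 million base pairs, which is the same order of magnitude as the SNP counts listed in Table 8.2.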
Number of SNPs
Genome of J. Craig Venter      3,213,401
Genome of James Watson         3,322,093
Asian genome                   3,074,097
Yoruba (African) genome        4,139,196

Structural variations in Venter’s genome
Type                   n          Length (bp)
CNV                    62         8,855–1,925,949
Insertion/deletion     851,575    1–82,711
Block substitution     53,823     2–206
Inversion              90         7–670,345

Table 8.2. Number of variations in some sequenced genomes
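The SNP counts in Table 8.2 can be related back to the 0.1% figure with a rough calculation. The sketch below assumes a haploid genome length of about 3.1 × 10^9 bp and ignores the distinction between heterozygous and homozygous SNPs, so its percentages are only approximations. The function name `percent_of_genome` is illustrative.

```python
# SNP counts from Table 8.2, expressed as a rough share of the genome.
# ASSUMPTION: haploid genome length of ~3.1e9 bp (not given in the table).
# Heterozygous and homozygous SNPs are counted alike, so this is approximate.
GENOME_BP = 3.1e9

snp_counts = {
    "J. Craig Venter": 3_213_401,
    "James Watson": 3_322_093,
    "Asian": 3_074_097,
    "Yoruba (African)": 4_139_196,
}


def percent_of_genome(count: int, genome_bp: float = GENOME_BP) -> float:
    """Return the SNP count as a percentage of the assumed genome length."""
    return 100.0 * count / genome_bp


for name, count in snp_counts.items():
    print(f"{name:18s} {count:>10,d} SNPs  ~{percent_of_genome(count):.3f}% of the genome")
```

All four genomes come out at roughly 0.1%, with the Yoruba genome carrying noticeably more SNPs than the other three.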
Our view of how the modern human genome developed changed considerably. In a Science paper published in May 2010, Svante Pääbo's international team found that a small amount (1% to 4%) of the nuclear DNA of Europeans and Asians, but not of Africans, can be traced to Neanderthals (http://www.sciencemag.org/content/328/5979/680.full). The most likely model to explain this was that early modern humans arose in Africa but interbred with Neanderthals in the Middle East or Arabia before spreading into Asia and Europe, about 50,000 to 80,000 years ago. Seven months later, on 23 December, the team published in Nature the complete nuclear genome obtained from a girl's pinky finger bone found in Denisova Cave in the Altai Mountains of southern Siberia. To the researchers' surprise, the genome was neither a Neanderthal's nor a modern human's, although the girl lived at the same time as both groups: the bone dates to at least 30,000 years ago and is probably older than 50,000 years. Her DNA was most similar to that of Neanderthals, but her people were a distinct group that had long been separated from them. By comparing parts of the Denisovan genome directly with the same DNA segments in 53 populations of living people, the team found that 4% to 6% of the DNA of Melanesians from Papua New Guinea and the Bougainville Islands is shared with Denisovans. Those segments were not found in Neanderthals or in other living humans. The most likely scenario is that after the Neanderthal and Denisovan populations split about 200,000 years ago, modern humans interbred with Neanderthals as they left Africa within the past 100,000 years. In this way Neanderthals left their mark in the genomes of living Asians and Europeans. Later, a subset of these modern humans, who already carried some Neanderthal DNA, headed east toward Melanesia and interbred with Denisovans in Asia on the way.
As a result, Melanesians inherited DNA from both Neanderthals and Denisovans, with as much as 8% of their DNA coming from archaic people (http://en.wikipedia.org/wiki/Denisova_hominin).
Later it was shown that archaic humans contributed more than half of the alleles of some genes of the human leukocyte antigen (HLA) system, which helps the immune system to recognize pathogens. Pääbo's team had published the complete genome of the Denisovan cave girl. She did not carry the B*73 allele (which has not been found in Siberia either), but she did carry two other linked HLA-C variants, which occur on the same stretch of chromosome 6. Living people who carry any of these variants almost always carry at least two of the three, as the cave girl did. So even though she lacked B*73, the researchers propose that all three variants were inherited, often in pairs, from archaic humans in Asia. Immunology has shown that the more diverse a population is in certain HLA genes, the more successful it is at defending against pathogen challenges. Thus, it seems that archaic genomes contributed to the HLA variation and selective fitness of modern humans through interbreeding (introgression).
After the HGP, several projects contributed to the SNP databases, among them the HapMap projects (http://hapmap.ncbi.nlm.nih.gov/;
http://en.wikipedia.org/wiki/International_HapMap_Project) and the 1000 Genomes Project. For example, the pilot paper of the 1000 Genomes Project reported 15 million SNPs, 1 million short insertions and deletions, and 20 thousand larger structural variations (http://www.sciencemag.org/content/330/6004/574). Most of these variations were previously unknown.
The MHC (HLA) region (6p21.3) is exceptional in its density of variations. This 7.6 Mb region contains the MHC genes, which play important roles in the immune response and in transplantation. Within it, the MHC class III region has the highest gene density (59 expressed genes) and the highest genetic diversity. In one study, 37 thousand SNPs and 7 thousand structural variations were detected in a 4 Mb region, corresponding to a genetic diversity an order of magnitude higher than in other parts of the genome.