Greek autosomal DNA

by Dienekes Pontikos
Last Update: 2 May, 2009

A striking demonstration of the persistence of the Greek genetic signature through time can be found in [1]. The figure on the right is the 4th principal component of variation in Europe and shows a strong cline centered in Greece. The Greek genetic legacy is clearly detectable today, not only among the Greeks themselves but also among all their neighboring populations of partial Greek ancestry:

Figure 2. Hidden patterns in the geography of Europe shown by the first five principal components, explaining respectively 28%, 22%, 11%, 7%, and 5% of the total genetic variation for 95 classical polymorphisms (1, 13, 14). The first component is almost superimposable to the archaeological dates of the spread of farming from the Middle East between 10,000 and 6,000 years ago. The second principal component parallels a probable spread of Uralic people and/or languages to the northeast of Europe. The third is very similar to the spread of pastoral nomads (and their successors) who domesticated the horse in the steppe towards the end of the farming expansion, and are believed by some archaeologists and linguists to have spread most Indo-European languages to Europe. The fourth is strongly reminiscent of Greek colonization in the first millennium B.C. The fifth corresponds to the progressive retreat of the boundary of the Basque language. Basques have retained, in addition to their language, believed to be descended from an original language spoken in Europe, some of their original genetic characteristics. (From ref. 1, with permission of Princeton University Press, modified.)

The genetic affinities of human populations can be determined by examining large numbers of polymorphisms. For example, Ayub et al. [2] used 182 autosomal tri- and tetranucleotide microsatellites, which allowed them to build the following tree based on DAS genetic distance between the sampled populations. It is clear that Greeks belong in the Caucasoid cluster of populations (encompassing groups from “North European” to “Burusho” in the figure), and are clearly distinguished from the Asian/Oceanian/American cluster (“Cambodian” to “Mayan Indian”), and even more from the African groups (“San” to “Zaire Pygmy”).
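As a rough illustration of how a distance based on allele sharing can be computed from microsatellite genotypes, here is a minimal sketch. It assumes the common definition (one minus the proportion of alleles two individuals share, averaged over loci); the exact formula used in [2] is not given here, and the function names and genotypes are hypothetical.

```python
from collections import Counter

def shared_alleles(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) that two diploid genotypes share at one locus."""
    overlap = Counter(genotype_a) & Counter(genotype_b)  # multiset intersection
    return sum(overlap.values())

def allele_sharing_distance(individual_a, individual_b):
    """One minus the proportion of alleles shared across all loci (assumed definition)."""
    shared = sum(shared_alleles(a, b) for a, b in zip(individual_a, individual_b))
    return 1.0 - shared / (2 * len(individual_a))

# Hypothetical repeat-count genotypes at three microsatellite loci.
person_1 = [(12, 14), (9, 9), (20, 22)]
person_2 = [(12, 12), (9, 10), (21, 23)]

distance = allele_sharing_distance(person_1, person_2)
print(distance)
```

Averaging such pairwise distances over individuals from two populations gives one possible population-level distance from which a tree can then be built.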

Modern studies of autosomal DNA rely on the study of large numbers of single-nucleotide polymorphisms (SNPs), i.e., of changes in a single letter of the genetic code. A recent study [3] used 10,000 such polymorphisms to investigate the genetic structure of European populations, including a sample of Greeks. Two different techniques were used: principal components analysis (PCA), which finds the most important dimensions summarizing the variability of the genetic data, and STRUCTURE, a widely used model-based clustering program, which assigns individuals to a number K of different clusters.
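To make the idea of PCA on SNP data concrete, the following sketch projects individuals onto the leading principal components of a genotype matrix. It is an illustration under stated assumptions, not the pipeline of [3]: genotypes are assumed to be coded as 0, 1, or 2 copies of an allele, NumPy is assumed to be available, and the two simulated groups and the function name `pca_genotypes` are hypothetical.

```python
import numpy as np

def pca_genotypes(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of allele counts coded 0, 1, 2.
    Returns (scores, explained), where explained holds the fraction of the
    total variance captured by each returned component.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # center every SNP column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]

# Toy data: two hypothetical groups whose allele frequencies differ slightly.
rng = np.random.default_rng(0)
n_snps = 2000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, n_snps))
group_b = rng.binomial(2, freq_b, size=(20, n_snps))

scores, explained = pca_genotypes(np.vstack([group_a, group_b]))
print(scores[:3])     # PC1/PC2 coordinates of the first three individuals
print(explained)      # share of variance captured by PC1 and PC2
```

In this toy setting the first component separates the two simulated groups, which is the same kind of pattern the PCA plots in these studies display for real populations.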

The results of the STRUCTURE runs are pictured below.


For each number of clusters (K), each cluster is assigned a color. Each individual from the studied populations corresponds to a vertical line, which is composed, in varying proportions, of the different clusters. We observe that the Greek individuals belong to the main European-West Asian-North African cluster for K up to 5. At K=6 a small "Mediterranean" cluster (green) emerges, which encompasses particularly populations bordering the Mediterranean as well as Armenians. In particular, we observe that there is no visible contribution of the East Eurasian (Mongoloid) pink cluster or of the Sub-Saharan African (Negroid) red cluster.

The results of the PCA for the first two principal components are shown below.


Each bar corresponds to a population, and its width covers the variability of the different sampled individuals within each population. The first principal component (PC1) separates Sub-Saharan Africans (Mende and Burunge) from Eurasians. The second principal component (PC2) separates Mongoloids and East Indians (Altai, Brahmin, and Mala) from other populations. In both, it is evident that the Greek individuals exhibit a typically West Eurasian (Caucasoid) genomic profile.

While the above studies have examined global population structure, more recent studies have focused on uncovering finer structure within populations of European ancestry themselves. For example, [4] studied the ancestry of European Americans using 583 SNP markers. The authors determined that the major feature of European American variation is clinal along a Southeast-Northwest axis, a finding which confirms the above-mentioned work of Cavalli-Sforza [1] based on classical markers. The second most noteworthy feature separates Southeast Europeans from Ashkenazi Jews. The Greek individuals of this study, like their Italian counterparts, had typical southeastern characteristics and were clearly separated from the Ashkenazi Jews.


Another study [5] considered a larger number of SNPs, with similar results. Once again, the major feature of the variation separated populations from northern Europe from those from southern Europe, while the second principal component distinguished between southern Europeans and Ashkenazi Jews. Greek individuals were closest to Italian ones.



A further study [6] examined more than 2,500 Europeans using a 500,000-marker Affymetrix chip; this is the most extensive and detailed sampling of European autosomal variation yet. The authors conclude that the levels of heterozygosity and linkage disequilibrium observed in southern Europe are consistent with a settlement of the continent proceeding from the south to the north. Europeans form, with the exception of the Finns, a genetic continuum. Members of each ethnic group cluster together and overlap partially with neighboring groups, but can be fully distinguished genetically from more distant ones. These results indicate both the relative homogeneity of the European gene pool and the fact that Europeans can nonetheless be clearly distinguished genetically along geographic and even ethnic lines.



The study included a sample of 51 northern Greeks. It is evident that these Greeks (marked by EL) form a homogeneous cluster, none of them falling in the middle of clusters formed by other ethnic groups. Some of the former Yugoslavs (marked by YU) do fall in the middle of the Greek cluster, however. These former Yugoslavs, as well as the two Italian groups (IT1 and IT2), are the Greeks' closest genetic neighbors. The Yugoslavs are between Greeks and Czechs and Poles, consistent with their having both indigenous Balkan and non-Balkan Slavic origins; the Italians are between Greeks and Spaniards, consistent with their having an Eastern Mediterranean contribution, due perhaps to Neolithic farmers or to ancient (e.g., Greek or Etruscan) colonists.

Shortly after the previous study appeared, another article [7] used the same 500K Affymetrix chip over a sample of 3,192 individuals, including 8 Greeks. While many of the sampled populations are represented by a small number of individuals, thus making generalization more difficult, it is evident that the first two principal components bear an even stronger relationship to the geographical map of Europe. This was probably made possible by the inclusion of a wider range of populations, including many from eastern Europe.



With the caveat of the small population sample numbers, these results are fairly consistent with those of the previous study. Greeks (GR) are once again between their northern neighbors (especially Albanians (AL), Slavomacedonians (MK), Bulgarians (BG), Romanians (RO), and Kosovars (KS)) and Italians (IT). Greek Cypriots (CY) and Turks (TR) also frame the Greek sample, to the south and to the east respectively. The Greeks' closest neighbors appear to be their immediate northern neighbors, as well as some of the Italians, who otherwise appear to be quite variable, some of them being more similar to their Central European neighbors. Northern Balkan Slavic populations (Slovenians (SI), Croats (HR), Bosnians (BA)) appear more distant, in the direction of Central and Eastern European Slavs.

Studies such as the above [4-7] have shown that in the first two principal components individuals from different European groups tend to cluster with each other. However, these components capture only part of the overall genetic variation: the most salient part that is associated with geography and ethnicity. A new study [8] investigated the overall genetic similarity of individual Europeans, using the dataset also used by [6]. For each individual, a "best overall match" (BOM), i.e., the individual most similar to him was calculated over all the markers. The results are shown in the table below:

Each row in this table shows the origin of these BOMs. As the authors note "in a considerable proportion of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself". For example, the Finnish (FI) sample consists of 47 individuals: 39 of them have a BOM that is also a Finn, while 1, 4, and 3 have a Norwegian (NO), German (DE1), and Polish (PO) best match, respectively. It is important to note how sample sizes affect these numbers: Finns make up 47 of the 2,457 individuals in the total sample (1.9%). Therefore, if Finns were indistinguishable from other Europeans, only about 0.9 of them (1.9% of 47) would be expected to have a Finnish BOM. Thus, the fact that 39 of them do is highly significant (43 times higher than chance). Nevertheless, the observation remains valid that a member of a particular group may have a "genetic look-alike" from a different group.
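The matching procedure and the row counts of such a table can be sketched as follows. This is only an illustration: the similarity measure used here (mean identity-by-state over allele counts coded 0, 1, 2) is an assumption, since the measure used in [8] is not specified above, and the function names, genotypes, and group labels "A" and "B" are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

def best_overall_matches(genotypes):
    """For each individual, return the index of the most similar other individual.

    Similarity is mean identity-by-state over all markers (an assumed measure).
    The pairwise array is built in memory, so this suits small examples only.
    """
    G = np.asarray(genotypes, dtype=float)
    mean_diff = np.abs(G[:, None, :] - G[None, :, :]).mean(axis=2)
    similarity = 1.0 - mean_diff / 2.0
    np.fill_diagonal(similarity, -np.inf)   # nobody may match themselves
    return similarity.argmax(axis=1)

def bom_table(labels, bom_index):
    """table[a][b] = number of individuals from group a whose BOM is from group b."""
    table = {}
    for own, j in zip(labels, bom_index):
        row = table.setdefault(own, {})
        row[labels[j]] = row.get(labels[j], 0) + 1
    return table

# Hypothetical genotypes: the last "B" individual happens to resemble group "A".
genotypes = [
    [0, 0, 1, 2, 2, 0],
    [0, 0, 1, 2, 1, 0],
    [2, 2, 1, 0, 0, 2],
    [2, 2, 0, 0, 0, 2],
    [0, 1, 1, 2, 2, 1],
]
labels = ["A", "A", "B", "B", "B"]

boms = best_overall_matches(genotypes)
table = bom_table(labels, boms)
print(table)
```

Each row of the resulting dictionary plays the role of one row of the table: it lists where the best matches of one group's members come from.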

Turning to Greeks (EL, recruited in northern Greece), we see that they have BOMs from Norway, Sweden, the UK, Denmark, the Netherlands, Germany, Austria, Switzerland, Italy, and Greece. Conversely, the BOMs of some Dutch, Spanish, Italian, and Greek individuals are Greeks. Overall, the Greek sample consists of 51 individuals, and hence one expects (by chance) that only 1.1 of them would have a Greek BOM. Thus, Greeks are about 7 times more likely than chance to have a fellow Greek as their BOM. Different European groups vary substantially in this respect: the aforementioned Finns seem to be the most distinct, with most of them being more similar to a co-ethnic than to any other European. Other groups seem to be less so; for example, no Austrians (AT) have a fellow Austrian BOM.
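The chance expectations quoted for the Finnish and Greek samples follow from simple arithmetic, sketched below. The calculation mirrors the approximation used in the text (group size multiplied by the group's share of the total sample); the function names are hypothetical.

```python
def expected_same_group_boms(group_size, total_size):
    """Expected number of same-group BOMs if group membership did not matter.

    Follows the text's approximation: each member has a chance equal to the
    group's share of the total sample of drawing a same-group best match.
    """
    return group_size * (group_size / total_size)

def enrichment(observed, group_size, total_size):
    """How many times more same-group BOMs were observed than expected by chance."""
    return observed / expected_same_group_boms(group_size, total_size)

TOTAL = 2457  # individuals in the total sample

finn_expected = expected_same_group_boms(47, TOTAL)    # about 0.9
finn_enrichment = enrichment(39, 47, TOTAL)            # about 43
greek_expected = expected_same_group_boms(51, TOTAL)   # about 1.1

print(round(finn_expected, 1), round(finn_enrichment), round(greek_expected, 1))
```

Because the expectation grows with the square of the group size, larger samples are expected to produce more same-group matches by chance alone, which is why raw counts in the table should always be read against sample size.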

The overall pattern of BOMs among the Greek individuals is also noteworthy because no matches are observed between Greeks and Eastern Europeans or vice versa. This probably indicates the absence among Greeks of many substantially "Slav-like" individuals; individual Greeks may have "genetic look-alikes" in distant Britain or Scandinavia, but none at all in Eastern Europe. Indeed, they have a greater-than-random number of matches only with the large German sample (DE1) from Kiel, which probably indicates the substantial heterogeneity of this sample, whose members serve as close matches to many European ethnic groups. The study also includes, in its supplementary material, a table of the mock false positive rate among different population pairs; this is a measure of genetic distance between them:

For the Greek sample, the closest populations are Yugoslavs (YU, 0.047), Italians (IT2, 0.049; IT1, 0.053), and Austrians (AT, 0.054). The most distant are Finns (FI, 0.142), Germans (DE1, 0.117), Dutch (NL, 0.112), British (UK, 0.106), and Norwegians (NO, 0.103). This parallels the observation in [6] that in the first two principal components, Greeks are closest to Yugoslavs and Italians among the studied groups.

Auton et al. [9] studied a sample of Greeks from Greece and Cyprus in a global context of 3,845 individuals based on about 450K SNPs. The results of the STRUCTURE analysis are shown below, with increasing number of clusters starting from K=2 (top row). The studied individuals from Greece (#15) and Cyprus (#9) appear unremarkable in this analysis. It is evident that, in comparison to worldwide populations, the studied Europeans are fairly homogeneous, composed primarily of the "red" component, with no apparent significant contributions from ancestral elements typical of other continental groups.

  1. L. Luca Cavalli-Sforza, “Genes, peoples, and languages,” Proc. Natl. Acad. Sci. USA, Vol. 94, pp. 7719-7724, July 1997.
  2. Qasim Ayub et al., “Reconstruction of Human Evolutionary Tree Using Polymorphic Autosomal Microsatellites,” American Journal of Physical Anthropology, 122:259–268 (2003)
  3. Marc Bauchet et al., “Measuring European Population Stratification using Microarray Genotype Data,” American Journal of Human Genetics (in press), (2007)
  4. Price AL, Butler J, Patterson N, Capelli C, Pascali VL, et al. (2008) Discerning the Ancestry of European Americans in Genetic Association Studies. PLoS Genet 4(1): e236. doi:10.1371/journal.pgen.0030236
  5. Tian C, Plenge RM, Ransom M, Lee A, Villoslada P, et al. (2008) Analysis and Application of European Genetic Substructure Using 300 K SNP Information. PLoS Genet 4(1): e4. doi:10.1371/journal.pgen.0040004
  6. Lao O. et al. (2008) Correlation between Genetic and Geographic Structure in Europe, Current Biology doi:10.1016/j.cub.2008.07.049
  7. Novembre J. et al. (2008) Genes mirror geography within Europe, Nature doi:10.1038/nature07331
  8. Tehva Lu T. et al. (2009) An evaluation of the genetic-matched pair study design using genome-wide SNP data from the European population, Eur J Hum Genet doi:10.1038/ejhg.2008.266
  9. Auton A. et al. (2009) Global distribution of genomic diversity underscores rich complex history of continental human populations, Genome Research, doi:10.1101/gr.088898.108