Is race a biological reality? Or is it a social construction? It is a debate that shows no sign of being resolved. The more that we know of the genetics of human differences, ironically, the more fractious the debate seems to get, and the more entrenched the various positions seem to be.

The latest issue of the magazine American Scientist contains a review by the biologist Jan Sapp of two books that insist that race has no biological validity. Sapp agrees. ‘The consensus among Western researchers today’, he suggests, ‘is that human races are sociocultural constructs’. Nevertheless ‘the concept of human race as an objective biological reality persists in science and in society. It is high time that policy makers, educators and those in the medical-industrial complex rid themselves of the misconception of race as type or as genetic population.’

The distinguished evolutionary biologist Jerry Coyne, who possesses impeccable liberal and anti-racist credentials, took umbrage at the review. ‘If that’s the consensus’, he snorted, then ‘I am an outlier’. Coyne insists that ‘human races exist in the sense that biologists apply the term to animals’. The equally distinguished biological anthropologist Jonathan Marks responded with what he himself described as a ‘rant’ against Coyne. ‘I have really had it with anti-intellectualism masquerading as biological science’, Marks fumed, claiming that Coyne ‘isn’t interested’ in what anthropologists have learnt about human population differences and comparing Coyne’s view on race with that of Creationists on evolution.

Why are we still having these kinds of debates? Why has a deepening understanding of genetics, and of the human genome, not helped to answer the questions, even among those who insist that their views derive solely from the facts? Last year I gave a lecture entitled ‘Why both sides are wrong in the race debate’ to a conference at the Norwegian Academy of Science & Letters on the relationship between biological and social research, a lecture that addressed some of these questions. So here is a shortened transcript of the talk. For an extended discussion of these issues, see my book Strange Fruit.

In February 2001, the journal Nature published the first draft of the human genome. The cover of this special issue comprised a DNA double helix formed by a mosaic of photographs of human faces. Francis Collins, director of America’s National Human Genome Research Institute, the largest of the public bodies funding the Human Genome Project, explained that the image ‘was chosen… specifically to convey a message’:

They are people from every ethnicity, and culture, and form of dress and age, and gender that you can think of, and that was really what we wanted to say. This is the people’s genome.

Or as Craig Venter, founder of Celera Genomics, the private corporation that revolutionised the process of mapping the genome, put it, ‘The Human Genome Project shows there is no such thing as race’.

The message was clear: Biology permits no racial divisions. The Human Genome Project has settled once and for all the age-old question of whether races really exist. Except that it hasn’t. Neil Risch, as distinguished a geneticist as Collins and Venter, argues to the contrary that ‘A decade or more of population genetics research have documented biological differences between the races.’ He is, he says, a ‘race realist’.

That race has once more become important as a scientific category seems incontestable. The last few years have seen, for instance, the development of a number of so-called race-specific drugs, medicines targeted at particular races. The first to be licensed by the US government was BiDil, a heart drug to be used only on African Americans. In the HapMap project, the biggest and most important international follow-up to the Human Genome Project, geneticists have been ‘analyzing DNA from populations with African, Asian, and European ancestry’ to help provide data for treating diseases. HapMap is simply the very big tip of a very big iceberg. Virtually every issue of every genetics journal contains studies of disease or disorder in which sample populations are defined as racial categories. Anthropologists have created a number of software programmes to determine an individual’s race from the shape of his or her skull. These programmes are now routinely used both by police forces and by international NGOs working in places like Bosnia and Iraq to identify bodies, for instance in mass graves.

All this has led to a fierce debate about the scientific meaning of race. The Human Genome Project – one of the great scientific missions of the late twentieth century – demonstrates that races do not exist. And it demonstrates that they do. Ironically, it seems that the more we find out about human biology, the more uncertain scientists appear to be about the meaning of race. Why? That is the question I want to explore, along with why both sides in this debate are right – and wrong. Let me start by looking at the two sets of arguments, that race is a social construction and that it is a biological reality, and at why neither stands up to scrutiny.

Is race a social construction?

The scientific consensus for the past half century has been that race is meaningless as a biological category. Three main arguments have been used to justify this belief. First, 85 per cent of human genetic variation exists within populations; less than ten per cent distinguishes what are commonly called races. Second, there is no such thing as a ‘pure race’; all humans are mongrels. And third, all human populations merge into each other, meaning that distinctions between races are arbitrary.

1. Eighty-five per cent of variation exists within populations

Imagine that some nuclear nightmare wiped out the entire human race apart from one small population – say, the Masai tribe in East Africa. Almost all the genetic variation that exists in the world today would still be present in that one small group. That is a dramatic way of expressing the results of a landmark analysis conducted by the geneticist Richard Lewontin in 1972. Lewontin showed that the great majority of human differences – 85 per cent – occurred between individuals within single populations. A further 7 per cent differentiated between populations within a race. Only 8 per cent of total variation distinguished what we call the major races.

Since 1972 other researchers have confirmed that 85 per cent of variation exists within a population. The results of a 2002 study by Noah Rosenberg and his colleagues were even more striking. They showed that differences among individuals account for a staggering 93-95 per cent of all genetic variation. About 2 per cent is taken up by differences between populations within a race. And race accounts for just 3-5 per cent of all human difference. The study also found that almost half of the alleles appeared in every major population in the world. Only 7.4 per cent of alleles were exclusive to one region, and such alleles tended to be very rare. Rosenberg’s study is the largest of its kind and hence the figures are likely to be the most accurate.
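The arithmetic behind apportionment figures like these can be illustrated with a toy calculation. The sketch below is an illustration under stated assumptions, not a reconstruction of either study. It uses Nei's gene-diversity statistic G_ST at a single hypothetical two-allele locus, rather than the measures Lewontin or Rosenberg's team actually used, and the allele frequencies of 0.4 and 0.6 are invented.

```python
def gene_diversity(p: float) -> float:
    """Expected heterozygosity (2pq) at a two-allele locus with allele frequency p."""
    return 2 * p * (1 - p)


def apportion(freqs: list[float]) -> tuple[float, float]:
    """Split total gene diversity into within- and between-population shares.

    freqs: frequency of one allele in each (hypothetical) population.
    All populations are weighted equally, a simplifying assumption.
    """
    # H_S: the average diversity found inside each population
    h_within = sum(gene_diversity(p) for p in freqs) / len(freqs)
    # H_T: the diversity of the pooled population, from the mean allele frequency
    p_mean = sum(freqs) / len(freqs)
    h_total = gene_diversity(p_mean)
    # G_ST: the share of total diversity due to differences between populations
    g_st = (h_total - h_within) / h_total
    return 1 - g_st, g_st


if __name__ == "__main__":
    within, between = apportion([0.4, 0.6])
    print(f"within populations: {within:.0%}, between populations: {between:.0%}")
```

Allele frequencies of 0.4 and 0.6 sound quite different, yet the calculation places only about 4 per cent of the diversity between the two populations and about 96 per cent within them. Real studies average such figures over many loci and over more than two levels of grouping.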

It seems undeniable, then, that race has no biological meaning. Virtually all genetic variation is within a population, not between ‘races’, and very few genes are exclusive to one part of the world. Yet, race realists argue, it would be wrong to conclude from this that races are necessarily biological nonsense. From a genetic point of view, poodles and greyhounds are almost identical, as are dachshunds and St Bernards. Tiny genetic differences can lead to major bodily and behavioural changes. Humans share about 99.4 per cent of their functional genes with chimpanzees. Yet the two are clearly very different species. Just 50 out of the 20,000 or so genes that humans and chimps are thought to possess may account for all of the cognitive differences between humans and apes. The fact that race accounts for only about 4 per cent of genetic variation among humans does not, race realists suggest, necessarily mean that race has no biological validity.

2. There is no such thing as a pure race

The conventional view of a race is as a discrete group that is distinguished by certain features – skin colour or body shape, say, or even musicality or intelligence – that are unique to that group. But we now know that there are no features that are the possession of one group to the exclusion of another. There exists, as the late Stephen J Gould put it, ‘no “race” gene present in all members of one group and none of another.’ Similarly Luigi Luca Cavalli-Sforza, the doyen of contemporary population geneticists, suggests that ‘The classification into races has proved to be a futile exercise’. Why? Because ‘All populations or population clusters overlap when single genes are considered, and in almost all populations, all alleles are present but in different frequencies. No single gene is therefore sufficient for classifying human populations into systematic categories.’

These days, though, few race realists believe that races are exclusive groups distinguished by a unique set of genes or physical markers. Rather, they argue, although all races possess all alleles, they do so in different proportions. Almost half a century ago the great biologist Ernst Mayr distinguished between what he called ‘typological’ and ‘population’ views of race. The typological idea of race, Mayr wrote, ‘asserts that every representative of a race conforms to the type and is separated from the representatives of any other race by a distinct gap.’ This was how nineteenth-century biologists had conceived of race. The ‘populationist’, on the other hand, does not suggest that all members of a race possess a unique common feature or ‘essence’, nor that members of one race are entirely distinct from those of another. Populationists suggest, rather, that races differ from each other statistically, not absolutely.

There are no genes that are specifically ‘black’ or specifically ‘white’, but some genes will be more prevalent within black populations and some more prevalent within white ones. And that is sufficient, race realists suggest, to divide the world into races. We know, for instance, that Europeans are more likely than Africans to have blue eyes, and are on average taller than East Asians. There are also less obvious differences between the races. If you are African, for instance, you are twice as likely to have a twin brother or sister as if you are European. East Asians, by contrast, are about half as likely as Europeans to be born as one of a twin. The fact that there are no pure races does not necessarily mean that there are no races.

3. Distinctions between races are arbitrary

OK, say the critics, we can all agree that the people of Mongolia, say, look different to those of Egypt. But there is no point between Cairo and Ulan Bator at which the race to which Egyptians belong ends and that to which Mongolians belong begins. Since there is ‘continuous gradation’ in gene frequencies, and every population shades imperceptibly into another, the concept of race is meaningless.

Even race realists acknowledge the difficulty of defining races. Jon Entine is an American journalist and a leading advocate of the race concept. ‘The precise number and grouping of races will always be somewhat arbitrary’, he has written. Dividing humans into races ‘is akin to wrestling an octopus into a shoe box: no matter how hard you fight with it, you still have something dangling out somewhere. Modern typologists cannot even agree whether it is more meaningful to lump races into large fuzzy groups or to split them into smaller units of dozens or even hundreds of populations.’

When even a strong proponent of the race concept admits that it is next to impossible to divide humans into a clean set of races, perhaps it is time to give up on the idea. The fuzziness of boundaries between races does not, however, necessarily mean that races do not exist. Many real categories have fuzzy boundaries. In their book Heredity, Race and Society, the evolutionary biologists L.C. Dunn and Theodosius Dobzhansky pointed out that ‘By looking at a suburban landscape one cannot always be sure where the city begins and the countryside ends, but it does not follow from this that the city exists only in imagination.’ Similarly, just ‘because the dividing lines between races are frequently arbitrary’, we should not conclude that ‘races are imaginary entities’. Races, they argued, ‘exist regardless of whether we can easily define them or not.’

Among non-human animals, subspecies are often separated by a continuous gradation rather than by a sharp boundary. The herring gull (Larus argentatus), for instance, has a dove-grey back in northern Scotland. As you move westwards around the Arctic, the gull’s back gets darker until, by the time you return to Western Europe, it has a charcoal-grey back and is classed as a different subspecies (the lesser black-backed gull, or Larus fuscus). There are, therefore, two subspecies but, travelling westwards, it is impossible to say where one ends and the other begins. Human races, say the race realists, are, like gull subspecies, ‘fuzzy sets’ – groups with imprecise boundaries.

In their book Race: The Reality of Human Differences, anthropologist Vincent Sarich and journalist Frank Miele suggest a ‘simple answer to the objection that races are not discrete, blending into one another as they do’:

They’re supposed to blend into one another, and categories need not be discrete. It is not for us to impose our cognitive difficulties upon Nature; rather we need to adjust them to Nature.

Humans might want everything neatly parcelled up and clearly labelled. But nature is not like that. And we just have to get used to the messiness of natural divisions. In any case, recent genetic studies suggest that it is possible to divide up humanity into a number of major groups that closely resemble commonsense concepts of race. Consider, for instance, the study by Noah Rosenberg and his colleagues that showed that the difference between races accounts for as little as 3-5 per cent of total human variation. The same study also showed that it is nevertheless possible – in fact quite easy – to distinguish genetically between races.

Rosenberg and his colleagues studied 377 genetic markers (microsatellite loci) in 1,056 individuals from 52 populations worldwide, using a computer programme called structure. Structure takes any set of data and attempts to find a rational way of dividing it into as many groups as it is asked to. In this study, structure was asked to divide the 1,056 individuals, standing in for the population of the world, into two, three, four and five groups according to how similar or dissimilar their DNA was. When the scientists asked the computer to divide the population of the world into two groups, one group comprised DNA samples from Africa, Europe and western Asia, and the second samples from eastern Asia, Australia and the Americas. When the DNA data were divided into three groups, the group consisting of populations from eastern Asia, Australia and the Americas remained unchanged. But the populations of sub-Saharan Africa were separated from those of Europe and western Asia. In other words, the three groups were the populations of sub-Saharan Africa, those of Europe and western Asia, and those of eastern Asia, Australia and the Americas. When asked to create four groups, structure created a new group by separating the populations of the Americas from those of eastern Asia and Australia. And when asked to break the data into five groups, structure kept all the other groups as they were but separated the populations of Australasia from the rest of Asia.

Two things are remarkable about these findings. First, the computer programme divides the population of the world according to the continent on which they live, and as we move from two to five groups, the boundaries of the continents become ever more distinct. Second, when the world’s populations are divided into five groups, those five groups correlate closely with what we commonly call ‘races’: Africans, Caucasians, East Asians, Australasians and Native Americans. And all this from genetic data in which only 4 per cent of total human variation is apportioned out among the races. Rosenberg’s study seems to suggest that, however small the differences between races, they are nevertheless sufficient to pick them out. When structure was asked to create six groups, however, the sixth group was the Kalash, a small population from present-day Pakistan: an arbitrary category if ever there was one. In other words, the issue is not as straightforward as it might first appear.
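How small differences can still be enough to pick out groups can be explored with a minimal simulation. This is not structure, which infers clusters without being told where anyone comes from. It is a far simpler likelihood classifier that is handed the true allele frequencies. The two hypothetical populations, their allele frequencies of 0.45 and 0.55, the sample sizes and the assumption of independent loci in Hardy–Weinberg proportions are all invented for illustration. At any single locus of this kind, only about 1 per cent of gene diversity lies between the two populations.

```python
import math
import random


def simulate(freqs, n, rng):
    """Draw n diploid individuals; each value counts copies (0, 1 or 2) of one allele."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]


def log_likelihood(genotype, freqs):
    """Log-probability of a multi-locus genotype, assuming independent loci in
    Hardy-Weinberg proportions. The binomial coefficient is omitted because it
    is the same under both candidate populations."""
    return sum(g * math.log(p) + (2 - g) * math.log(1 - p)
               for g, p in zip(genotype, freqs))


def accuracy(n_loci, n=200, delta=0.05, seed=1):
    """Fraction of simulated individuals assigned to the population they came from."""
    rng = random.Random(seed)
    freq_a = [0.5 - delta] * n_loci  # hypothetical population A: allele slightly rarer
    freq_b = [0.5 + delta] * n_loci  # hypothetical population B: allele slightly commoner
    correct = 0
    for label, freqs in (("A", freq_a), ("B", freq_b)):
        for person in simulate(freqs, n, rng):
            # Pool the evidence from every locus by comparing summed log-likelihoods
            guess = "A" if log_likelihood(person, freq_a) >= log_likelihood(person, freq_b) else "B"
            correct += guess == label
    return correct / (2 * n)


if __name__ == "__main__":
    for loci in (1, 10, 100, 1000):
        print(f"{loci:>4} loci: {accuracy(loci):.1%} assigned correctly")
```

With a single locus the classifier does little better than a coin toss. With a thousand loci it assigns almost everyone correctly: each locus contributes only a faint signal, but summing log-likelihoods lets those signals add up. Whether groups that can be separated in this way amount to races is a separate question.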

Is race a biological reality?

The traditional arguments against the race concept do not, then, exclude the possibility that races exist. So, does science really tell us that race is a biological reality? Infuriatingly, perhaps, for those used to science providing straightforward answers to straightforward questions, it does not. While science does not close the door on the idea of race, it certainly does not open it either.

To see why, let us look at the arguments that race is a biological reality. Today, with a few exceptions, race realists reject (at least in principle) the idea that there are essential, unbridgeable, unchangeable differences between human populations, or that differences signify inferiority or superiority. So, if races are not unchangeable groups with fixed properties, what are they? Most contemporary race realists attempt to define a race in terms of geography or of ancestry, or, more commonly, in terms of both. Such definitions fall into three broad types.

1. ‘A race is a geographically and genetically distinct population’

The first claim defines a race as a ‘geographically and genetically distinct population’. It is a concept derived from Ernst Mayr’s ‘population’ definition of race. A race, the anthropologist Alice Brues writes, is ‘a division of a species which differs from other divisions by the frequency with which certain hereditary traits appear among its members’. Races are distinguished from each other, not because they possess unique, fixed genetic features, but because one differs from another statistically in the frequencies of particular alleles. It does not matter which genes differ, or how many, or to what degree, just that some do to some extent.

The trouble with this definition, though, is that if we were to test for enough genes, we could find a statistical difference between virtually any two populations. As Cavalli-Sforza explains,

Our experiments have shown that even neighbouring populations (villages or towns) can often be quite different from each other… The maximum number of testable genes is so high that we could in principle detect, and prove to be statistically significant, a difference between any two populations however close geographically or genetically. If we look at enough genes, the genetic distance between Ithaca and Albany in New York or Pisa and Florence in Italy is most likely to be significant, and therefore scientifically proven.

Cavalli-Sforza adds that while ‘the inhabitants of Ithaca and Albany might be disappointed to discover that they belong to separate races’, the ‘people in Pisa and Florence might be pleased that science had validated their ancient mutual distrust by demonstrating their genetic differences.’

If any population in the world can be defined as a ‘race’, then the concept becomes meaningless. As Cavalli-Sforza puts it in his understated way, ‘classifying the world’s population into several hundreds of thousands or a million different races’ is ‘impractical’. The anthropologist Vincent Sarich responds that ‘it is for Nature to tell us’ what is a ‘reasonable’ number of races. But if the people of Ithaca and Albany are to be treated as distinct races, then Nature is probably telling us to trash this particular definition of race.

2. ‘A race is defined by the continent of origin’

To define a race, therefore, it is insufficient for two populations to be geographically separate and genetically distinct. Race realists require an additional means of affirming that the peoples of Europe and sub-Saharan Africa are distinct races but those of Ithaca and Albany are not. The second common definition of race suggests that ancestry might provide just such an additional, independent distinguishing mechanism. Many race realists suggest that a race is defined by the continent from which your ancestors originated.

When humans first came out of Africa around 60,000 years ago, they embarked on a series of complex migrations that took them across the globe. The first group of migrants probably set off from the Horn of Africa, walking along the Arabian coastline, then the coastline of India through to south-east Asia. It is thought that a few then cast off in primitive boats, perhaps from the Indonesian archipelago, eventually making landfall in Australia.

A second major migration took a band of the original Africans into the Middle East and then through to the steppes of central Asia. From here some moved south into the Indian subcontinent; some moved west into Europe (where they were joined by another group of migrants who had entered Europe via the Middle East); some turned south-east into what is now southern China; while another group moved north-east into Siberia and eventually, by crossing the Bering Strait, began the peopling of the Americas. Other groups, it is thought, made it into the Americas by following the coastlines of China and Japan north to Siberia and then on to the Bering Strait.

This is, of course, a highly simplified picture of what was an astonishingly complex set of migrations. Some of that complexity is, however, encoded in our genes. Simply by chance, the bands that left Africa would have had slightly different genetic profiles to those who did not make the journey, as well as to those who made different journeys across the globe. On each journey, the travellers would have picked up new genetic mutations which would be present in the newly established populations, but not in the original populations whence they came, or among those who made different journeys. And since people tended to mate with those close to them, rather than with those from distant populations, these genetic differences would have been preserved locally, passed on from generation to generation.

Defining someone by their continent of origin is really to establish in which of the first major migrations their ancestors took part. For instance, to say that someone has African ancestry is to say that his or her ancestors did not make the journey out of Africa. To describe somebody as a ‘Pacific Islander’ is to suggest that their ancestors made that very first journey along the coastline of Arabia and Asia, and then across the sea to Australia. We have seen that about four per cent of total human variation comprises differences between the major Continental groups. That four per cent is a reflection of the genetic differences between the various bands who made those original journeys.

But this too is an inadequate definition of a ‘race’. ‘Geographical origins do not in themselves constitute races’, the philosopher Naomi Zack points out. ‘If all the people identified as white had ancestors alive in Europe at the same time that the people who are identified as black had ancestors alive in Africa, to say that these are racial ancestral differences adds no new information to the data on time and place.’

Race realists might argue that a Continental group is a race – that is how a race is defined. But this is to say something trivial about which there could be no debate. For the definition of a race to be non-trivial, two questions need to be answered. What is it about Continental groups that distinguishes them as races? And why should Continental groups, as opposed to other population groups, be defined as races? Because, Neil Risch suggests, ‘genetic differentiation is greatest when defined on a continental basis’. And, he argues, such differentiation is significant because many illnesses and diseases appear to be racially distributed.

In fact the greatest genetic differentiation is not between Continental groups but between Africans and non-Africans. A number of studies have shown that Caucasians, East Asians, Native Americans and Pacific Islanders are closer to each other genetically than any of these groups are to sub-Saharan Africans.

At the same time, there is considerable genetic differentiation within Continental groups. And when it comes to illness and disease, such differentiation is often far more important. Different populations certainly show different patterns of disease and disorder. North Europeans are more likely to suffer from cystic fibrosis than other groups. Tay-Sachs, a fatal disease of the central nervous system, particularly affects Ashkenazi Jews. Beta blockers appear to be less effective on African Americans than on those of European descent.

Yet race is not necessarily a good guide to disease. We all know, for instance, that sickle cell anaemia is a black disease. Except that it isn’t. In the USA, the presence of the sickle cell trait can help distinguish between those with, and without, African ancestry. But not in South Africa. In South Africa, neither blacks nor whites are likely to possess the trait. Sickle cell is not a black disease, but a disease of populations originating in areas with a high incidence of malaria. Some of these populations are black, some are not. There are four distinct sickle cell haplotypes (a haplotype is a set of genetic variants inherited together on a single chromosome), two of which are found in equatorial Africa, one in parts of southern Europe, southern Turkey and the Middle East, and one in central India. The majority of people in Africa, including those in southern Africa, do not suffer from sickle cell disease.

Most people know, however, that African Americans are disproportionately likely to carry the trait. And, given popular ideas about race, most people automatically assume that what applies to black Americans applies to all blacks and only to blacks. It is the social imagination, not the biological reality, of race that turns sickle cell into a black disease.

Each Continental group possesses a genetic profile slightly distinct from other Continental groups, the consequence of different early human migrations (though compared to the genes they possess in common, those differences are minuscule). But Continental groups represent neither the greatest degree of genetic differentiation within humankind, nor necessarily the most useful way of dividing up human populations. It is an arbitrary choice, not a scientific necessity, to define Continental groups as races.

3. ‘A race is an extended family’

As a result of these problems in defining a race, some race realists have bitten the bullet and accepted that race is effectively genealogy. ‘A member of race R’, the philosopher Max Hocutt argues, ‘is an individual whose forebears were members of race R. Thus an animal is a coyote if it is descended from a coyote… A human being is an Afro-American if she is descended from Americans whose forebears were Africans.’ Or as Steve Sailer, founder of the self-styled Human Biodiversity Institute (which, despite its grand title, is not an academic centre but a website and email discussion group), puts it, ‘a race is an extended family that is inbred to some degree’. Hocutt accepts that ‘we cannot say with precision how big, how cohesive or how closed a breeding group must be or even how long it must last to count as a distinct race.’ But this is immaterial, he claims, for what he calls the ‘workaday definition of race’.

In fact it is anything but immaterial. After all, British Jews are ‘an extended family and inbred to some degree’. So are French Jews. So are people from Sylhet in Bangladesh who have emigrated to Britain. Those who have emigrated to Canada form another ‘extended family inbred to some degree’. Presumably then, British Jews and French Jews are separate races; and British Sylhetis and Canadian Sylhetis each form a distinct race. The philosopher Naomi Zack points out that ‘there is no coherent explanation of what makes one population, such as inhabitants of sub-Saharan Africa, a race, while another breeding group, such as Protestants in Ireland, would fail to be considered a race.’ For Steve Sailer that is no problem: Northern Ireland Protestants, he argues, are a distinct race!

Once again we come back to the old problem: when virtually any group can be a race, then the concept of race becomes meaningless. If everything from the British royal family to the entire human population can be considered a race (because each is an ‘extended family inbred to some degree’), then the category has little value.

Max Hocutt acknowledges that ‘the workaday concept of race is too crude either to have much value for the science of molecular biology or to serve as the basis of preferential government policies’, but believes that ‘it does not follow and it is not true that the concept of race is either meaningless or devoid of objective basis.’ The concept of race, he suggests, is something ‘Population geneticists can do without; social scientists and the rest of us cannot.’ So, a race is a ‘historical entity’ that molecular biologists and population geneticists ‘can do without’. But it is also a natural category with an ‘objective basis’ in human biology. Curiouser and curiouser, as Alice might have said.

The problem for race realists today is the very opposite of that for nineteenth-century racial scientists. Then, racial scientists ‘knew’ the significance of race but could find no way of defining differences. ‘Race in the present state of things is an abstract conception’, wrote Paul Broca, the leading physical anthropologist of the late nineteenth century, ‘a conception of continuity in discontinuity, of unity in diversity. It is the rehabilitation of a real but directly unobtainable thing.’

Even the staunchest advocates of racial science despaired of establishing race as a real, physical entity. Every ‘scientific’ measure of racial type, from headform to blood group, was shown to be changeable and not exclusive to any one group. As racial scientists searched desperately for more and more trivial manifestations of race, the geologist W.J. Sollas noted, apparently without a hint of irony, that ‘it is on the degree of curliness or twist in the hair that the most fundamental divisions in the human race are based.’

Today, as numerous genetic studies reveal, we can clearly define differences between populations. But the significance of such differences no longer seems clear. Race only appears to have any validity if we are willing to be deliberately vague as to what constitutes a race, and what racial differences mean.

Any scientific classification must possess three properties. First, there must be consistent and unique principles of classification. So, when biologists order the living world, the rules they use to define humans (Homo sapiens) as a species are the same as the rules they use to define chimpanzees (Pan troglodytes) as a species. Second, the categories must be mutually exclusive. A chimpanzee cannot belong to two distinct species. And third, a classification system must be complete and able to absorb even those entities not yet identified. If we discover a new species we can slot it into the system we use to classify all other known species.

Racial classifications possess none of these properties. Races are difficult to define and there are no objective rules for deciding what constitutes a race or to what race a person belongs. People can belong to many races at the same time. You can be an Icelander, a European and a Caucasian at the same time. Of course, in the classification of the natural world, the same animal can be a chimpanzee, a mammal and a vertebrate. But the species Pan troglodytes, the class Mammalia and the subphylum Vertebrata (which includes all animals with backbones) occupy different levels of the taxonomic hierarchy; each is a distinct classificatory unit. Icelander, European and Caucasian, on the other hand, are all considered by race realists to be the same kind of classificatory unit – a ‘race’.

And, finally, new races are not ‘discovered’ and slotted into the existing classification system; they are ‘created’ by carving up the classification system in a different way. Consider, for instance, the racial categories used in the US census. In 1977, the US government established four racial categories for the census: American Indian or Alaskan Native; Asian or Pacific Islander; Black; and White. Twenty years later the categories were revised by the addition of a fifth race, ‘Native Hawaiian or Other Pacific Islander’, created by splitting the ‘Asian or Pacific Islander’ category into two. This was not because a new race had been discovered, but because social changes had required new forms of identity.

In the absence of a scientific classification of race, geneticists and anthropologists are forced to import the racial categories we use in everyday life. But everyday categories are both uncertain and contradictory. When we ordinarily talk about human differences, we are often vague about the terms we use. We may talk interchangeably about races, cultures, ethnic groups, or populations. We generally refer to whites or Europeans rather than Caucasians even though many Caucasians are neither white nor European. On the other hand, we use the terms blacks and Africans interchangeably, even though there are many blacks who are not African.

All this may not matter if we are having a casual conversation. It does matter if we want to use race as a scientific category. What is striking about scientific papers which deploy racial categories is the contrast between the tightness and technical quality of the language when the authors are discussing genes, diseases and physiological processes and the looseness of the language about racial differences.

Take, for instance, a much-quoted paper in the New England Journal of Medicine that made the case for ‘The importance of race and ethnic background in biomedical research and clinical practice’. Published in 2003, the paper tried to demonstrate the ways in which genes responsible for disease vary across races:

Factor V Leiden, a genetic variant that confers an increased risk of venous thromboembolic disease, is present in about 5 per cent of white people. In contrast, this variant is rarely found in East Asians and Africans… Susceptibility to Crohn’s disease is associated with three polymorphic genetic variants in the CARD15 gene in whites; none of these genetic variants were found in Japanese patients with Crohn’s disease. Another important gene that affects a complex trait is CCR5 – a receptor used by the human immunodeficiency virus (HIV) to enter cells. As many as 25 per cent of white people (especially in northern Europe) are heterozygous for the CCR5-delta32 variant, which is protective against HIV infection and progression, whereas this variant is virtually absent in other groups, thus suggesting racial and ethnic differences in protection against HIV…

NAT2 [is] an enzyme involved in the detoxification of many carcinogens and the metabolism of many commonly used drugs. Genetic variants of NAT2 result in two phenotypes, slow and rapid acetylators. Population-based studies of NAT2 and its metabolites have shown that the slow acetylator phenotype ranges in frequency from approximately 14 per cent among East Asians to 34 per cent among black Americans to 54 per cent among whites… One of the best known examples of a gene that affects a complex disease is APOE. A patient harboring a variant of this gene, APOE e4, has a substantially increased risk of Alzheimer’s disease. APOE e4 is relatively common and is seen in all racial and ethnic groups, albeit at different frequencies, ranging from 9 per cent in Japanese populations to 14 per cent in white populations to 19 per cent in black American populations.

This passage displays that contrast clearly. The paper specifies the genes – or rather the alleles – exactly: CARD15; CCR5-delta32; APOE e4. No geneticist could confuse one with another. Similarly, descriptions of diseases (venous thromboembolic disease, Crohn’s disease), explanations of the consequences of allelic variation (‘Genetic variants of NAT2 result in two phenotypes, slow and rapid acetylators’) and physiological illustrations (‘CCR5 – a receptor used by the human immunodeficiency virus (HIV) to enter cells’) are all specific and all make use of technical language.

Descriptions of population differences, on the other hand, are entirely non-technical and often vague and confusing. Among descriptions used for population groups are ‘whites’, ‘white people’, ‘white people (especially in northern Europe)’, ‘white populations’, ‘East Asians’, ‘Japanese’, ‘Japanese populations’, ‘Africans’, ‘black Americans’, and ‘black American populations’. These are not scientific categories but the language of the saloon bar translated into a scientific idiom.

The categories used in racial studies are often a horrible mishmash of groups that do not belong with each other. So, we are told that whites with Crohn’s disease possess three alleles of the CARD15 gene, none of which are found in Japanese patients. Whites, a group defined by skin colour, are compared to the Japanese, a national group defined by geographic origin. The slow acetylator phenotype that results from a particular variant of the NAT2 gene ‘ranges in frequency from approximately 14 per cent among East Asians to 34 per cent among black Americans to 54 per cent among whites’. The three groups being compared here are a Continental group (East Asians), an admixed group that, in race realist terms, reveals both African and Caucasian ancestry but is socially defined as ‘black’ (black Americans) and a group with a particular phenotype (whites). These very different categories are treated as equivalent groups.

Imagine a zoologist studying a particular behaviour, say hunting. And imagine this zoologist comparing the hunting behaviour of dogs, reptiles and hairy animals. The study would yield no useful information because the comparison groups are not equivalent. Dogs are a particular species of the class Mammalia; some dog breeds are hunters, others are not. Reptiles form a class taxonomically equivalent to mammals, comprising many species. ‘Hairy’ is a description of physical appearance that applies to some dogs, but to no reptiles. Most people would agree that comparing the behaviour of dogs, reptiles and hairy animals would not be particularly useful because they are such different kinds of categories. The same is true of comparisons of diseases between East Asians, white people and black Americans.

Even social scientists, who are generally forced to use more ambiguous concepts than natural scientists, would balk at these kinds of comparisons. If an economist compared productivity rates among whites, black Americans and East Asians, it is unlikely that any reputable journal would publish the study. Nor would any reputable journal publish a sociologist’s comparison of attitudes to crime among ‘white people (especially from northern Europe)’ and ‘other racial groups’. Yet such comparisons are common in genetic studies of population differences – studies that one would expect to have a stricter methodology than econometric or sociological surveys. Even what appear to be equivalent kinds of groups in racial studies may not be so. It is impossible to know, for instance, whether ‘whites’, ‘white people’ and ‘white populations’ refer to the same population group; ‘white people (especially in northern Europe)’ clearly does not. Do ‘East Asians’, ‘Japanese’ and ‘Japanese populations’ refer to equivalent populations? It is difficult to know. No wonder that one survey of medical papers concluded that ‘terms used for race are seldom defined and race is frequently employed in a routine and uncritical manner to represent ill-defined social and cultural factors.’

Why is the character of race in scientific research so ambiguous? Because race is a social category but one which can have biological consequences. There is no such thing as a ‘natural’ human population. Migration; intermarriage; war and conquest; forced assimilation; voluntary embrace of new or multiple identities, whether religious, cultural, national, ethnic or racial; any number of social, economic, religious and other barriers to interaction (and hence to reproduction); social rules for defining populations, such as the ‘one drop rule’ in America – these and many other social factors shape the character of a group and transform its genetic profile. That is why racial categories are so difficult to define scientifically.

Yet, although there is no such thing as a ‘natural’ human population, many of the ways in which we customarily group people socially – by race, ethnicity, nationality, religious affiliation, geographic locality and so on – are not arbitrary from a biological point of view. Members of such groups often show greater biological relatedness than two randomly chosen individuals. Such groups have often been ghettoized by a coercive external authority, or have chosen to self-segregate from other groups. Hence they are inbred to a certain degree and can act as surrogates, however imperfectly, for biological relatedness. Categories such as ‘African American’, ‘people of Asian descent’ and ‘Ashkenazi Jew’ can be important in medical research not because they are natural races but because they are social representations of certain aspects of genetic variation. They can become means of addressing questions about human genetic differences and human genetic commonalities.

This is why race can sometimes be what the psychiatrist Sally Satel calls a ‘poor man’s clue’ in medicine: not because races are natural divisions of humankind but because investigating socially defined populations can provide a practical means of dividing humans into groups that show different degrees of biological relatedness. But it is a rough and ready process because there exists only a rough and ready relationship between social groups and natural populations. How rough and how ready depends on the particular group and the particular question we are asking. As we saw with sickle cell anaemia, the ways in which society customarily divides populations may not be the most useful in medical research. Race and ethnicity can be surrogates for biological relatedness, in other words, but not necessarily good ones. ‘Deciphering the relationships that may exist between social classifications and biological categories’, the anthropologists Morris Foster and Richard Sharp point out, ‘is not a simple matter’:

The biological significance that a social distinction may have for one purpose can dissolve when those same social categories are used to answer other biological questions. Thus, it may be appropriate to use social categories as a proxy for biological relatedness (or unrelatedness) in some circumstances but not in others.

An individual can have a number of social identities, some of which may be important to the research at hand, and some of which are irrelevant. An individual donating DNA might be simultaneously a resident of a particular Indian village in Arizona, a member of the Hopi tribe, a descendant of a Laguna tribal family, a Native American, and someone of Spanish ancestry, as well as an American citizen. Each of these identities, Foster and Sharp observe, tells a different social story about the individual and leads to a different scientific perspective on genetic variation. Researchers, in other words, should not assume a priori that the world is naturally divided into a set of ‘races’. Rather, depending on the particular questions they are asking, they have to decide which of the socially given populations are most useful to sample.

The importance occasionally of group differences in medicine does not reveal the reality of race. Indeed, what we popularly call races are generally least suited to genetic research. That is because the degree of biological relatedness in Continental groups is barely greater than in a randomly chosen group of people. That is what we mean when we say that just 4 per cent of total human variation exists between the major Continental groups. Races are, however, socially significant and a major way by which we divide up our societies. It may make social sense, therefore, for researchers and clinicians to use race as the basis by which they divide up the population.

Races are not natural divisions of humankind. But socially defined populations provide, nevertheless, a rough and ready means of dividing humans into groups that show different degrees of biological relatedness. The irony is that in order to study human genetic diversity, scientists need socially defined categories of difference. The real question we have to ask ourselves today is not so much why people imagine race to be a valid biological category as why so many believe it to be a valid social category, and why society continues to define people by race.

The debate about race is not a debate about whether differences exist between human populations. Jon Entine, a staunch defender of the idea of race, defines race as ‘human biodiversity’. That is meaningless. No one, on either side of the debate, would deny that there are a myriad of differences between different human populations.

The real debate about race is not whether there are any differences between populations, but about the significance of such differences. The fact that a BMW saloon is of a different colour to a Boeing 747 is of little significance to most people. The fact that one has an internal combustion engine and the other a jet engine is of immense consequence if you want to travel from London to New York. But if you are a Yanomamo Indian living in the Amazon forest, even this difference may not be of that great an import, since it is quite possible that you will be unable – or will not need – to use either form of transport. If we want to understand the significance of any set of differences, in other words, we have to ask ourselves two questions: Significant for what? And in what context? One of the problems of the contemporary debate about race is that these two questions are too rarely asked.


  1. Gabriel Andrade

    “Strange Fruit” is a wonderful book. Out of all arguments against the existence of race, I think the killer argument is the arbitrariness in the selection of racial traits: why divide humanity according to skin color and not, say, according to attached earlobe frequency?
    This, however, poses a difficulty that, as far as I know, is not frequently explored: if a white boss must make cuts and fires, say, 25 black employees and 0 white employees, is he a racist? We may see a pattern in his population of fired employees: they all have dark skin color. But, then again, why choose skin color as the parameter to establish the pattern of the fired employees? Maybe, if we choose foot size (or blood type, or whatever other possible trait), we would realize that the fired population is actually racially balanced. One may say that, inasmuch as skin color is the socially established criterion of race definition, we may presume the boss is racist. But how do we know that, for the boss, skin color is indeed relevant? If in his record we see a pattern of injustice against dark-skinned people, perhaps we could also see a pattern of justice if we choose some other trait. It will all depend on what trait we choose.
    Racial quotas require a percentage of black TV presenters. But, there is no percentage of, say, obese people. Now, obesity clearly has a genetic basis, so, in a sense, we may speak of a race of ‘fat’ people, as much as we can speak of a race of ‘black’ people (true, obesity is conditioned by environmental influences, but, to a lesser degree, so is skin color). If we choose obesity (instead of skin color), we may say that TV is still racist against fat people. Again, it all depends on what racial trait we choose; but, precisely, the selection of racial traits is arbitrary. Again, one may say that even if races do not exist, they are socially relevant, and thus, racial quotas are necessary, because even if there is no such thing as a race, there is indeed such a thing as racism. But, then again, how do we know that skin color is today more relevant than, say, weight?
    Anyways, just disorganized thoughts…

  2. There is no significant difference between ‘races’. The whole debate is like a flat earth debate (struggling here to maintain my patience – obviously not with you but with the debate). While your post/speech is very comprehensive I wonder if people who try to establish this as a reality are not just prejudiced and trying to ‘see’ what they already ‘believe’? Is it, as we call it in Ireland, just spitting into the wind? Well done for trying, though.

  3. SIMON

    Another excellent piece by Malik. The point, however, is that whatever the truth of this argument it has NO political implications whatsoever. An individual is an individual, and is to be treated as such in a non-racist society. Every single race-realist I “know” is trying to link their views on this subject to immigration or other policy decisions.

  4. A nice review. I only want to point out a serious error in Malik’s quantitative reasoning. It is not his fault; the interpretation given by Malik is widespread among population geneticists. Malik says

    “First, 85% of human genetic variation exists within populations; less than ten per cent distinguishes what are commonly called races.”

    Many people think this means the differences between races are small. Malik then says:

    “Imagine that some nuclear nightmare wiped out the entire human race apart from one small population – say, the Masai tribe in East Africa. Almost all the genetic variation that exists in the world today would still be present in that one small group.”

    Readers might take this to mean that the rest of humanity is genetically almost redundant.

    Let’s look at what it really means.

    Variation is usually measured by heterozygosity, though Lewontin used entropy. Suppose we have two groups belonging to completely distinct species, with no alleles shared between groups at the locus under study. These groups couldn’t be more different. For definiteness suppose there are ten equally common alleles at this locus in each group (so twenty alleles altogether), and suppose each group is equally large. Then the within-group heterozygosity is 0.9, and the total heterozygosity is 0.95. So for these two distinct species, with no shared genes, 95% (0.9/0.95) of the variation is within species; only 5% of the variation is between-species. Yet these are two distinct species which share no genes at this locus!!!
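    The arithmetic in this example can be checked with a short Python sketch. It assumes the standard expected-heterozygosity formula H = 1 - sum(p^2), two equally large groups with ten equally common alleles each and no shared alleles, and a pooled population weighted 50/50. The helper names are illustrative only, not taken from any library. It also computes Gst as (H_T - H_S)/H_T, i.e. between-group heterozygosity divided by total heterozygosity, the statistic discussed later in this comment.

```python
# Hedged sketch of the two-species example above.
# Assumptions: equal group sizes, ten equally common alleles per group,
# no alleles shared, expected heterozygosity H = 1 - sum(p^2).

def heterozygosity(freqs):
    """Expected heterozygosity: probability that two random alleles differ."""
    return 1.0 - sum(p * p for p in freqs)

def pooled(groups, weights):
    """Allele frequencies of the pooled population (groups given as dicts)."""
    total = {}
    for freqs, w in zip(groups, weights):
        for allele, p in freqs.items():
            total[allele] = total.get(allele, 0.0) + w * p
    return total

# Group A carries alleles 0-9, group B carries alleles 10-19 (no overlap).
group_a = {a: 0.1 for a in range(10)}
group_b = {a: 0.1 for a in range(10, 20)}
groups = [group_a, group_b]
weights = [0.5, 0.5]

# Mean within-group heterozygosity (H_S) and total heterozygosity (H_T).
h_s = sum(w * heterozygosity(g.values()) for g, w in zip(groups, weights))
h_t = heterozygosity(pooled(groups, weights).values())
g_st = (h_t - h_s) / h_t  # between-group share of total heterozygosity

print(f"H_S = {h_s:.2f}, H_T = {h_t:.2f}")
print(f"within-group share = {h_s / h_t:.3f}, G_ST = {g_st:.3f}")
```

    Running it prints H_S = 0.90 and H_T = 0.95, a within-group share of about 0.947 and G_ST of about 0.053, even though the two groups share no alleles at all.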

    A similar, though less severe, problem arises when Shannon entropy is used instead of heterozygosity, so Lewontin’s analysis also does not really show that differences between races are small or inconsistent.

    Now let us apply Malik’s nuclear disaster to one of my two species. It is true that after the disaster, the surviving group contains 95% of the original total variation present in the groups. But the disaster has eliminated half the alleles! The eliminated group is not redundant at all, but just as rich and unique as the surviving group (remember, the groups share no alleles).

    Mathematically, the correct way to measure the magnitude of the genetic differences between races is to use genuine measures of diversity, such as the exponential of Shannon entropy. These obey the “replication principle” of economists and ecologists. This property ensures that if we eliminate one of the two equally large, equally diverse groups which share no alleles, the diversity drops by exactly 50%. This is the true measure of the impact of Malik’s nuclear disaster.
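    The replication principle can be illustrated with the same assumed two-group example: before the disaster the pooled population has twenty alleles at frequency 0.05; afterwards one group (ten alleles at frequency 0.1) survives. This Python sketch uses natural logarithms; the exponential of Shannon entropy is the Hill number of order 1, often read as an "effective number" of alleles. For contrast it also reports how little heterozygosity falls. Function names are illustrative only.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy (natural log) of an allele-frequency list."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def effective_number(freqs):
    """Exponential of Shannon entropy: the Hill number of order 1."""
    return math.exp(shannon_entropy(freqs))

def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p^2), shown for comparison."""
    return 1.0 - sum(p * p for p in freqs)

# Assumed example: two equally large groups, ten equally common alleles each,
# no alleles shared. 'before' is the pooled population, 'after' the survivors.
before = [0.05] * 20
after = [0.1] * 10

d_loss = 1 - effective_number(after) / effective_number(before)
h_loss = 1 - heterozygosity(after) / heterozygosity(before)

print(f"effective number: {effective_number(before):.1f} -> {effective_number(after):.1f}")
print(f"loss in effective number: {d_loss:.0%}, loss in heterozygosity: {h_loss:.1%}")
```

    Under these assumptions the effective number falls from 20.0 to 10.0, a 50% loss, while heterozygosity falls by only about 5.3%.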

    To find the differentiation between groups, we should use true diversity measures and we should partition these correctly into independent within- and between-group components (the correct partitioning of heterozygosity is not additive, contrary to common practice in biology). My articles in the ecology and population genetics literature explain the mathematics of diversity and differentiation. In Google Scholar search for [Jost diversity] for a long list of articles, and criticisms of my articles. See especially Jost 2008, Gst and its relatives do not measure differentiation, Molecular Ecology 17: 4015-4026.

    In the comments under Jerry’s blog, John Harshmann raised a related issue. He claimed that since one migrant per generation was enough to prevent divergence of populations due to drift, human races should not show much divergence (since migration between most groups surely exceeds one individual per generation). This argument is part of the same population genetics lore as the statements about variation. It is based on the misinterpretation of Gst or Fst as actual measures of genetic divergence between groups. Gst is the between-group heterozygosity divided by total heterozygosity. In my example above, with two completely distinct groups that share no genes at all, Gst is very low, about 0.05, which would usually be interpreted to indicate little differentiation between groups (clearly a false conclusion). The idea that one migrant per generation is enough to prevent divergence at neutral loci is based on analysis of Gst. When there is significantly more than one migrant per generation, Gst is close to zero. But as we just saw, Gst can be close to zero even if the groups are completely diverged (no shared genes). So it is wrong to think that because there is more than one migrant per generation, divergence is necessarily low. Low Gst does not equal low divergence.

    A derivation in my 2008 Mol Ecol article shows that the quantity which controls divergence between a pair of groups at a given locus is the relative migration rate divided by the mutation rate. The relative migration rate (number of migrants divided by population size) might very well be small between many groups of humans.

    The percentage of private alleles mentioned by Malik is also a somewhat fragile statistic for these purposes. Most geneticists will agree with me, I think. The slightest migration will remove the “privateness” of most alleles. What is important is the degree of differentiation of allele frequencies between demes, not the mere presence or absence of certain alleles.

    I have no strong opinion about the data, I just want people to use valid arguments when discussing it. It is possible that if the analysis were done correctly, the conclusions would not change much. But we should do that analysis, rather than repeating analyses based on comparison of within-group variation to total variation.

    • Many thanks for this. My point in raising Lewontin’s (and Rosenberg et al’s) studies was to suggest that they were, in and of themselves, insufficient to rule out the idea of race. That is why I concluded that section:

      From a genetic point of view poodles and greyhounds are almost identical, as are dachshunds and St Bernards. Tiny genetic differences can lead to major bodily and behavioural changes. Humans share about 99.4 per cent of our functional genes with chimpanzees. Yet we are clearly very different species…The fact that race accounts for only 4 per cent of genetic variation among humans does not, race realists suggest, necessarily mean that race has no biological validity.

  5. It seems this is descending into an argument about mere semantics, because after reading all that I still think I am a race-realist, but I think K.M. is too. So let me explain how I understand the word “race”:

    While the only *sharp* genetic distinction is between human and non-human, our species still contains genetic clusters. If we don’t look in too much detail, those clusters correspond very well to intuitive notions of race. Things become less clear when you look in more detail, and we can always find examples where the word race refers to something else.

    But tables and chairs disappear when looked at under the microscope, and those words are sometimes used for things which are not even pieces of furniture. So it seems races are about as real as tables and chairs.

    • I am a ‘race realist’ only if race realism means accepting that a ‘race is a social category but one which can have biological consequences’. I don’t know of any race realists who would accept that.

  6. Hi Kenan,

    Thank you for writing this–I’ve long appreciated your careful review of such matters. However, I would like to point out that although Jonathan Marks calls his position a “rant,” it actually represents an anthropological view that seems quite close to your own. The basics of that view are:

    1) Humans do vary biologically. That biological variation is real and important.
    2) Traditional race categorizations are not a useful way to describe this real biological variation.
    3) However, the social categorization of race has been and continues to be tremendously real and important, with biological consequences.

    These issues are discussed at my blog-post titled Race redux: What are people “tilting against”?.

    While I was researching these issues, I was introduced (at the suggestion of anthropologist Henry Harpending) to the work of geneticist Guido Barbujani. His 2010 paper (co-authored with Vincenza Colonna) is a very careful overview of the scientific literature from someone who has been studying human genetic diversity for a long time. I know Lou Jost above has some disagreements with it, but Barbujani is an acknowledged expert in the field: Human genome diversity: frequently asked questions.

    My one quibble with the above is in your discussion of the software program Structure. It would be interesting to add in what happened when Structure was asked to create 6 groups–the Kalash then popped out as a separate cluster. And so on. (I quote Jonathan Marks on the results at an earlier blog-post called Race is a social construction which tries to outline exactly what anthropologists meant when they were using that shorthand). Barbujani and Colonna also discuss the aftermath of Rosenberg’s work, as the clusters were not quite so clear following that initial and most famous announcement:

    “In fact, clustering is certainly possible, but is not consistent across studies. In a subsequent paper based on a larger sample of microsatellites the same authors [Rosenberg et al.] rejected the claim that the distribution of samples in space itself accounts for the apparent population differentiation, but failed to confirm the Kalash as a separate unit; instead, the native American populations were this time split in two clusters. The Kalash resurfaced as a distinct group when 15 Indian populations were added to the analysis, leading to the identification of 7 clusters, with most populations of Eurasia now showing multiple memberships. In these studies, all African genotypes formed a single group, in contrast to broadly replicated evidence of deep population subdivision within Africa. However, when the CEPH 377-marker dataset was analyzed by a different method that searches for zones of sharp genetic change or genetic boundaries, Africa appeared subdivided in four groups, and each American population formed an independent group, giving a total of 11” (Barbujani and Colonna 2010:289).

    • Yes, I do disagree with Barbujani and Colonna’s use and interpretation of Fst and Gst as a measure of population structure that is relevant to this discussion. I am not an anthropologist but a mathematical population geneticist; but this is actually a matter of mathematics, not anthropology.

      Their explanation of Fst, “Fst ranges from 0, when all subpopulations are identical, to 1, when different alleles are fixed in different subpopulations” is just false, as anyone can check. It can approach zero even if all subpopulations are completely different (share no alleles), and it can equal unity even when almost all groups are fixed for the same allele. Their statement is a common misconception found throughout the population genetics literature. Their statement is only true for the two-allele, two-subpopulation case.

      Their statement that the absolute number of migrants per subpopulation per generation (N*m) determines whether the set of subpopulations diverges or converges due to drift is also false. I know this latter belief is widespread among population geneticists, but it is the relative migration rate that matters, as I showed in my 2008 paper cited below. For some relevant simulations see my comments under Nolan Kane’s post here:

      Also, to be clear, it is only these invalid methods that I dispute. The authors’ other arguments might be fine, and their conclusions may well be correct anyway.

      For the math underlying my statements, see
      Jost L (2008) Gst and its relatives do not measure differentiation. Molecular Ecology, 17, 4015-4026.

    • Jason, thanks for this. I know Jon Marks, he has shaped my thinking on these issues. I take your point about Structure.

      Thanks, too, to Lou. I will have to sit down and think through your disagreements with Barbujani and Colonna more carefully.

  7. Chuck


    “Since 1972 other researchers have confirmed that 85 per cent of variation exists within a population. The results of a 2002 study by Noah Rosenberg and his colleagues were even more striking. They showed that differences among individuals account for a staggering 93-95 percent of all genetic variation…..
    …The importance occasionally of group differences in medicine does not reveal the reality of race. Indeed, what we popularly call races are generally least suited to genetic research. That is because the degree of biological relatedness in Continental groups is barely greater than in a randomly chosen group of people.”

    Your arguments seem to rest largely on the above claims. Unfortunately for them, and independently of Lou Jost’s point, they’re false. Within-population genetic variation is split between variation among individuals and variation within individuals. So if 85% of the genetic variation was, in fact, within populations, the variation between individuals within a population would be 43 or so %. One would then compare this 43% to the genetic variation between individuals of different populations. So between-population variation would stand to the variation between individuals of the same population as 15% to 43%, rather than 15% to 85%: a substantially higher ratio. The claim is also rather misleading. For example, a 15% between-population variation can be rather significant. Imagine two equally sized populations that have equal standard deviations for some trait. In this scenario, a between-population variation of 15% would be equivalent to a standardized difference of about .8. An effect size of .8 is often interpreted as large. Surely it’s not insignificant.
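    The conversion from a 15% between-population share to a standardized difference of about .8 can be reproduced under the assumptions stated in the comment: a single trait, two equally sized populations, equal within-group standard deviation. Each group mean then sits d/2 standard deviations from the grand mean, so the between-group share is (d^2/4)/(1 + d^2/4), which can be inverted for d. This Python sketch uses illustrative function names and says nothing about whether a single-trait variance share corresponds to the genome-wide figures discussed in the post.

```python
import math

def between_fraction(d):
    """Share of total variance lying between two equal-sized groups whose
    means differ by d within-group standard deviations."""
    between = (d / 2) ** 2  # each group mean is d/2 SDs from the grand mean
    return between / (1 + between)  # within-group variance is 1 in SD units

def d_from_fraction(f):
    """Invert between_fraction: standardized mean difference for share f."""
    return 2 * math.sqrt(f / (1 - f))

d = d_from_fraction(0.15)
print(f"d = {d:.2f}")  # d = 0.84
print(f"check: {between_fraction(d):.2f}")  # check: 0.15
```

    Under these assumptions d comes out at about 0.84, consistent with the commenter's "about .8".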

    “Once again we come back to the old problem: when virtually any group can be a race, then the concept of race becomes meaningless.”

    This really depends on context. Obviously, by Sailer’s definition not just any group can be a race. Only extended kinships can (i.e., genetic populations). Since this concept defines no boundaries, it’s taxonomically worthless; but since it defines a content (i.e., genetic relatedness), it’s not scientifically meaningless. Its meaning comes from qualifying the generic term ‘population’.

  8. Chuck

    “But this too is an inadequate definition of a ‘race’. ‘Geographical origins do not in themselves constitute races’, the philosopher Naomi Zack points out. ‘If all the people identified as white had ancestors alive in Europe at the same time that the people who are identified as black had ancestors alive in Africa, to say that these are racial ancestral differences adds no new information to the data on time and place.’

    Race realists might argue that a Continental group is a race – that is how a race is defined. But this is to say something trivial about which there could be no debate. For the definition of a race to be non-trivial two questions need to be answered. What is it about Continental groups that distinguishes them as races? And why should Continental groups, as opposed to other population groups, be defined as races?”

    None of this makes sense. If “race” is defined as “geographic populations” then the term’s utility would be that it’s shorthand for “geographic populations.” It’s true that this would — or should — preclude debate about the definition of ‘race,’ but it clearly would not preclude other debate about ‘race.’ One could debate, for example, whether or not there were average socially significant genetic differences between the races, so defined.

    And you ask “And why should Continental groups, as opposed to other population groups, be defined as races?” What does this mean? “Race” is a term. Terms refer to concepts. The term “race” can be used to refer to all sorts of concepts, such as “a contest of speed.” We might ask: “And why should contests of speed, as opposed to other contests, be defined as races?” Well, there are all sorts of reasons, starting with the fact that this is how such contests have been traditionally defined.

  9. “We know, for instance, that Europeans are more likely to have blue eyes than Africans, and more likely to be taller than East Asians.”

    Actually it is more like “Presently, Europeans are more likely to be taller than East Asians”
