Sampling Fossils

Created/Modified: 2014-10-09/2019-10-30

Sometime in the late 80's or early 90's I read an article in Nature that argued we should expect "first discovery" dates in the sciences that involve digging up fossils to take periodic leaps backward.

The argument is compelling: because fossilization and preservation of fossils are rare processes, they sparsely sample the living population. Under these circumstances the odds of our discovering the earliest instance of anything are very low.

As near as I can tell, people who work in these fields don't read Nature, because the argument still doesn't seem to have much currency. So I thought it would be fun to put together a simple computational example that roughly models the human population. I made most of the numbers up, but the results are nicely illustrative of the phenomenon.

Consider a simple model where the human population grows exponentially from two people (call them "Madam" and "Steve") a million years ago to three million people 12,000 years ago (which is the best estimate for the pre-agricultural human population).

Fitting an exponential to these two endpoints gives an e-folding time of about 70,000 years (988,000 years divided by ln(1.5E6), or roughly 69,475 years), so the population looks like 2*exp(years/69475), with time zero being a million years ago and 12,000 years ago being time 988000.

Now assume that the probability of fossil preservation is a linear ramp that starts at zero a million years ago and rises to a one-in-ten-billion chance per person per year 12,000 years ago: 1E-10*years/988000

If we instantiate this as a few lines of Python, treating the mean number of preserved fossils in each year as the probability that a fossil from that year survives, and stopping at the first year that yields one, we get the following code:

import math
import random

fSpan = 1.0E6 - 12000  # model runs from 1,000,000 to 12,000 years ago
fA = fSpan/math.log(1.5E6)  # e-folding time: 2 people grow to 3 million
print(fA)

def g(fT):  # population growth
    return 2*math.exp(fT/fA)

def f(fT):  # fossil preservation probability (linear ramp up to 1E-10)
    return fT*1E-10/fSpan

# sample from a million years ago to 12000 years ago
for nI in range(0, int(fSpan)):
    fAvg = f(nI)*g(nI)  # average number of fossilized remains this year
    if random.random() < fAvg:  # a fossil from this year is preserved
        print(1e6 - nI, fAvg)  # date of first discovery (years past)
        break

Running this code produces dates for the oldest discovered fossils that vary nicely around the age of the oldest known fossil for anatomically modern humans, which is about 195,000 years.

But here is the distribution of results, which is unsurprisingly broad.

...

The red line that peaks around 200,000 years ago is for a fossil preservation probability of one in ten billion; the green line that peaks around 300,000 years ago is for a probability of one in a billion. Or rather, those are fossil preservation and discovery probabilities, since a fossil that has not been discovered may as well not exist in this calculation.
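
A distribution like this can be approximated without rerunning the year-by-year loop thousands of times. The following is a minimal sketch, not the code that produced the plot: it assumes the model endpoints stated in the text (two people a million years ago, three million people 12,000 years ago), and the function names and the inverse-sampling shortcut are illustrative choices. It accumulates -log(1 - p) over the years once, then draws each "first fossil" year by finding where that running total crosses an exponential random threshold, which gives the same distribution as the yearly coin flips.

```python
import bisect
import math
import random

START_YEARS_AGO = 1_000_000   # model start, as stated in the text
END_YEARS_AGO = 12_000        # model end, as stated in the text
SPAN = START_YEARS_AGO - END_YEARS_AGO
E_FOLD = SPAN / math.log(3_000_000 / 2)  # 2 people grow to 3 million

def cumulative_hazard(peak_probability):
    """Running total of -log(1 - p) over the model years, where p is the
    mean number of preserved fossils that year, treated as a probability."""
    totals, running = [], 0.0
    for year in range(SPAN):
        population = 2 * math.exp(year / E_FOLD)
        p = population * peak_probability * year / SPAN
        running += -math.log1p(-p)
        totals.append(running)
    return totals

def sample_first_fossil_ages(peak_probability, runs, rng):
    """Draw `runs` ages (years before present) of the oldest fossil."""
    totals = cumulative_hazard(peak_probability)
    ages = []
    for _ in range(runs):
        threshold = -math.log(1.0 - rng.random())  # exponential(1) draw
        year = bisect.bisect_left(totals, threshold)
        if year < SPAN:  # otherwise no fossil was preserved at all
            ages.append(START_YEARS_AGO - year)
    return ages

rng = random.Random(0)
for peak in (1e-10, 1e-9):  # the two probabilities discussed above
    ages = sorted(sample_first_fossil_ages(peak, 2000, rng))
    n = len(ages)
    print(peak, "median:", ages[n // 2],
          "5%..95%:", ages[n // 20], ages[n - 1 - n // 20])
```

Printing a median and a 5% to 95% range for each probability is a text-only stand-in for the plotted curves; binning `ages` into a histogram would give the curves themselves.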

It should be clear on this basis that to assert, as the Wikipedia page linked above does, that "Anatomically modern humans evolved from archaic Homo sapiens in the Middle Paleolithic, about 200,000 years ago" when the oldest known anatomically modern human fossils date from 195,000 years ago is to reject some pretty basic mathematics. However many anatomically modern humans there were to begin with, the number was bound to be small, so the odds of the earliest discovered fossil being dated to within a few thousand years of the evolution of a species are very small.
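
How small those odds are in the toy model can be worked out in closed form. This is a minimal sketch under the model's own made-up numbers (two people a million years ago, three million people 12,000 years ago, a preservation ramp peaking at one in ten billion); the function names are illustrative. The expected number of fossils left in the first t years is the integral of population times preservation probability, and the chance of at least one is 1 - exp(-expected).

```python
import math

SPAN = 1_000_000 - 12_000                 # model duration in years
E_FOLD = SPAN / math.log(3_000_000 / 2)   # e-folding time of the population
PEAK_PROBABILITY = 1e-10                  # preservation chance at the end of the ramp

def expected_fossils_by(years_after_origin):
    """Expected number of preserved fossils from the first
    `years_after_origin` years of the model."""
    a, t = E_FOLD, years_after_origin
    # integral of s*exp(s/a) ds from 0 to t
    integral = a * math.exp(t / a) * (t - a) + a * a
    return 2 * PEAK_PROBABILITY / SPAN * integral

def chance_first_fossil_within(years_after_origin):
    """Probability that the oldest fossil dates from that early window."""
    return -math.expm1(-expected_fossils_by(years_after_origin))

for years in (5_000, 100_000, 500_000, 800_000):
    print(years, chance_first_fossil_within(years))
```

On these numbers the chance that the oldest fossil comes from the first 5,000 years is a few parts in a billion, and it only becomes better than even several hundred thousand years after the origin.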

That "about 200,000 years ago" should really read, "sometime before 200,000 years ago, quite likely earlier than 500,000 years ago" unless the number of modern human fossils is really very high starting 200,000 years ago. And it is not: we find the earliest anatomically modern human fossils only in a few sites dating from 75,000-130,000 years before present (Klasies River Mouth), 92,000 ybp (Qafzeh) and 90,000 ybp (Skhul). So we are genuinely dealing with a sparsely sampled population.

On this basis we can and should expect that as we search more diligently for anatomically modern human remains, we will find "surprising" leaps back into the distant past. It wouldn't shock me, based on this simplistic model, if our earliest anatomically modern ancestors were a million years old. It would, however, shock the people who work in this field, just as they were apparently shocked by the Omo remains, which took things back a mere hundred thousand years earlier than previously known.

This is all a bit sad. As I said, I read an article about this phenomenon in Nature over twenty years ago. We should have robust models of fossil survival probabilities and discovery probabilities now, which would allow us to use the actual data to make valid Bayesian inferences about how old our earliest ancestors are likely to have been, rather than claims that 195,000-year-old fossils support origin dates of 200,000 years ago, when in fact they almost certainly make far older dates much more plausible, and dates of less than 300,000 years before present really quite unlikely.
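
To show the shape such an inference might take, here is a minimal sketch of a grid posterior over the origin date given only the age of the oldest known fossil. Everything in it is an assumption made for illustration: it reuses the toy model's made-up numbers, supposes that any candidate origin still reaches three million people 12,000 years ago, keeps the preservation ramp tied to the age of the fossil, and uses a flat prior over origins between 200,000 and 1,000,000 years ago. A real analysis would need the robust survival and discovery models this paragraph asks for.

```python
import math

END_YEARS_AGO = 12_000
FINAL_POPULATION = 3_000_000
PEAK_PROBABILITY = 1e-10
RAMP_START = 1_000_000   # preservation chance is zero at this age
STEP = 1_000             # integration step in years (coarse, for speed)

def hazard(age, origin):
    """Expected discovered fossils per year at `age` years before present,
    if the population started as two people `origin` years ago."""
    e_fold = (origin - END_YEARS_AGO) / math.log(FINAL_POPULATION / 2)
    population = 2 * math.exp((origin - age) / e_fold)
    ramp = (RAMP_START - age) / (RAMP_START - END_YEARS_AGO)
    return population * PEAK_PROBABILITY * ramp

def likelihood(oldest_age, origin):
    """Density for the oldest fossil having age `oldest_age`: a fossil at
    that age and none between that age and the origin. Assumes
    origin - oldest_age is a multiple of STEP."""
    expected_older = sum(
        hazard(age + STEP / 2, origin) * STEP   # midpoint rule
        for age in range(oldest_age, origin, STEP)
    )
    return hazard(oldest_age, origin) * math.exp(-expected_older)

def posterior(oldest_age, origins):
    """Posterior weights over candidate origins under a flat prior."""
    weights = [likelihood(oldest_age, origin) for origin in origins]
    total = sum(weights)
    return [w / total for w in weights]

origins = list(range(200_000, 1_000_001, 5_000))
post = posterior(195_000, origins)
young = sum(p for o, p in zip(origins, post) if o <= 300_000)
print("posterior weight on origins up to 300,000 years ago:", young)
```

With these inputs the posterior puts very little weight on origins younger than 300,000 years, but that is a property of the toy assumptions and the flat prior, not a finding about real fossils.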

The belief that "the earliest fossils are a reliable guide to the date of the earliest humans" is simply not supported by basic mathematics under some fairly plausible assumptions.

To claim otherwise is to claim either that the early human population wasn't growing much over hundreds of thousands of years, but became almost immediately high and stayed high for the rest of pre-history, or it is to claim that fossil preservation and discovery is a common process, neither of which seems very plausible. In reality the human population fluctuated a lot, we think, but it almost certainly didn't get large and stay about the same level for hundreds of thousands of years.
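
The first of those two claims can be put in numbers. A minimal sketch, assuming a hypothetical population that is constant from the moment the species appears and a constant per-person yearly probability of preservation and discovery (neither is part of the model above, and the function names are illustrative): with a fixed yearly chance p of leaving a fossil, the mean wait for the first one is 1/p years.

```python
def mean_lag_years(population, yearly_probability):
    """Mean wait from origin to the first discovered fossil, for a
    constant population and a constant per-person yearly probability."""
    return 1.0 / (population * yearly_probability)

def population_needed(lag_years, yearly_probability):
    """Constant population that gives the requested mean lag."""
    return 1.0 / (lag_years * yearly_probability)

# A founding population of ten thousand at one in ten billion per year
print(mean_lag_years(10_000, 1e-10))
# Population needed from the start for a mean lag of 5,000 years
print(population_needed(5_000, 1e-10))
```

At one in ten billion, a mean lag of 5,000 years needs about two million people from the very beginning, while ten thousand people would wait about a million years on average.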

Although the model presented here is exceedingly simple-minded, it significantly increases the plausibility of the proposition that early anatomically modern humans existed for much longer than would be naively inferred.

As sampling becomes less sparse--as we move more closely toward the present day--this phenomenon will become smaller, but it still wouldn't shock me if our evidence for the peopling of North America turns out to underestimate the antiquity of that process, for example.

To refute the argument I'm making here requires that one create a plausible model of population growth and of fossil preservation and discovery that allows, in the typical case, the date of the earliest discovered fossil to be very close to the date of earliest appearance of the fossilized species. I can't think of such a thing, but perhaps someone else can, which would be fascinating indeed.

Copyright (C) TJ Radcliffe