
lection) of the population of interest. As Tipper (1979) states in his terse history (with
pertinent references as of the late 1970s), the method is termed “rarefaction” because
it involves reducing or rarefying a sample to a smaller size. Figure 4.9 illustrates the
basic procedure and outcome of rarefaction in two ways.
sampling, recovery, and sample size 161

Figure 4.9. Two models of the results of rarefaction. (a) histogram with high white bars
of 100 percent sample and black lower bars of rarefied (60 percent) sample; (b) rarefaction
curve (compare with Figure 4.8) showing 100 percent sample and corresponding 60 percent
rarefied sample.

Ecologists have been grappling with rarefaction for decades, examining its various forms
and how to make results more valid (e.g., Colwell and Coddington 1994; Colwell
2004; Colwell et al. 2004; Gotelli and Colwell 2001; Scheiner 2003; Schoereder et al.
2004; Smith et al. 1985; Wolda 1981). Zooarchaeologists have been aware of the basic
rarefaction procedure for more than 20 years (Styles 1981), although few individuals
have used it (see Lyman and Ames 2007 for references). Paleobiologists are also
aware of the method, and they have devoted considerable effort to developing and
quantitative paleozoology

perfecting it (e.g., Alroy 2000; Barnosky et al. 2005; Bush et al. 2004; Miller and Foote
1996). Early efforts to develop standard species–area curves for paleozoology (Koch
1987) have not been pursued, probably because general patterns are too general to
be of predictive value.
Given that the basic rarefaction procedure involves reducing a sample to a smaller
size, it is not surprising that as the statistical sophistication of scientists increased
and access to electronic computing power increased in the 1970s, programs were
written explicitly to perform rarefaction analysis. The best known of these among
zooarchaeologists is one designed by Kintigh (1984). This procedure sums all available
samples in order to model taxonomic abundances in the population, and then
draws random samples of various sizes from that modeled population. Richness is
determined multiple times for each sample size, and a mean richness and confidence
levels thereof are calculated for each sample size. Finally, the procedure generates not
only a best-fit line (mean) through the sample data sets but also confidence intervals
for the line in graphic form. This rarefaction program has been used by zooarchaeologists
to compare faunas of different sizes (e.g., McCartney and Glass 1990). The
resulting model approximates the effects of varying sample size on richness and is
designed to test the null hypothesis that all samples (of whatever size) were derived
from the same population, and thus to identify samples that are not members of the
population but are instead (statistical) outliers. An outlier is a sample that seems not
to have been drawn from the same population as all others because it falls far above or
below the richness expected given its size; an outlier is a sample that, probabilistically,
could not have been drawn from the modeled population.
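The logic of a simulation procedure of this general kind can be sketched in a few lines of Python. The sketch is not Kintigh's program. It assumes that the pooled collection is summarized as specimen counts per taxon, that specimens are drawn with replacement in proportion to those counts, and that the 2.5th and 97.5th percentiles of simulated richness serve as the limits. The function names, the number of trials, and the example counts are hypothetical.

```python
import random
from statistics import mean

def simulate_richness(pooled_counts, sample_size, trials=1000, seed=0):
    """Simulate richness (NTAXA) for random samples of a given size.

    Assumption: specimens are drawn with replacement, in proportion to the
    pooled counts, which stand in for the modeled population.
    Returns (mean richness, lower limit, upper limit).
    """
    rng = random.Random(seed)
    taxa = list(pooled_counts)
    weights = [pooled_counts[t] for t in taxa]
    richness = sorted(
        len(set(rng.choices(taxa, weights=weights, k=sample_size)))
        for _ in range(trials)
    )
    lower = richness[int(0.025 * (trials - 1))]  # approx. 2.5th percentile
    upper = richness[int(0.975 * (trials - 1))]  # approx. 97.5th percentile
    return mean(richness), lower, upper

def is_outlier(observed_ntaxa, pooled_counts, sample_size, **kwargs):
    """Flag a sample whose richness falls outside the simulated limits."""
    _, lower, upper = simulate_richness(pooled_counts, sample_size, **kwargs)
    return observed_ntaxa < lower or observed_ntaxa > upper

# Hypothetical pooled collection (NISP per taxon), for illustration only.
pooled = {"deer": 120, "elk": 45, "beaver": 20, "rabbit": 10, "otter": 5}
avg, low, high = simulate_richness(pooled, sample_size=30)
print(f"n=30: mean NTAXA {avg:.2f}, limits {low}-{high}")
```

Because the pooled counts define the modeled population, any sample that does not belong to the target population but is included in the pool will shift the simulated limits.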
Several seldom acknowledged assumptions and problems attend rarefaction. Early
on, Grayson (1984:152) noted that the rarefaction method in general as originally
developed by Sanders (1968) and later perfected by Tipper (1979) used quantitative
units that were statistically independent of one another; it used individual animals.
No similar quantitative unit is available for paleozoology. One might use MNI, but
these values are dependent on aggregation; one might use NISP, but these values are
likely interdependent to some unknown degree.
Rhode (1988) noted that if one uses Kintigh's (1984) procedure (and null hypothesis),
then one is assuming that a great deal is already known about the population
being investigated. In particular, such use assumes that the samples used to generate
the rarefaction curve are, when summed, representative of taxonomic richness as
manifest in the population of interest and, more importantly, that their sum is also
representative of the distribution of individuals across taxa (known as taxonomic
evenness). Using the sum of all samples to generate a rarefaction curve such as in
Figure 4.9b may result in the inclusion of samples that are not members of the (target)
population; if the samples derive from different populations, their sum will represent
a sample of organisms derived from those multiple populations. The statistical effect
of including all samples is to produce expected richness values for various sample
sizes that have been influenced by one or more samples that may not actually be part
of the same (target) population (the same holds for taxonomic evenness). Differences
between a nonmember sample and the model generated from all samples including
the nonmember would be muted to some unknown degree (see also Byrd 1997).
As Rhode (1988:711–712) astutely observes, if a particular sample used to model the
target population seems to differ significantly from that modeled population, how
can “the choice of that population as the comparative baseline be justified?”
As Kintigh (1984) originally noted, the key step in his rarefaction procedure involves
the definition of the population; in particular, which samples are to be included when
summing samples to create the population model? Producing an answer to this
question is where the assumption that we already know much about the population
we are studying comes into play. Analytical means of evaluating whether samples
of different sizes might have been derived from the same underlying population are
discussed later in this chapter.
Buzas and Hayek (2005) recently defined “within-community sampling” as drawing
more than one sample from a population with a particular frequency distribution (set of
taxonomic abundances) or constant value of a variable of interest. “Between-community
sampling,” by contrast, involves drawing the samples from populations with different
frequency distributions or the same distribution with different values of a variable of
interest. The distinction could be used as a basis for lumping two samples (they are
statistically indistinguishable with respect to the property of interest) or for not lumping
two samples (they are statistically distinguishable).
The preceding returns us to the question of what constitutes the target variable?
If it is NTAXA in a biological community, how is the community defined (see Chapter 2)?
If it is the taxa exploited by human occupants of an archaeological site, the
differences between the thanatocoenose, taphocoenose, and identified assemblage
must be kept in mind (Figure 2.1). This volume is not the place to explore these
issues. Rather, it is relevant to illustrate how analysts have studied and analytically
used the generic species–area relationship. To do that in the following, it is assumed
that the target variable is NTAXA within the identified assemblage. This simplification
allows us to focus on the species–area relationship and methods of investigating
it, although it is important to note that the relationship may well be found to exist
between any measure of sampling effort or sample size and any target variable (richness,
evenness, heterogeneity).
Species–Area Curves Are Not All the Same

In preceding pages, techniques to explore relationships manifest by species–area
curves have been mentioned (e.g., Figures 4.1–4.4 and 4.8–4.9, and associated discussion).
At this juncture it must be made clear that species–area curves do not all
express the same relationships or have the same implications with respect to the
relationship between sample size and NTAXA. This is so because they are constructed
differently, and they are constructed differently because they have different analytical
purposes and address different analytical questions. To demonstrate this, in the
following the data in Table 4.2 are used to construct three different kinds of what are
generically known as species–area curves. (A portion of this section is derived from
Lyman and Ames [2007].)
One kind of species–area curve is shown in Figure 4.2. In this curve samples increase
in size by being added together and thus are statistically interdependent. This kind
of species–area curve is a sampling to redundancy curve. The particular curve in
Figure 4.2 has leveled off, suggesting that all of the information in the last couple
samples (identities of the mammalian genera present) is redundant with information
provided by earlier (smaller) samples. If the curve has not leveled off, as is the
case in Figure 4.3, then new samples are still adding new information, so there is
no empirical basis to argue that we have sampled to redundancy. The sampling to
redundancy curve can be plotted manually by simply connecting points, or it can
be drawn statistically (Lepofsky and Lertzman 2005). A sampling to redundancy
curve has a very narrow analytical purpose: to determine if increases in sample
size (accomplished by summing samples) influence the target variable. Its utility is
that it provides an empirical indication of sample adequacy in the form of a static
value for the target variable across samples of varied sizes that comprise one total
collection. Constructing a species–area curve of the sampling to redundancy kind is
straightforward, but remember that the order of sample addition will influence the
ultimate sample size at which the curve levels off (see the discussion of Figures 4.2
and 4.3).
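The construction of a sampling to redundancy curve can be illustrated with a short Python sketch. It assumes each sample is recorded as specimen counts (NISP) per taxon; the function name and the three small samples are hypothetical and are not the Table 4.2 data. Running it with the samples in two different orders shows why the order of addition affects where the curve appears to level off.

```python
def cumulative_richness(samples):
    """Return (cumulative NISP, cumulative NTAXA) after each sample is added."""
    seen = set()   # taxa identified so far
    total = 0      # specimens accumulated so far
    curve = []
    for counts in samples:
        total += sum(counts.values())
        seen.update(taxon for taxon, n in counts.items() if n > 0)
        curve.append((total, len(seen)))
    return curve

# Hypothetical samples (NISP per taxon), for illustration only.
samples = [
    {"deer": 5, "elk": 2},
    {"deer": 3},
    {"beaver": 1},
]
print(cumulative_richness(samples))        # samples added in the order listed
print(cumulative_richness(samples[::-1]))  # same samples, reverse order
```

Both orders end at the same total NISP and NTAXA, but the intermediate points differ, so the apparent point of redundancy depends on the sequence chosen.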
Many species–area curves have been constructed in one of two ways distinctly
different from how a sampling to redundancy curve is built. They are distinct because
they have different analytical purposes. Some of those other curves were constructed
to compare statistically independent samples of different sizes (e.g., McCartney and
Glass 1990); some were constructed from statistically independent samples derived
from one population in order to predict representative statistically independent
sample sizes drawn from other populations (e.g., Zohar and Belmaker 2005); some
were used to determine or compare rates of increase in richness (slope of the curve)
(e.g., Grayson 1998); some were constructed by rarefying samples (reducing their
sizes probabilistically) (e.g., Styles 1981). How were the other curves constructed?
One way that species–area curves are constructed involves generating bivariate
plots of statistically independent samples, and then statistically fitting a curve to
the plot to determine if sample size may be influencing the target variable across the
different samples. The example in Figure 4.4 uses the six annual samples from the
Meier site described in Table 4.2. The best-fit regression line defined by the point
scatter is included. The correlation and the regression line are statistically significant
(p < 0.01) and suggest that NTAXA per statistically independent annual sample is
a function of sample size measured as NISP. If each point represented a sample from
a different stratum or different site, Figure 4.4 would suggest those samples were
strongly influenced by sample size, and thus NTAXA values for those samples should
not be compared.
Figure 4.4 does not allow us to surmise if our total sample from the Meier site is
representative of taxonomic richness (compare with Figure 4.2); the kind of curve in
Figure 4.4 has a different analytical purpose and utility. The protocol of building a
species–area curve exemplified in Figure 4.4 is sometimes referred to as the “regression
approach” (Leonard 1997). The name reflects the statistical analysis performed.
Regression analysis ascertains the strength of the relationship between samples of different
sizes (in Figure 4.4, NISP values) and a target variable (in Figure 4.4, NTAXA).
The strength of the relationship is reflected by the magnitude and statistical significance
of the correlation coefficient. If there is a significant correlation between sample
size and the target variable, then the magnitudes of the target variable could be a
result of sample sizes rather than a property of interest. With respect to Figure 4.4,
taxonomic richness varies according to sample size. Therefore, if these samples had
come from different strata or sites, we would not want to conclude something like the
sample from the 1991 site/stratum is taxonomically richer than the 1990 site/stratum,
so the people who deposited the remains in the 1991 site/stratum had greater diet
breadth than those who deposited the 1990 site/stratum materials. Remembering that
correlations do not necessarily imply a causal relationship between two variables, our
inference regarding diet breadth might be correct, but it might not. The regression
approach is merely a way to detect those instances when caution is advisable.
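The arithmetic behind the regression approach can be sketched as follows. The sketch computes an ordinary least-squares line and Pearson's correlation coefficient for paired sample-size and richness values. The function name and the data are hypothetical, no significance test is included, and any transformation of the variables (for example, taking logarithms) is left to the analyst.

```python
from math import sqrt

def fit_line(x, y):
    """Return (slope, intercept, Pearson's r) for paired values x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical assemblages: sample size (NISP) and richness (NTAXA).
nisp = [50, 120, 200, 340, 500, 760]
ntaxa = [4, 6, 7, 9, 10, 13]
slope, intercept, r = fit_line(nisp, ntaxa)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, r={r:.3f}")
```

A large positive r in such a set signals only that richness tracks sample size; it does not by itself show why.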

Figure 4.10. Rarefaction curve (solid line) and 95 percent confidence intervals (dotted
lines) of richness of mammalian genera based on six annual samples from the Meier site
(black squares). Data from Table 4.2; curves determined using Holland's (2005) Analytical
Rarefaction software.

If the regression approach prompts the conclusion that sample-size effects may be
present in a set of samples, the analyst has options. The samples can be pooled and a
rarefaction analysis performed, if one is willing to make the necessary assumptions.
Alternatively, slopes of lines describing the relationship between sample size and the
target variable may vary across different sets of samples (see Chapter 5). Comparisons
of slopes may reveal a property of the compared sets of samples not otherwise
detectable that is free of sample-size effects. A third possibility is to identify statistical
outliers, or samples that fall significant distances (usually ≥ 2 standard deviations)
from the regression line (Grayson 1984). Ascertaining why samples fall far from the
regression line may reveal a unique property of those unusual assemblages not otherwise
perceived and that is free of sample-size effects. Study of slopes and of outliers
avoids one weakness of the regression approach. Small samples may in fact be 100
percent samples or populations (Rhode 1988), and thus the sample-size effect is an
artifact of the size of the populations from which the samples derive.
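The screening for outliers described above might be sketched as follows. The sketch assumes that "standard deviations" refers to the standard deviation of the residuals about the least-squares line (with n - 2 degrees of freedom); that reading, the function name, and the data are assumptions made for illustration.

```python
from math import sqrt

def regression_outliers(x, y, threshold=2.0):
    """Return indices of points whose residual from the least-squares line
    is at least `threshold` residual standard deviations in absolute value."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    if sd == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) / sd >= threshold]

# Hypothetical data: the fifth assemblage is far richer than its size predicts.
size = list(range(1, 11))
richness = [1, 2, 3, 4, 15, 6, 7, 8, 9, 10]
print(regression_outliers(size, richness))
```

An assemblage flagged in this way is a candidate for closer study, not proof that it derives from a different population.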
The third way that species–area curves are constructed involves rarefaction (e.g.,
Sanders 1968; Tipper 1979). Rarefaction has been used by zooarchaeologists for some
time (e.g., Byrd 1997; McCartney and Glass 1990; Styles 1981). There are several
ways to construct rarefaction curves, but describing them is beyond my scope here.
It suffices to say that one can use statistically independent samples or statistically
interdependent (summed) samples (or sample without replacement, or sample with
replacement) to estimate what NTAXA would be were a sample of a particular size
drawn. A rarefaction curve constructed using the six annual samples from the Meier
site is shown in Figure 4.10. To generate this curve, Holland's (2005) Analytical
Rarefaction software was used. If the six samples from Meier were independent of one
another and from different strata or sites, the rarefaction curve would allow comparison
of NTAXA across assemblages of different size without fear of sample size differences
driving the results. As noted earlier, the rarefaction procedure assumes the included
samples all derive from the same population, and it also assumes that specimens used to
provide (NISP) values for drawing the curve are independent of one another. In Figure 4.10,
we know the samples all derive from the same population (the Meier site), and thus
we also know the specimens are to some degree interdependent.
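One common analytical form of rarefaction gives the expected number of taxa in a subsample of n specimens drawn without replacement from a collection of N specimens: the sum, over taxa, of 1 - C(N - N_i, n) / C(N, n), where N_i is the abundance of taxon i. The sketch below implements that expectation. Whether the software cited above uses exactly this form is an assumption, and the function name and the abundances are hypothetical.

```python
from math import comb

def expected_richness(counts, n):
    """Expected NTAXA in a random subsample of n specimens drawn without
    replacement from a collection with the given per-taxon counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of specimens")
    # For each taxon, 1 minus the probability that none of its specimens is drawn.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)

# Hypothetical collection: specimens per taxon.
counts = [120, 45, 20, 10, 5]
for n in (10, 50, 100, 200):
    print(n, round(expected_richness(counts, n), 2))
```

Evaluating the function at the size of the smallest assemblage lets larger assemblages be compared with it as if all were that size, subject to the assumptions of independence and common origin noted above.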
The three kinds of species–area curves shown in Figures 4.2, 4.4, and 4.10 are
not very similar in general appearance despite the similarities in the variables used
to build them. They are not very similar because each curve is meant to address a
distinct analytical question, so each has been built in a unique, distinctive way. The
sampling to redundancy approach (Figure 4.2) determines if one total collection
represents the value of the target variable. Regression analysis (Figure 4.4) allows
detection of possible sample size effects on the target variable among independent
samples of different size. Rarefaction (Figure 4.10) allows two or more samples of
different sizes to be compared as if they were the same size by reducing the larger
samples to a common small size.


There is an analytical means of evaluating whether samples of different sizes might
have been derived from the same underlying population. The analytical technique was
developed by biogeographers studying insular faunas such as those on archipelagos
or island chains (see Brown and Lomolino [1998] for details). They reasoned that the
faunas on land-bridge islands (those once connected to the mainland when sea levels
were low) likely originated on the mainland, and given the species–area relationship,
islands, which have varied but relatively small land areas, would have subsets of the
taxa found on the mainland, which has a large land area relative to islands. Further,
small islands would have smaller subsets of taxa (lower NTAXA values) than
would large islands. Islands can be oceanographic, or they can be habitat islands
surrounded not by water but by habitats unfavorable to the taxa located in the insular
habitat patch. The pattern of organismal distribution (presence/absence of taxa
across the islands) is referred to as the “nested subset pattern” (Patterson and Atmar
1986; see also Cutler 1994; Patterson 1987; Wright et al. 1998).

Figure 4.11. Examples of perfectly nested faunas and poorly nested faunas. (a) perfectly
nested set of faunas, each capital letter represents a unique species; (b) poorly nested set of
faunas, each capital letter represents a unique species; (c) Venn diagram of three perfectly
nested faunas in which the larger the circle, the greater the number of taxa; (d) Venn diagram
of three imperfectly nested faunas in which the larger the circle, the greater the number of
taxa. (a) after Cutler (1994); (c) and (d) after Patterson (1987).

The concept of a nested subset pattern is straightforward. Figure 4.11 shows both
a perfectly nested set of faunas, and a poorly nested set of faunas, in two graphic
forms. Table 4.5 shows a perfectly nested set of faunas and a poorly nested set of
faunas in tabular form. In the perfectly nested sets, taxa absent from one fauna are
also absent from all smaller faunas, and taxa present in a fauna are also present in all
larger faunas. In poorly or weakly nested faunas, some taxa may occur unexpectedly
in small faunas and large faunas but not in midsized faunas, and other taxa may
not occur in large faunas but occur in midsized or small ones. The unexpected
occurrences are “outliers” whereas the unexpected absences are “holes” in the nested
pattern (Cutler 1991).
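The definition of perfect nesting given above translates directly into a subset test, sketched here in Python. The sketch is a simple check and a crude count of departures; it is neither Cutler's index nor the "temperature" algorithm of Atmar and Patterson. Each fauna is assumed to be recorded as a set of taxon labels, and the function names and example faunas are hypothetical.

```python
def is_perfectly_nested(faunas):
    """True if every fauna is a subset of each richer fauna."""
    ordered = sorted(faunas.values(), key=len, reverse=True)
    # Checking consecutive pairs is enough because the subset relation is transitive.
    return all(smaller <= larger for larger, smaller in zip(ordered, ordered[1:]))

def unexpected_presences(faunas):
    """Count taxa present in a fauna but absent from the next richer fauna."""
    ordered = sorted(faunas.values(), key=len, reverse=True)
    return sum(len(smaller - larger) for larger, smaller in zip(ordered, ordered[1:]))

# Hypothetical faunas; each capital letter stands for one taxon.
nested = {"I": set("ABCD"), "II": set("ABC"), "III": set("AB")}
poor = {"I": set("ABCD"), "II": set("ABE"), "III": set("CF")}

print(is_perfectly_nested(nested), unexpected_presences(nested))
print(is_perfectly_nested(poor), unexpected_presences(poor))
```

A count of this kind distinguishes the extremes but gives no scale for intermediate cases, which is the gap the quantitative indices are meant to fill.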
The extremes of nestedness are easy to tell apart (Figure 4.11, Table 4.5). What
about intermediate cases? Can we determine if one set of faunas is more nested than
another? Biogeographers have developed quantitative ways to measure exactly how
nested a set of faunas is, and thus one can compare the nestedness of multiple sets
of faunas (e.g., Cutler 1991). Atmar and Patterson (1993) refer to their algorithm for
measuring the degree of nestedness as a means to measure an archipelago's “heat of
disorder” or “temperature.” The algorithm measures the degree of nestedness on a
scale of zero to 100 degrees; faunas that are perfectly nested have a temperature of
0° whereas faunas that display no nestedness whatsoever are 100°. (The 100 degrees
are an arbitrary interval-scale measure of amount of nestedness.) The value of the
nestedness concept is great because, theoretically, nestedness provides an indication
of whether two or more faunas derive from the same population. In a way, the
examination of nestedness is like rarefaction without rarefying; it compares samples
rather than summing them and rarefying the sum.
Atmar and Patterson's (1993) thermometer of nestedness provides a measure
of whether multiple faunal (island) samples derive from the same underlying
(mainland) population. If the faunas are strongly nested, then it is probable that
the samples derive from the same population, and one might perform a rarefaction
analysis using those faunas lumped together (assuming quantitative units are independent).
If faunas are weakly nested, then one could argue either that the samples
are so small that they do not accurately reflect the heterogeneity of the population or
that the samples derive from different populations. How strong must nestedness be, or
how weak? That is difficult to answer. But the point is that the nestedness thermometer
provides a measure that constitutes information bearing on the answer. And a
well-informed decision is likely to be better than one that is poorly informed.

Table 4.5. Two sets of faunal samples showing (a) a perfectly nested set of faunas and
(b) a poorly nested set of faunas. +, taxon present; −, taxon absent. (b) was generated
with a table of random numbers.

a. Nested
Assemblage  A B C D E F G H I J
I           + + + + + + + + + +
II          + + + + + + + + + −
III         + + + + + + + + − −
IV          + + + + + + + − − −
V           + + + + + + − − − −
VI          + + + + + − − − − −
VII         + + + + − − − − − −
VIII        + + + − − − − − − −
IX          + + − − − − − − − −
X           + − − − − − − − − −

b. Not nested (number of taxa present and absent per assemblage)
Assemblage  Present (+)  Absent (−)
I           3            7
II          6            4
III         4            5
IV          6            4
V           4            6
VI          4            6
VII         5            5
VIII        3            7
IX          4            6
X           7            3

Figure 4.12. Nestedness diagram of eighteen assemblages of mammalian genera from
eastern Washington State. Note that the NISP per assemblage and the rank order of the
assemblages are strongly correlated (Spearman's rho = 0.812, p < 0.0001).

The nestedness diagram of the 18 assemblages from eastern Washington State
generated by Atmar and Patterson's (1993) thermometer is shown in Figure 4.12.
That figure suggests there is some nestedness among the faunas. This set of faunas
has a nested “temperature” of 18.23°, a value that suggests there is indeed some
nestedness (0° is perfectly nested), but the faunas are hardly perfectly nested. In
conjunction with the facts that NISP and NTAXA values per assemblage for this
set of assemblages are strongly correlated (Figure 4.4), and that the order of nested
faunas produced by the nestedness thermometer is strongly correlated with NISP per
assemblage (rho = 0.812, p < 0.0001), it seems reasonable to conclude that all eighteen
assemblages derive from the same population of mammals. The assemblages merely
differ in size (= NISP), and that difference is the major variable that is creating
taxonomic differences between them.
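A rank-order correlation of the kind reported here can be computed as Pearson's coefficient on ranks, with tied values given their average rank. The sketch below shows that calculation. The function names and the data are hypothetical (they are not the eastern Washington assemblages), and no significance test is included.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: NISP per assemblage and position in the nested ordering
# (here a larger position number means a richer place in the ordering).
nisp = [40, 95, 150, 310, 520]
position = [1, 2, 4, 3, 5]
print(round(spearman_rho(nisp, position), 3))
```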
The value of the nestedness concept, however it is determined (and there are
several ways to do so; compare Cutler [1991] with Atmar and Patterson [1993]), is
great. If one grants the assumption that a small fauna should approximate a random
sample of a large fauna, then when comparing two or more faunas of different sizes,
if all faunas derive from the same population, they should be nested. The nestedness
concept takes advantage of not only the relationship between sample size and NTAXA
(say), but the taxonomic composition of the faunas. Rarefaction does as well, but it
effectively begins with the assumption that the faunal samples are all from the same
population. The nestedness concept and the techniques for measuring nestedness
allow that assumption to be tested and evaluated empirically. Given that the concept
has been discussed in the ecological literature for more than two decades, and given
the near ubiquitous concern with sample size issues among paleozoologists, it is a bit
surprising that nestedness has not been used by paleozoologists with some frequency.
Indeed, I am aware of only one instance of a paleozoologist using it (Jones 2004).

There is a particularly telling example in the recent literature that highlights the lack
of interdisciplinary contact. Leonard (1987) mentioned the sampling to redundancy
approach 20 years ago in the archaeology literature. That technique was mentioned,
if not used very often, in the zooarchaeological literature several times since then
(e.g., Lyman 1995a; Monks 2000; Reitz and Wing 1999). A recent analysis by paleon-
