pertinent references as of the late 1970s), the method is termed “rarefaction” because

it involves reducing or rarefying a sample to a smaller size. Figure 4.9 illustrates the

basic procedure and outcome of rarefaction in two ways.

sampling, recovery, and sample size 161

Figure 4.9. Two models of the results of rarefaction. (a) Histogram with tall white bars for the 100 percent sample and lower black bars for the rarefied (60 percent) sample; (b) rarefaction curve (compare with Figure 4.8) showing the 100 percent sample and the corresponding 60 percent rarefied sample.

Ecologists have been grappling with rarefaction for decades, both its various forms and how to make results more valid (e.g., Colwell and Coddington 1994; Colwell 2004; Colwell et al. 2004; Gotelli and Colwell 2001; Scheiner 2003; Schoereder et al. 2004; Smith et al. 1985; Wolda 1981). Zooarchaeologists have been aware of the basic rarefaction procedure for more than 20 years (Styles 1981), although few individuals have used it (see Lyman and Ames 2007 for references). Paleobiologists are also aware of the method, and they have devoted considerable effort to developing and

quantitative paleozoology

162

perfecting it (e.g., Alroy 2000; Barnosky et al. 2005; Bush et al. 2004; Miller and Foote 1996). Early efforts to develop standard species–area curves for paleozoology (Koch 1987) have not been pursued, probably because general patterns are too general to be of predictive value.

Given that the basic rarefaction procedure involves reducing a sample to a smaller size, it is not surprising that as the statistical sophistication of scientists and their access to electronic computing power increased in the 1970s, programs were written explicitly to perform rarefaction analysis. The best known of these among zooarchaeologists is one designed by Kintigh (1984). This procedure sums all available samples in order to model taxonomic abundances in the population, and then draws random samples of various sizes from that modeled population. Richness is determined multiple times for each sample size, and a mean richness and its confidence levels are calculated for each sample size. Finally, the procedure generates, in graphic form, not only a best-fit line (mean) through the sample data sets but also confidence intervals for the line. This rarefaction program has been used by zooarchaeologists to compare faunas of different sizes (e.g., McCartney and Glass 1990). The resulting model approximates the effects of varying sample size on richness and is designed to test the null hypothesis that all samples (of whatever size) were derived from the same population, and thus to identify samples that are not members of the population but are instead (statistical) outliers. An outlier is a sample that seems not to have been drawn from the same population as all others because it falls far above or below the richness expected given its size; an outlier is a sample that, probabilistically, could not have been drawn from the modeled population.
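The steps just described (sum the samples, draw random samples of several sizes, summarize richness at each size) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of Kintigh's (1984) program: the function names `simulate_richness` and `is_outlier` are hypothetical, the draws are assumed to be made with replacement in proportion to pooled abundances, the interval is a simple empirical 95 percent envelope, and the three assemblages are invented for the example.

```python
import random
import statistics


def simulate_richness(samples, sizes, trials=500, seed=0):
    """Pool samples, draw random samples of each size, and summarize richness.

    samples: list of dicts mapping taxon -> count (e.g., NISP per taxon).
    Returns {size: (mean richness, lower bound, upper bound)}.
    """
    rng = random.Random(seed)

    # Step 1: sum all samples to model taxonomic abundances in the population.
    pooled = {}
    for sample in samples:
        for taxon, count in sample.items():
            pooled[taxon] = pooled.get(taxon, 0) + count
    taxa = list(pooled)
    weights = [pooled[t] for t in taxa]

    results = {}
    for n in sizes:
        # Step 2: determine richness many times for this sample size
        # (assumption: draws are made with replacement).
        richness = sorted(
            len(set(rng.choices(taxa, weights=weights, k=n)))
            for _ in range(trials)
        )
        # Step 3: mean richness plus an empirical 95 percent envelope
        # (assumption: other interval definitions are possible).
        lower = richness[int(0.025 * (trials - 1))]
        upper = richness[int(0.975 * (trials - 1))]
        results[n] = (statistics.mean(richness), lower, upper)
    return results


def is_outlier(observed_richness, size, results):
    """True if a sample's richness falls outside the envelope for its size."""
    _, lower, upper = results[size]
    return observed_richness < lower or observed_richness > upper


# Hypothetical assemblages (taxon -> NISP); not data from the text.
samples = [
    {"A": 50, "B": 30, "C": 10, "D": 5},
    {"A": 20, "B": 10, "E": 3, "F": 1},
    {"A": 8, "C": 2, "G": 1},
]
results = simulate_richness(samples, sizes=[1, 11, 34, 95])
for size, (mean, lower, upper) in results.items():
    print(size, round(mean, 2), lower, upper)
```

Each observed assemblage can then be compared with the envelope for its own size; a sample whose richness falls outside that envelope is a candidate outlier in the sense used above.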

Several seldom-acknowledged assumptions and problems attend rarefaction. Early on, Grayson (1984:152) noted that the rarefaction method in general, as originally developed by Sanders (1968) and later perfected by Tipper (1979), used quantitative units that were statistically independent of one another; it used individual animals. No similar quantitative unit is available for paleozoology. One might use MNI, but these values are dependent on aggregation; one might use NISP, but these values are likely interdependent to some unknown degree.

Rhode (1988) noted that if one uses Kintigh’s (1984) procedure (and null hypothesis) then one is assuming that a great deal is already known about the population being investigated. In particular, such use assumes that the samples used to generate the rarefaction curve are, when summed, representative of taxonomic richness as manifest in the population of interest and, more importantly, that their sum is also representative of the distribution of individuals across taxa (known as taxonomic evenness). Using the sum of all samples to generate a rarefaction curve such as in Figure 4.9b may result in the inclusion of samples that are not members of the (target) population; if the samples derive from different populations, their sum will represent a sample of organisms derived from those multiple populations. The statistical effect of including all samples is to produce expected richness values for various sample sizes that have been influenced by one or more samples that may not actually be part of the same (target) population (the same holds for taxonomic evenness). Differences between a nonmember sample and the model generated from all samples including the nonmember would be muted to some unknown degree (see also Byrd 1997). As Rhode (1988:711–712) astutely observes, if a particular sample used to model the target population seems to differ significantly from that modeled population, how can “the choice of that population as the comparative baseline be justified?”

As Kintigh (1984) originally noted, the key step in his rarefaction procedure involves the definition of the population; in particular, which samples are to be included when summing samples to create the population model? Producing an answer to this question is where the assumption that we already know much about the population we are studying comes into play. Analytical means of evaluating whether samples of different sizes might have been derived from the same underlying population are discussed later in this chapter.

On the one hand, Buzas and Hayek (2005) recently defined “within-community sampling” as drawing more than one sample from a population with a particular frequency distribution (set of taxonomic abundances) or constant value of a variable of interest. “Between-community sampling,” on the other hand, involves drawing the multiple samples from populations with different frequency distributions or the same distribution with different values of a variable of interest. The distinction could be used as a basis for lumping two samples (they are statistically indistinguishable with respect to the property of interest) or for not lumping two samples (they are statistically distinct).

The preceding returns us to the question of what constitutes the target variable. If it is NTAXA in a biological community, how is the community defined (see Chapter 2)? If it is the taxa exploited by human occupants of an archaeological site, the differences between the thanatocoenose, taphocoenose, and identified assemblage must be kept in mind (Figure 2.1). This volume is not the place to explore these issues. Rather, it is relevant to illustrate how analysts have studied and analytically used the generic species–area relationship. To do that in the following, it is assumed that the target variable is NTAXA within the identified assemblage. This simplification allows us to focus on the species–area relationship and methods of investigating it, although it is important to note that the relationship may well be found to exist between any measure of sampling effort or sample size and any target variable (richness, evenness, heterogeneity).


Species–Area Curves Are Not All the Same

In preceding pages, techniques to explore relationships manifest by species–area curves have been mentioned (e.g., Figures 4.1–4.4 and 4.8–4.9, and associated discussion). At this juncture it must be made clear that species–area curves do not all express the same relationships or have the same implications with respect to the relationship between sample size and NTAXA. This is so because they are constructed differently, and they are constructed differently because they have different analytical purposes and address different analytical questions. To demonstrate this, in the following the data in Table 4.2 are used to construct three different kinds of what are generically known as species–area curves. (A portion of this section is derived from Lyman and Ames [2007].)

One kind of species–area curve is shown in Figure 4.2. In this curve samples increase in size by being added together and thus are statistically interdependent. This kind of species–area curve is a sampling to redundancy curve. The particular curve in Figure 4.2 has leveled off, suggesting that all of the information in the last couple of samples (identities of the mammalian genera present) is redundant with information provided by earlier (smaller) samples. If the curve has not leveled off, as is the case in Figure 4.3, then new samples are still adding new information, so there is no empirical basis to argue that we have sampled to redundancy. The sampling to redundancy curve can be plotted manually by simply connecting points, or it can be drawn statistically (Lepofsky and Lertzman 2005). A sampling to redundancy curve has a very narrow analytical purpose: to determine if increases in sample size (accomplished by summing samples) influence the target variable. Its utility is that it provides an empirical indication of sample adequacy in the form of a static value for the target variable across samples of varied sizes that comprise one total collection. Constructing a species–area curve of the sampling to redundancy kind is straightforward, but remember that the order of sample addition will influence the ultimate sample size at which the curve levels off (see the discussion of Figures 4.2 and 4.3).
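A sampling to redundancy curve of the kind just described can be built by accumulating samples and recording cumulative sample size and cumulative NTAXA after each addition. The sketch below is illustrative only: the function names `redundancy_curve` and `levels_off_at` are hypothetical, and the four samples are invented rather than taken from Table 4.2. It also shows that adding the same samples in a different order changes the cumulative sample size at which the curve levels off.

```python
def redundancy_curve(samples):
    """Return (cumulative sample size, cumulative NTAXA) after each sample."""
    seen = set()
    total = 0
    curve = []
    for sample in samples:
        total += sum(sample.values())
        seen.update(taxon for taxon, count in sample.items() if count > 0)
        curve.append((total, len(seen)))
    return curve


def levels_off_at(curve):
    """Cumulative sample size at which the final richness is first reached."""
    final_richness = curve[-1][1]
    return next(size for size, richness in curve if richness == final_richness)


# Hypothetical samples (taxon -> NISP); not data from the text.
s1 = {"A": 40, "B": 10, "C": 5}
s2 = {"A": 30, "D": 3, "B": 8}
s3 = {"A": 20, "C": 4, "E": 1}
s4 = {"A": 35, "B": 6}

forward = redundancy_curve([s1, s2, s3, s4])
backward = redundancy_curve([s4, s3, s2, s1])
print(forward, levels_off_at(forward))
print(backward, levels_off_at(backward))
```

Both orderings end at the same total sample size and richness, but they reach the plateau at different cumulative sizes, which is why the order of addition should be reported.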

Many species–area curves have been constructed in one of two ways distinctly different from how a sampling to redundancy curve is built. They are distinct because they have different analytical purposes. Some of those other curves were constructed to compare statistically independent samples of different sizes (e.g., McCartney and Glass 1990); some were constructed from statistically independent samples derived from one population in order to predict representative statistically independent sample sizes drawn from other populations (e.g., Zohar and Belmaker 2005); some were used to determine or compare rates of increase in richness (slope of the curve) (e.g., Grayson 1998); some were constructed by rarefying samples (reducing their sizes probabilistically) (e.g., Styles 1981). How were the other curves constructed?

One way that species–area curves are constructed involves generating bivariate plots of statistically independent samples, and then statistically fitting a curve to the plot to determine if sample size may be influencing the target variable across the different samples. The example in Figure 4.4 uses the six annual samples from the Meier site described in Table 4.2. The best-fit regression line defined by the point scatter is included. The correlation and the regression line are statistically significant (p < 0.01) and suggest that NTAXA per statistically independent annual sample is a function of sample size measured as NISP. If each point represented a sample from a different stratum or different site, Figure 4.4 would suggest those samples were strongly influenced by sample size, and thus NTAXA values for those samples should not be compared.

Figure 4.4 does not allow us to surmise if our total sample from the Meier site is representative of taxonomic richness (compare with Figure 4.2); the kind of curve in Figure 4.4 has a different analytical purpose and utility. The protocol of building a species–area curve exemplified in Figure 4.4 is sometimes referred to as the “regression approach” (Leonard 1997). The name reflects the statistical analysis performed. Regression analysis ascertains the strength of the relationship between samples of different sizes (in Figure 4.4, NISP values) and a target variable (in Figure 4.4, NTAXA). The strength of the relationship is reflected by the magnitude and statistical significance of the correlation coefficient. If there is a significant correlation between sample size and the target variable, then the magnitudes of the target variable could be a result of sample sizes rather than a property of interest. With respect to Figure 4.4, taxonomic richness varies according to sample size. Therefore, if these samples had come from different strata or sites, we would not want to conclude, for example, that the sample from the 1991 site/stratum is taxonomically richer than the sample from the 1990 site/stratum, and that the people who deposited the remains in the 1991 site/stratum therefore had greater diet breadth than those who deposited the 1990 site/stratum materials. Remembering that correlations do not necessarily imply a causal relationship between two variables, our inference regarding diet breadth might be correct, but it might not. The regression approach is merely a way to detect those instances when caution is advisable.
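The arithmetic behind the regression approach is simple enough to sketch. The following is a minimal illustration, assuming an ordinary least-squares line fitted to untransformed NISP and NTAXA values (whether to transform the values first is left open here); the function name `regress` is hypothetical and the six paired values are invented, not the Meier site data in Table 4.2.

```python
import math


def regress(x, y):
    """Least-squares line and Pearson correlation for paired values.

    Returns (slope, intercept, r, t), where t tests r against zero with
    n - 2 degrees of freedom. Assumes the points are not perfectly collinear.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return slope, intercept, r, t


# Hypothetical values for six samples; not data from the text.
nisp = [120, 250, 400, 610, 800, 1000]
ntaxa = [5, 7, 8, 10, 11, 13]

slope, intercept, r, t = regress(nisp, ntaxa)
# With six samples (4 degrees of freedom), the two-tailed critical value of t
# at the 0.01 level in standard tables is 4.604.
print(round(slope, 5), round(intercept, 2), round(r, 3), round(t, 1))
```

A large, significant coefficient in such an analysis signals only that sample size may be driving the target variable; it does not by itself say why.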

If the regression approach prompts the conclusion that sample-size effects may be present in a set of samples, the analyst has options. The samples can be pooled and a rarefaction analysis performed, if one is willing to make the necessary assumptions. Alternatively, slopes of lines describing the relationship between sample size and the target variable may vary across different sets of samples (see Chapter 5). Comparisons of slopes may reveal a property of the compared sets of samples not otherwise detectable that is free of sample-size effects. A third possibility is to identify statistical outliers, or samples that fall significant distances (usually ≥ 2 standard deviations) from the regression line (Grayson 1984). Ascertaining why samples fall far from the regression line may reveal a unique property of those unusual assemblages not otherwise perceived and that is free of sample-size effects. Study of slopes and of outliers avoids one weakness of the regression approach. Small samples may in fact be 100 percent samples or populations (Rhode 1988), and thus the sample-size effect is an artifact of the size of the populations from which the samples derive.

Figure 4.10. Rarefaction curve (solid line) and 95 percent confidence intervals (dotted lines) of richness of mammalian genera based on six annual samples from the Meier site (black squares). Data from Table 4.2; curves determined using Holland’s (2005) Analytical Rarefaction.
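Identifying samples that fall two or more standard deviations from a regression line can be sketched as follows. This is one possible reading rather than a documented procedure: it assumes the "standard deviation" is the standard deviation of the residuals about the fitted line (computed with n − 2 degrees of freedom), the function name `regression_outliers` is hypothetical, and the ten paired values are invented for illustration.

```python
import math


def regression_outliers(x, y, threshold=2.0):
    """Indices of points whose residual is >= threshold residual SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Assumption: spread is measured as the standard deviation of the
    # residuals about the line, with n - 2 degrees of freedom.
    sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e) >= threshold * sd]


# Hypothetical assemblages; the fifth is unusually rich for its size.
nisp = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
ntaxa = [5, 6, 7, 8, 15, 10, 11, 12, 13, 14]

print(regression_outliers(nisp, ntaxa))
```

The flagged assemblage is the one that warrants a closer look; the calculation says nothing about why it departs from the line.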

The third way that species–area curves are constructed involves rarefaction (e.g., Sanders 1968; Tipper 1979). Rarefaction has been used by zooarchaeologists for some time (e.g., Byrd 1997; McCartney and Glass 1990; Styles 1981). There are several ways to construct rarefaction curves, but describing them is beyond my scope here. It suffices to say that one can use statistically independent samples or statistically interdependent (summed) samples (or sample without replacement, or sample with replacement) to estimate what NTAXA would be for a sample of a particular size. A rarefaction curve constructed using the six annual samples from the Meier site is shown in Figure 4.10. To generate this curve, Holland’s (2005) Analytical Rarefaction software was used. If the six samples from Meier were independent of one another and from different strata or sites, the rarefaction curve would allow comparison of NTAXA across assemblages of different size without fear of sample size differences driving the results. As noted earlier, the rarefaction procedure assumes the included samples all derive from the same population, and it also assumes that specimens used to provide (NISP) values for drawing the curve are independent of one another. In Figure 4.10, we know the samples all derive from the same population (the Meier site), and thus we also know the specimens are to some degree interdependent.
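One common way to rarefy without simulation is to compute the expected number of taxa in a random subsample drawn without replacement, using the standard hypergeometric expectation. The sketch below illustrates that calculation only; it is not a description of how Holland's software works internally, the function name `expected_richness` is hypothetical, and both assemblages are invented.

```python
from math import comb


def expected_richness(counts, n):
    """Expected NTAXA in a random subsample of n specimens.

    counts: specimens per taxon in the full sample.
    Uses sampling without replacement: a taxon with c specimens is missed
    with probability C(N - c, n) / C(N, n), where N is the sample total.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the sample total")
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


# Hypothetical assemblages (specimens per taxon); not data from the text.
large = [60, 25, 10, 3, 1, 1]   # 100 specimens, 6 taxa
small = [20, 8, 2]              # 30 specimens, 3 taxa

# Rarefy the larger assemblage to the size of the smaller one so that the
# two richness values refer to the same sample size.
rarefied = expected_richness(large, sum(small))
print(round(rarefied, 2), len(small))
```

The comparison is then between the rarefied richness of the larger assemblage and the observed richness of the smaller one, subject to the assumptions about independence and a shared population noted above.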

The three kinds of species–area curves shown in Figures 4.2, 4.4, and 4.10 are not very similar in general appearance despite the similarities in the variables used to build them. They are not very similar because each curve is meant to address a distinct analytical question, so each has been built in a unique, distinctive way. The sampling to redundancy approach (Figure 4.2) determines if one total collection represents the value of the target variable. Regression analysis (Figure 4.4) allows detection of possible sample size effects on the target variable among independent samples of different size. Rarefaction (Figure 4.10) allows two or more samples of different sizes to be compared as if they were the same size by reducing the larger samples to a common small size.

NESTEDNESS

There is an analytical means of evaluating whether samples of different sizes might have been derived from the same underlying population. The analytical technique was developed by biogeographers studying insular faunas such as those on archipelagos or island chains (see Brown and Lomolino [1998] for details). They reasoned that the faunas on land-bridge islands (those once connected to the mainland when sea levels were low) likely originated on the mainland, and given the species–area relationship, islands, which have varied but relatively small land areas, would have subsets of the taxa found on the mainland, which has a large land area relative to islands. Further, small islands would have smaller subsets of taxa (lower NTAXA values) than would large islands. Islands can be oceanographic, or they can be habitat islands surrounded not by water but by habitats unfavorable to the taxa located in the insular habitat patch. The pattern of organismal distribution (presence/absence of taxa across the islands) is referred to as the “nested subset pattern” (Patterson and Atmar 1986; see also Cutler 1994; Patterson 1987; Wright et al. 1998).

The concept of a nested subset pattern is straightforward. Figure 4.11 shows both a perfectly nested set of faunas and a poorly nested set of faunas, in two graphic forms. Table 4.5 shows a perfectly nested set of faunas and a poorly nested set of faunas in tabular form. In the perfectly nested sets, taxa absent from one fauna are also absent from all smaller faunas, and taxa present in a fauna are also present in all larger faunas. In poorly or weakly nested faunas, some taxa may occur unexpectedly in small faunas and large faunas but not in midsized faunas, and other taxa may not occur in large faunas but occur in midsized or small ones. The unexpected occurrences are “outliers” whereas the unexpected absences are “holes” in the nested pattern (Cutler 1991).

Figure 4.11. Examples of perfectly nested faunas and poorly nested faunas. (a) Perfectly nested set of faunas; each capital letter represents a unique species. (b) Poorly nested set of faunas; each capital letter represents a unique species. (c) Venn diagram of three perfectly nested faunas in which the larger the circle, the greater the number of taxa. (d) Venn diagram of three imperfectly nested faunas in which the larger the circle, the greater the number of taxa. (a) after Cutler (1994); (c) and (d) after Patterson (1987).
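The definition of perfect nestedness given above translates directly into a subset test: when faunas are ordered from richest to poorest, every fauna must be a subset of the one before it. The sketch below is a minimal illustration. The function names are hypothetical, and the tally of unexpected absences is a crude count introduced only for this example; it is not Cutler's (1991) measure or Atmar and Patterson's (1993) temperature. The nested example follows the pattern of a perfectly nested set of ten faunas and ten taxa; the poorly nested example is invented.

```python
def is_perfectly_nested(faunas):
    """True if every fauna is a subset of each richer fauna."""
    ordered = sorted(faunas, key=len, reverse=True)
    # Subset relations are transitive, so checking neighbors is sufficient.
    return all(small <= large for large, small in zip(ordered, ordered[1:]))


def unexpected_absences(faunas):
    """Crude tally: taxa found in a poorer fauna but missing from a richer one."""
    ordered = sorted(faunas, key=len, reverse=True)
    total = 0
    for i, large in enumerate(ordered):
        for small in ordered[i + 1:]:
            total += len(small - large)
    return total


# Ten faunas, each one taxon poorer than the last (taxa A through J).
nested = [set("ABCDEFGHIJ"[:k]) for k in range(10, 0, -1)]

# Hypothetical poorly nested faunas; not data from the text.
poor = [set("ABC"), set("ACDEFG"), set("BDHJ")]

print(is_perfectly_nested(nested), unexpected_absences(nested))
print(is_perfectly_nested(poor), unexpected_absences(poor))
```

A tally of zero corresponds to perfect nestedness; larger tallies indicate more departures, although how such counts should be scaled for comparison across sets of faunas is exactly what the published measures were designed to address.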

The extremes of nestedness are easy to tell apart (Figure 4.11, Table 4.5). What about intermediate cases? Can we determine if one set of faunas is more nested than another? Biogeographers have developed quantitative ways to measure exactly how nested a set of faunas is, and thus one can compare the nestedness of multiple sets of faunas (e.g., Cutler 1991). Atmar and Patterson (1993) refer to their algorithm for measuring the degree of nestedness as a means to measure an archipelago’s “heat of disorder” or “temperature.” The algorithm measures the degree of nestedness on a scale of zero to 100 degrees; faunas that are perfectly nested have a temperature of 0° whereas faunas that display no nestedness whatsoever are 100°. (The 100 degrees are an arbitrary interval-scale measure of amount of nestedness.) The value of the nestedness concept is great because, theoretically, nestedness provides an indication of whether two or more faunas derive from the same population. In a way, the examination of nestedness is like rarefaction without rarefying; it compares samples rather than summing them and rarefying the sum.

Atmar and Patterson’s (1993) thermometer of nestedness provides a measure of whether multiple faunal (island) samples derive from the same underlying (mainland) population. If the faunas are strongly nested, then it is probable that the samples derive from the same population, and one might perform a rarefaction analysis using those faunas lumped together (assuming quantitative units are independent). If faunas are weakly nested, then one could argue either that the samples are so small that they do not accurately reflect the heterogeneity of the population or that the samples derive from different populations. How strong must nestedness be, or how weak? That is difficult to answer. But the point is that the nestedness thermometer provides a measure that constitutes information bearing on the answer. And a well-informed decision is likely to be better than one that is poorly informed.

Table 4.5. Two sets of faunal samples showing (a) a perfectly nested set of faunas and (b) a poorly nested set of faunas. +, taxon present; –, taxon absent. (b) was generated with a table of random numbers.

a. Nested

              Taxon
Assemblage    A  B  C  D  E  F  G  H  I  J
I             +  +  +  +  +  +  +  +  +  +
II            +  +  +  +  +  +  +  +  +  –
III           +  +  +  +  +  +  +  +  –  –
IV            +  +  +  +  +  +  +  –  –  –
V             +  +  +  +  +  +  –  –  –  –
VI            +  +  +  +  +  –  –  –  –  –
VII           +  +  +  +  –  –  –  –  –  –
VIII          +  +  +  –  –  –  –  –  –  –
IX            +  +  –  –  –  –  –  –  –  –
X             +  –  –  –  –  –  –  –  –  –

b. Not nested (number of + and – marks per assemblage)

Assemblage    Present (+)   Absent (–)
I             3             7
II            6             4
III           4             5
IV            6             4
V             4             6
VI            4             6
VII           5             5
VIII          3             7
IX            4             6
X             7             3

The nestedness diagram of the 18 assemblages from eastern Washington State generated by Atmar and Patterson’s (1993) thermometer is shown in Figure 4.12. That figure suggests there is some nestedness among the faunas. This set of faunas has a nested “temperature” of 18.23°, a value that suggests there is indeed some nestedness (0° is perfectly nested), but the faunas are hardly perfectly nested. In conjunction with the facts that NISP and NTAXA values per assemblage for this set of assemblages are strongly correlated (Figure 4.4), and that the order of nested faunas produced by the nestedness thermometer is strongly correlated with NISP per assemblage (rho = 0.812, p < 0.0001), it seems reasonable to conclude that all eighteen assemblages derive from the same population of mammals. The assemblages merely differ in size (= NISP), and that difference is the major variable that is creating taxonomic differences between them.

Figure 4.12. Nestedness diagram of eighteen assemblages of mammalian genera from eastern Washington State. Note that the NISP per assemblage and the rank order of the assemblages are strongly correlated (Spearman’s rho = 0.812, p < 0.0001).
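The rank-order correlation invoked here is Spearman's rho, which is simply a Pearson correlation computed on ranks. The sketch below shows the calculation under stated assumptions: tied values receive the average of the ranks they span, no p value is computed, the function names are hypothetical, and the five paired values are invented rather than drawn from the eighteen assemblages.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: position of each assemblage in a nestedness ordering
# (higher = nearer the top of the ordering) and its NISP.
position = [5, 4, 3, 2, 1]
nisp = [900, 450, 600, 120, 60]

print(round(spearman_rho(position, nisp), 3))
```

A coefficient near 1 or −1 (depending on how the ordering is numbered) indicates that the nestedness ordering largely tracks sample size.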

The value of the nestedness concept, however it is determined (and there are several ways to do so; compare Cutler [1991] with Atmar and Patterson [1993]), is great. If one grants the assumption that a small fauna should approximate a random sample of a large fauna, then when comparing two or more faunas of different sizes, if all faunas derive from the same population, they should be nested. The nestedness concept takes advantage of not only the relationship between sample size and NTAXA (say), but also the taxonomic composition of the faunas. Rarefaction does as well, but it effectively begins with the assumption that the faunal samples are all from the same population. The nestedness concept and the techniques for measuring nestedness allow that assumption to be tested and evaluated empirically. Given that the concept has been discussed in the ecological literature for more than two decades, and given the near ubiquitous concern with sample size issues among paleozoologists, it is a bit surprising that nestedness has not been used by paleozoologists with some frequency. Indeed, I am aware of only one instance of a paleozoologist using it (Jones 2004).


CONCLUSION

There is a particularly telling example in the recent literature that highlights the lack

of interdisciplinary contact. Leonard (1987) mentioned the sampling to redundancy

approach 20 years ago in the archaeology literature. That technique was mentioned,

if not used very often, in the zooarchaeological literature several times since then

(e.g., Lyman 1995a; Monks 2000; Reitz and Wing 1999). A recent analysis by paleon-