
Examine the performance data, run some inferential statistics, plot the equity curve, and the system is ready to be traded. Everything is clean and mathematically unimpeachable. Corrections for shrinkage or multiple tests, worries over excessive curve-fitting, and many of the other concerns that plague traditional optimization methodologies can be forgotten. Moreover, with today's modern computer technology, walk-forward and self-adaptive models are practical and not even difficult to implement.

The principle behind walk-forward optimization (also known as walk-forward testing) is to emulate the steps involved in actually trading a system that requires periodic optimization. It works like this: Optimize the system on data points 1 through M. Then simulate trading on data points M + 1 through M + K. Reoptimize the system on data points K + 1 through K + M. Then simulate trading on points (K + M) + 1 through (K + M) + K. Advance through the data series in this fashion until no more data points are left to analyze. As should be evident, the system is optimized on a sample of historical data and then traded. After some period of time, the system is reoptimized and trading is resumed. The sequence of events guarantees that the data on which trades take place is always in the future relative to the optimization process; all trades occur on what is, essentially, out-of-sample data. In walk-forward testing, M is the look-back or optimization window and K the reoptimization interval.
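The windowing scheme just described can be sketched in a few lines. This is a minimal illustration, not code from the text; the `optimize` and `simulate` callables are hypothetical stand-ins for a parameter search and a trading simulation supplied by the user:

```python
def walk_forward(data, M, K, optimize, simulate):
    """Walk-forward test: optimize on a look-back window of M points,
    then simulate trading on the next K (future, unseen) points; slide
    the window forward by K and repeat until the data is exhausted."""
    results = []
    start = 0
    # Continue while a full optimization window plus at least one
    # out-of-sample point remains.
    while start + M < len(data):
        in_sample = data[start:start + M]            # optimization window
        out_sample = data[start + M:start + M + K]   # simulated trading period
        params = optimize(in_sample)                 # fit only on past data
        results.extend(simulate(out_sample, params)) # trade on future data
        start += K                                   # advance by the interval
    return results
```

Because `params` is always derived from data strictly earlier than the bars passed to `simulate`, every simulated trade is effectively out-of-sample, exactly as the text requires.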

Self-adaptive systems work in a similar manner, except that the optimization or adaptive process is part of the system, rather than the test environment. As each bar or data point comes along, a self-adaptive system updates its internal state (its parameters or rules) and then makes decisions concerning actions required on the next bar or data point. When the next bar arrives, the decided-upon actions are carried out and the process repeats. Internal updates, which are how the system learns about or adapts to the market, need not occur on every single bar. They can be performed at fixed intervals or whenever deemed necessary by the model.

The trader planning to work with self-adapting systems will need a powerful, component-based development platform that employs a strong language, such as C++, Object Pascal, or Visual Basic, and that provides good access to third-party libraries and software components. Components are designed to be incorporated into user-written software, including the special-purpose software that constitutes an adaptive system. The more components that are available, the less work there is to do. At the very least, a trader venturing into self-adaptive systems should have at hand genetic optimizer and trading simulator components that can be easily embedded within a trading model. Adaptive systems will be demonstrated in later chapters, showing how this technique works in practice.

There is no doubt that walk-forward optimization and adaptive systems will become more popular over time as the markets become more efficient and difficult to trade, and as commercial software packages become available that place these techniques within reach of the average trader.

CHAPTER 3 Optimizers and Optimization

OPTIMIZER TOOLS AND INFORMATION

Aerodynamics, electronics, chemistry, biochemistry, planning, and business are just a few of the fields in which optimization plays a role. Because optimization is of interest to so many problem-solving areas, research goes on everywhere, information is abundant, and optimization tools proliferate. Where can this information be found? What tools and products are available?

Brute force optimizers are usually buried in software packages aimed primarily at tasks other than optimization; they are usually not available on their own. In the world of trading, products like TradeStation and SuperCharts from Omega Research (800-292-3453), Excalibur from Futures Truth (828-697-0273), and MetaStock from Equis International (800-882-3040) have built-in brute force optimizers. If you write your own software, brute force optimization is so trivial to implement using in-line programming code that the use of special libraries or components is superfluous. Products and code able to carry out brute force optimization may also serve well for user-guided optimization.
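As noted, a brute force optimizer is trivial to write in-line. A hedged sketch follows; the `evaluate` callable is a hypothetical stand-in for a backtest that returns a figure of merit such as net profit:

```python
import itertools

def brute_force_optimize(evaluate, param_ranges):
    """Exhaustively test every combination of parameter values and
    return the best combination along with its score."""
    best_params, best_score = None, float("-inf")
    # itertools.product enumerates the full Cartesian grid of values.
    for combo in itertools.product(*param_ranges.values()):
        params = dict(zip(param_ranges.keys(), combo))
        score = evaluate(params)   # e.g., run a simulation, return net profit
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with each added parameter, which is why brute force is practical only for small parameter spaces and why genetic or analytic methods become attractive beyond that.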

Although sometimes appearing as built-in tools in specialized programs, genetic optimizers are more often distributed in the form of class libraries or software components, add-ons to various application packages, or stand-alone research instruments. As an example of a class library written with the component paradigm in mind, consider OptEvolve, the C++ genetic optimizer from Scientific Consultant Services (516-696-3333): This general-purpose genetic optimizer implements several algorithms, including differential evolution, and is sold in the form of highly portable C++ code that can be used in UNIX/LINUX, DOS, and Windows environments. TS-Evolve, available from Ruggiero Associates (800-211-9785), gives users of TradeStation the ability to perform full-blown genetic optimizations. The Evolver, which can be purchased from Palisade Corporation (800-432-7475), is a general-purpose genetic optimizer for Microsoft's Excel spreadsheet; it comes with a dynamic link library (DLL) that can provide genetic optimization services to user programs written in any language able to call DLL functions. GENESIS, a stand-alone instrument aimed at the research community, was written by John Grefenstette of the Naval Research Laboratory; the product is available in the form of generic C source code. While genetic optimizers can occasionally be found in modeling tools for chemists and in other specialized products, they do not yet form a native part of popular software packages designed for traders.

Information about genetic optimization is readily available. Genetic algorithms are discussed in many books, magazines, and journals and on Internet newsgroups. A good overview of the field of genetic optimization can be found in the Handbook of Genetic Algorithms (Davis, 1991). Price and Storn (1997) described an algorithm for "differential evolution," which has been shown to be an exceptionally powerful technique for optimization problems involving real-valued parameters. Genetic algorithms are currently the focus of many academic journals and conference proceedings. Lively discussions on all aspects of genetic optimization take place in several Internet newsgroups, of which comp.ai.genetic is the most noteworthy.

A basic exposition of simulated annealing can be found in Numerical Recipes in C (Press et al., 1992), as can C functions implementing optimizers for both combinatorial and real-valued problems. Neural, Novel & Hybrid Algorithms for Time Series Prediction (Masters, 1995) also discusses annealing-based optimization and contains relevant C++ code on the included CD-ROM. Like genetic optimization, simulated annealing is the focus of many research studies, conference presentations, journal articles, and Internet newsgroup discussions.

Algorithms and code for conjugate gradient and variable metric optimization, two fairly sophisticated analytic methods, can be found in Numerical Recipes in C (Press et al., 1992) and Numerical Recipes (Press et al., 1986). Masters (1995) provides an assortment of analytic optimization procedures in C++ (on the CD-ROM that comes with his book), as well as a good discussion of the subject. Additional procedures for analytic optimization are available in the IMSL and NAG libraries (from Visual Numerics, Inc., and the Numerical Algorithms Group, respectively) and in the optimization toolbox for MATLAB (a general-purpose mathematical package from The MathWorks, 508-647-7000, that has gained popularity in the financial engineering community). Finally, Microsoft's Excel spreadsheet contains a built-in analytic optimizer, the Solver, which employs conjugate gradient or Newtonian methods.

As a source of general information about optimization applied to trading system development, consult Design, Testing and Optimization of Trading Systems by Robert Pardo (1992). Among other things, this book shows the reader how to optimize profitably, how to avoid undesirable curve-fitting, and how to carry out walk-forward tests.

WHICH OPTIMIZER IS FOR YOU?

At the very least, you should have available an optimizer that is designed to make both brute force and user-guided optimization easy to carry out. Such an optimizer is already at hand if you use either TradeStation or Excalibur for system development tasks. On the other hand, if you develop your systems in Excel, Visual Basic, C++, or Delphi, you will have to create your own brute force optimizer. As demonstrated earlier, a brute force optimizer is simple to implement. For many problems, brute force or user-guided optimization is the best approach.

If your system development efforts require something beyond brute force, a genetic optimizer is a great second choice. Armed with both brute force and genetic optimizers, you will be able to solve virtually any problem imaginable. In our own efforts, we hardly ever reach for any other kind of optimization tool! TradeStation users will probably want TS-Evolve from Ruggiero Associates. The Evolver product from Palisade Corporation is a good choice for Excel and Visual Basic users. If you develop systems in C++ or Delphi, select the C++ genetic optimizer from Scientific Consultant Services, Inc. A genetic optimizer is the Swiss Army knife of the optimizer world: Even problems more efficiently solved using such other techniques as analytic optimization will yield, albeit more slowly, to a good genetic optimizer.

Finally, if you want to explore analytic optimization or simulated annealing, we suggest Numerical Recipes in C (Press et al., 1992) and Masters (1995) as good sources of both information and code. Excel users can try out the built-in Solver tool.

CHAPTER 4

Statistics

Many trading system developers have little familiarity with inferential statistics. This is a rather perplexing state of affairs since statistics are essential to assessing the behavior of trading systems. How, for example, can one judge whether an apparent edge in the trades produced by a system is real or an artifact of sampling or chance? Think of it: the next sample may not merely be another test, but an actual trading exercise. If the system's "edge" was due to chance, trading capital could quickly be depleted. Consider optimization: Has the system been tweaked into great profitability, or has the developer only succeeded in the nasty art of curve-fitting? We have encountered many system developers who refuse to use any optimization strategy whatsoever because of their irrational fear of curve-fitting, not knowing that the right statistics can help detect such phenomena. In short, inferential statistics can help a trader evaluate the likelihood that a system is capturing a real inefficiency and will perform as profitably in the future as it has in the past. In this book, we have presented the results of statistical analyses whenever doing so seemed useful and appropriate.

Among the kinds of inferential statistics that are most useful to traders are t-tests, correlational statistics, and such nonparametric statistics as the runs test. T-tests are useful for determining the probability that the mean or sum of any series of independent values (derived from a sampling process) is greater or less than some other such mean, is a fixed number, or falls within a certain band. For example, t-tests can reveal the probability that the total profits from a series of trades, each with its individual profit/loss figure, could be greater than some threshold as a result of chance or sampling. These tests are also useful for evaluating samples of returns, e.g., the daily or monthly returns of a portfolio over a period of years. Finally, t-tests can help to set the boundaries of likely future performance (assuming no structural change in the market), making possible such statements as "the probability that the average profit will be between x and y in the future is greater than 95%."
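The one-sample t statistic described here can be computed directly from a series of trade profits. This is the generic textbook formula, not code from the text; `mu0` is the reference value being tested against (zero, for "no edge"):

```python
import math

def t_test_mean(values, mu0=0.0):
    """One-sample t statistic: how far the sample mean of a series of
    independent values (e.g., per-trade profits) lies from mu0, in units
    of the standard error. Returns (t, degrees of freedom)."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with n - 1 in the denominator (Bessel's correction).
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    t = (mean - mu0) / math.sqrt(var / n)   # standard error of the mean
    return t, n - 1
```

The resulting t value and degrees of freedom are looked up in a t distribution to obtain the probability that a mean this large could arise from chance or sampling alone.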

Correlational statistics help determine the degree of relationship between different variables. When applied inferentially, they may also be used to assess whether any relationships found are "statistically significant," and not merely due to chance. Such statistics aid in setting confidence intervals or boundaries on the "true" (population) correlation, given the observed correlation for a specific sample. Correlational statistics are essential when searching for predictive variables to include in a neural network or regression-based trading model.

Correlational statistics, as well as such nonparametric statistics as the runs test, are useful in assessing serial dependence or serial correlation. For instance, do profitable trades come in streaks or runs that are then followed by periods of unprofitable trading? The runs test can help determine whether this is actually occurring. If there is serial dependence in a system, it is useful to know it because the system can then be revised to make use of the serial dependence. For example, if a system has clearly defined streaks of winning and losing, a metasystem can be developed. The metasystem would take every trade after a winning trade until the first losing trade comes along, then stop trading until a winning trade is hit, at which point it would again begin taking trades. If there really are runs, this strategy, or something similar, could greatly improve a system's behavior.
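The runs test mentioned above can be illustrated with the standard Wald-Wolfowitz formulation; this is a generic sketch, not the text's own code. A strongly negative z score means fewer runs than chance would produce, i.e., streaky wins and losses; a strongly positive z means excessive alternation:

```python
import math

def runs_test(outcomes):
    """Wald-Wolfowitz runs test on a win/loss sequence (True = win).
    Returns (observed number of runs, z score under the null hypothesis
    that wins and losses occur in random order)."""
    n1 = sum(1 for o in outcomes if o)       # number of wins
    n2 = len(outcomes) - n1                  # number of losses
    # A new run starts at every point where the outcome changes.
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0       # expected runs under randomness
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n)) / (n * n * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, z
```

A sequence of five wins followed by five losses, for example, yields only two runs and a clearly negative z, which is the kind of serial dependence the metasystem idea above tries to exploit.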

WHY USE STATISTICS TO EVALUATE TRADING SYSTEMS?

It is very important to determine whether any observed profits are real (not artifacts of testing), and what the likelihood is that the system producing them will continue to yield profits in the future when it is used in actual trading. While out-of-sample testing can provide some indication of whether a system will hold up on new (future) data, statistical methods can provide additional information and estimates of probability. Statistics can help determine whether a system's performance is due to chance alone or if the trading model has some real validity. Statistical calculations can even be adjusted for a known degree of curve-fitting, thereby providing estimates of whether a chance pattern, present in the data sample being used to develop the system, has been curve-fitted or whether a pattern present in the population (and hence one that would probably be present in future samples drawn from the market being examined) has been modeled.

It should be noted that statistics generally make certain theoretical assumptions about the data samples and populations to which they may be appropriately applied. These assumptions are often violated when dealing with trading models. Some violations have little practical effect and may be ignored, while others may be worked around. By using additional statistics, the more serious violations can sometimes be detected, avoided, or compensated for; at the very least, they can be understood. In short, we are fully aware of these violations and will discuss our acts of hubris and their ramifications after a foundation for understanding the issues has been laid.

SAMPLING

Fundamental to statistics and, therefore, important to understand, is the act of sampling, which is the extraction of a number of data points or trades (a sample) from a larger, abstractly defined set of data points or trades (a population). The central idea behind statistical analysis is the use of samples to make inferences about the populations from which they are drawn. When dealing with trading models, the populations will most often be defined as all raw data (past, present, and future) for a given tradable (e.g., all 5-minute bars on all futures on the S&P 500), all trades (past, present, and future) taken by a specified system on a given tradable, or all yearly, monthly, or even daily returns. All quarterly earnings (past, present, and future) of IBM is another example of a population. A sample could be the specific historical data used in developing or testing a system, the simulated trades taken, or monthly returns generated by the system on that data.

When creating a trading system, the developer usually draws a sample of data from the population being modeled. For example, to develop an S&P 500 system based on the hypothesis "If yesterday's close is greater than the close three days ago, then the market will rise tomorrow," the developer draws a sample of end-of-day price data from the S&P 500 that extends back, e.g., 5 years. The hope is that the data sample drawn from the S&P 500 is representative of that market, i.e., will accurately reflect the actual, typical behavior of that market (the population from which the sample was drawn), so that the system being developed will perform as well in the future (on a previously unseen sample of population data) as it did in the past (on the sample used as development data). To help determine whether the system will hold up, developers sometimes test systems on one or more out-of-sample periods, i.e., on additional samples of data that have not been used to develop or optimize the trading model. In our example, the S&P 500 developer might use 5 years of data (e.g., 1991 through 1995) to develop and tweak the system, and reserve the data from 1996 as the out-of-sample period on which to test the system. Reserving one or more sets of out-of-sample data is strongly recommended.

One problem with drawing data samples from financial populations arises from the complex and variable nature of the markets: today's market may not be tomorrow's. Sometimes the variations are very noticeable and their causes are easily discerned, e.g., when the S&P 500 changed in 1983 as a result of the introduction of futures and options. In such instances, the change may be construed as having created two distinct populations: the S&P 500 prior to 1983 and the S&P 500 after 1983. A sample drawn from the earlier period would almost certainly not be representative of the population defined by the later period because it was drawn from a different population! This is, of course, an extreme case. More often, structural market variations are due to subtle influences that are sometimes impossible to identify, especially before the fact. In some cases, the market may still be fundamentally the same, but it may be going through different phases; each sample drawn might inadvertently be taken from a different phase and be representative of that phase alone, not of the market as a whole. How can it be determined that the population from which a sample is drawn for the purpose of system development is the same as the population on which the system will be traded? Short of hopping into a time machine and sampling the future, there is no reliable way to tell if tomorrow will be the day the market undergoes a system-killing metamorphosis! Multiple out-of-sample tests, conducted over a long period of time, may provide some assurance that a system will hold up, since they may show that the market has not changed substantially across several sampling periods. Given a representative sample, statistics can help make accurate inferences about the population from which the sample was drawn. Statistics cannot, however, reveal whether tomorrow's market will have changed in some fundamental manner.

OPTIMIZATION AND CURVE-FITTING

Another issue found in trading system development is optimization, i.e., improving the performance of a system by adjusting its parameters until the system performs its best on what the developer hopes is a representative sample. When the system fails to hold up in the future (or on out-of-sample data), the optimization process is pejoratively called curve-fitting. However, there is good curve-fitting and bad curve-fitting. Good curve-fitting is when a model can be fit to the entire relevant population (or, at least, to a sufficiently large sample thereof), suggesting that valid characteristics of the entire population have been captured in the model. Bad curve-fitting occurs when the system only fits chance characteristics, those that are not necessarily representative of the population from which the sample was drawn.

Developers are correct to fear bad curve-fitting, i.e., the situation in which parameter values are adapted to the particular sample on which the system was optimized, not to the population as a whole. If the sample was small or was not representative of the population from which it was drawn, it is likely that the system will look good on that one sample but fail miserably on another, or worse, lose money in real-time trading. However, as the sample gets larger, the chance of this happening becomes smaller: Bad curve-fitting declines and good curve-fitting increases. All the statistics discussed reflect this, even the ones that specifically concern optimization. It is true that the more combinations of things optimized, the greater the likelihood good performance may be obtained by chance alone. However, if the statistical result was sufficiently good, or the sample on which it was based large enough to reduce the probability that the outcome was due to chance, the result might still be very real and significant, even if many parameters were optimized.

Some have argued that size does not matter, i.e., that sample size and the number of trades studied have little or nothing to do with the risk of overoptimization, and that a large sample does not mitigate curve-fitting. This is patently untrue, both intuitively and mathematically. Anyone would have less confidence in a system that took only three or four trades over a 10-year period than in one that took over 1,000 reasonably profitable trades. Think of a linear regression model in which a straight line is being fit to a number of points. If there are only two points, it is easy to fit the line perfectly every time, regardless of where the points are located. If there are three points, it is harder. If there is a scatterplot of points, it is going to be harder still, unless those points reveal some real characteristic of the population that involves a linear relationship.

The linear regression example demonstrates that bad curve-fitting does become more difficult as the sample size gets larger. Consider two trading systems: One system had a profit per trade of $100, took 2 trades, and had a standard deviation of $100 per trade; the other system took 1,000 trades, with similar means and standard deviations. When evaluated statistically, the system with 1,000 trades will be a lot more "statistically significant" than the one with the 2 trades.
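The two-system comparison can be made concrete: with the same mean profit and standard deviation, the t statistic grows with the square root of the number of trades. A minimal sketch (not from the text):

```python
import math

def t_statistic(mean, sd, n):
    """t = mean / (sd / sqrt(n)): how many standard errors the average
    profit per trade lies above zero. Larger n shrinks the standard
    error, so identical per-trade performance becomes more significant."""
    return mean / (sd / math.sqrt(n))
```

With a $100 mean and $100 standard deviation, 2 trades give t of about 1.4 (unremarkable), while 1,000 trades give t of about 31.6, which is overwhelming evidence against pure chance.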

In multiple linear regression models, as the number of regression parameters (beta weights) being estimated is increased relative to the sample size, the amount of curve-fitting increases and statistical significance lessens for the same degree of model fit. In other words, the greater the degree of curve-fitting, the harder it is to get statistical significance. The exception is if the improvement in fit when adding regressors is sufficient to compensate for the loss in significance due to the additional parameters being estimated. In fact, an estimate of shrinkage (the degree to which the multiple correlation can be expected to shrink when computed using out-of-sample data) can even be calculated given sample size and number of regressors: Shrinkage increases with regressors and decreases with sample size. In short, there is mathematical evidence that curve-fitting to chance characteristics of a sample, with concomitant poor generalization, is more likely if the sample is small relative to the number of parameters being fit by the model. In fact, as n (the sample size) goes to infinity, the probability that the curve-fitting (achieved by optimizing a set of parameters) is nonrepresentative of the population goes to zero. The larger the number of parameters being optimized, the larger the sample required. In the language of statistics, the parameters being estimated use up the available "degrees of freedom."

All this leads to the conclusion that the larger the sample, the more likely its "curves" are representative of characteristics of the market as a whole. A small sample almost certainly will be nonrepresentative of the market: It is unlikely that its curves will reflect those of the entire market that persist over time. Any model built using a small sample will be capitalizing purely on the chance of sampling. Whether curve-fitting is "good" or "bad" depends on whether it was done to chance or to real market patterns, which, in turn, largely depends on the size and representativeness of the sample. Statistics are useful because they make it possible to take curve-fitting into account when evaluating a system.

When dealing with neural networks, concerns about overtraining or generalization are tantamount to concerns about bad curve-fitting. If the sample is large enough and representative, curve-fitting some real characteristic of the market is more likely, which may be good because the model should fit the market. On the other hand, if the sample is small, the model will almost certainly be fit to peculiar characteristics of the sample and not to the behavior of the market generally. In neural networks, the concern about whether the neural network will generalize is the same as the concern about whether other kinds of systems will hold up in the future. To a great extent, generalization depends on the size of the sample on which the neural network is trained. The larger the sample, or the smaller the number of connection weights (parameters) being estimated, the more likely the network will generalize. Again, this can be demonstrated mathematically by examining simple cases.

As was the case with regression, an estimate of shrinkage (the opposite of generalization) may be computed when developing neural networks. In a very real sense, a neural network is actually a multiple regression, albeit a nonlinear one, and the correlation of a neural net's output with the target may be construed as a multiple correlation coefficient. The multiple correlation obtained between a net's output and the target may be corrected for shrinkage to obtain some idea of how the net might perform on out-of-sample data. Such shrinkage-corrected multiple correlations should routinely be computed as a means of determining whether a network has merely curve-fit the data or has discovered something useful. The formula for correcting a multiple correlation for shrinkage is as follows:

RC = SQRT(1.0 - (1.0 - R*R) * (N - 1) / (N - P - 1))

A FORTRAN-style expression was used for reasons of typesetting. In this formula, SQRT represents the square root operator; N is the number of data points or, in the case of neural networks, facts; P is the number of regression coefficients or, in the case of neural networks, connection weights; R represents the uncorrected multiple correlation; and RC is the multiple correlation corrected for shrinkage. Although this formula is strictly applicable only to linear multiple regression (for which it was originally developed), it works well with neural networks and may be used to estimate how much performance was inflated on the in-sample data due to curve-fitting. The formula expresses a relationship between sample size, number of parameters, and deterioration of results. The statistical correction embodied in the shrinkage formula is used in the chapter on neural network entry models.
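A shrinkage correction of the kind described, with N data points and P estimated parameters, can be sketched in a few lines. This uses the standard Wherry-style adjustment with N - P - 1 in the denominator, an assumption consistent with the text's description of N, P, R, and RC:

```python
import math

def corrected_r(r, n, p):
    """Shrinkage-corrected multiple correlation RC, given the uncorrected
    correlation r, sample size n, and number of estimated parameters p
    (regression coefficients or connection weights)."""
    adj = 1.0 - (1.0 - r * r) * (n - 1.0) / (n - p - 1.0)
    # Severe overfitting can drive the adjusted R-squared below zero;
    # clamp so the square root is defined (RC of 0 means no expected
    # out-of-sample correlation).
    return math.sqrt(max(adj, 0.0))
```

As the text notes, shrinkage increases with the number of parameters and decreases with sample size: for a fixed r and p, a larger n leaves RC closer to the uncorrected r.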

SAMPLE SIZE AND REPRESENTATIVENESS

Although, for statistical reasons, the system developer should seek the largest sample possible, there is a trade-off between sample size and representativeness when dealing with the financial markets. Larger samples mean samples that go farther back in time, which is a problem because the market of years ago may be fundamentally different from the market of today (remember the S&P 500 in 1983?). This means that a larger sample may sometimes be a less representative sample, or one that confounds several distinct populations of data! Therefore, keep in mind that, although the goal is to have the largest sample possible, it is equally important to try to make sure the period from which the sample is drawn is still representative of the market being predicted.