that any tests carried out are, from start to finish, effectively out-of-sample.




Examine the performance data, run some inferential statistics, plot the equity
curve, and the system is ready to be traded. Everything is clean and mathemati-
cally unimpeachable. Corrections for shrinkage or multiple tests, worries over
excessive curve-fitting, and many of the other concerns that plague traditional
optimization methodologies can be forgotten. Moreover, with today's modern
computer technology, walk-forward and self-adaptive models are practical and not
even difficult to implement.
The principle behind walk-forward optimization (also known as walk-for-
ward testing) is to emulate the steps involved in actually trading a system that
requires periodic optimization. It works like this: Optimize the system on the data
points 1 through M. Then simulate trading on data points M + 1 through M + K.
Reoptimize the system on data points K + 1 through K + M. Then simulate trad-
ing on points (K + M) + 1 through (K + M) + K. Advance through the data series
in this fashion until no more data points are left to analyze. As should be evident,
the system is optimized on a sample of historical data and then traded. After some
period of time, the system is reoptimized and trading is resumed. The sequence of
events guarantees that the data on which trades take place is always in the future
relative to the optimization process; all trades occur on what is, essentially, out-of-
sample data. In walk-forward testing, M is the look-back or optimization window
and K the reoptimization interval.
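As a concrete illustration, the walk-forward loop can be sketched in a few lines of C++. The optimize and simulateTrading routines below are hypothetical stubs standing in for whatever parameter search and trading simulator is actually used; only the windowing logic matters here.

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-ins: optimize() would fit the system's parameters to one
    // window of bars; simulateTrading() would trade those parameters over a later
    // window and return the net profit.  Both are stubs for the sake of the sketch.
    struct Params { double lookback = 10.0; double threshold = 0.0; };

    Params optimize(const std::vector<double>&, std::size_t, std::size_t)
    { return Params{}; }              // stub: real code would search the parameter space

    double simulateTrading(const std::vector<double>&, std::size_t, std::size_t,
                           const Params&)
    { return 0.0; }                   // stub: real code would generate and score trades

    // Walk-forward test: optimize on M bars, trade the next K bars, slide forward by K.
    double walkForward(const std::vector<double>& bars, std::size_t M, std::size_t K)
    {
        double totalProfit = 0.0;
        for (std::size_t start = 0; start + M + K <= bars.size(); start += K) {
            Params p = optimize(bars, start, start + M);            // in-sample window
            totalProfit += simulateTrading(bars, start + M,
                                           start + M + K, p);       // out-of-sample window
        }
        return totalProfit;  // every simulated trade lies in the future of its optimization
    }
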
Self-adaptive systems work in a similar manner, except that the optimization
or adaptive process is part of the system, rather than the test environment. As each
bar or data point comes along, a self-adaptive system updates its internal state (its
parameters or rules) and then makes decisions concerning actions required on the
next bar or data point. When the next bar arrives, the decided-upon actions are car-
ried out and the process repeats. Internal updates, which are how the system learns
about or adapts to the market, need not occur on every single bar. They can be per-
formed at fixed intervals or whenever deemed necessary by the model.
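Where that updating logic sits can be seen in the brief C++ sketch below; the class name, the toy decision rule, and the 20-bar update interval are assumptions made purely for illustration.

    #include <cstddef>
    #include <vector>

    // Sketch of a self-adaptive system: the adaptation step is part of the system
    // itself rather than of the test environment.
    class AdaptiveSystem {
    public:
        enum Action { Hold, Buy, Sell };

        // Called once per bar: update internal state if an adaptation is due, then
        // decide the action to be carried out on the next bar.
        Action onBar(double close)
        {
            history.push_back(close);
            if (history.size() % adaptInterval == 0)
                adapt();                          // periodic internal reoptimization
            return decide();
        }

    private:
        void adapt()
        {
            // Stub: a real system might re-estimate an indicator length, retrain a
            // network, or rerun a genetic optimizer over the recent history here.
        }

        Action decide() const
        {
            if (history.size() < 2) return Hold;
            return history.back() > history[history.size() - 2] ? Buy : Sell;  // toy rule
        }

        std::vector<double> history;
        std::size_t adaptInterval = 20;           // update every 20 bars, not every bar
    };
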
The trader planning to work with self-adapting systems will need a power-
ful, component-based development platform that employs a strong language, such
as C++, Object Pascal, or Visual Basic, and that provides good access to third-
party libraries and software components. Components are designed to be incorpo-
rated into user-written software, including the special-purpose software that
constitutes an adaptive system. The more components that are available, the less
work there is to do. At the very least, a trader venturing into self-adaptive systems
should have at hand genetic optimizer and trading simulator components that can
be easily embedded within a trading model. Adaptive systems will be demonstrat-
ed in later chapters, showing how this technique works in practice.
There is no doubt that walk-forward optimization and adaptive systems will
become more popular over time as the markets become more efficient and diffi-
cult to trade, and as commercial software packages become available that place
these techniques within reach of the average trader.




OPTIMIZER TOOLS AND INFORMATION
Aerodynamics, electronics, chemistry, biochemistry, planning, and business are
just a few of the fields in which optimization plays a role. Because optimization is
of interest to so many problem-solving areas, research goes on everywhere, infor-
mation is abundant, and optimization tools proliferate. Where can this information
be found? What tools and products are available?
Brute force optimizers are usually buried in software packages aimed pri-
marily at tasks other than optimization; they are usually not available on their own.
In the world of trading, products like TradeStation and SuperCharts from Omega
Research (800-292-3453), Excalibur from Futures Truth (828-697-0273), and
MetaStock from Equis International (800-882-3040) have built-in brute force opti-
mizers. If you write your own software, brute force optimization is so trivial to
implement using in-line programming code that the use of special libraries or
components is superfluous. Products and code able to carry out brute force opti-
mization may also serve well for user-guided optimization.
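To show just how little code is involved, here is a complete brute force optimizer in C++. The figure-of-merit function below is a made-up surface standing in for a real trading simulation, and the parameter names and ranges are assumptions chosen only for the example.

    #include <cstdio>

    // Toy figure of merit: in practice this would run the trading simulation for one
    // combination of parameters and return net profit or some other merit measure.
    double evaluateSystem(int lookback, double threshold)
    {
        return -(lookback - 20) * (lookback - 20)
               - 50.0 * (threshold - 1.5) * (threshold - 1.5);
    }

    // Brute force optimization: step through every combination of the two parameters
    // and remember the combination that gives the best figure of merit.
    int main()
    {
        double bestMerit = -1e30, bestThreshold = 0.0;
        int bestLookback = 0;

        for (int lookback = 5; lookback <= 50; lookback += 5)
            for (double threshold = 0.5; threshold <= 3.0; threshold += 0.5) {
                double merit = evaluateSystem(lookback, threshold);
                if (merit > bestMerit) {
                    bestMerit = merit;
                    bestLookback = lookback;
                    bestThreshold = threshold;
                }
            }

        std::printf("best: lookback=%d threshold=%.1f merit=%.2f\n",
                    bestLookback, bestThreshold, bestMerit);
        return 0;
    }
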
Although sometimes appearing as built-in tools in specialized programs,
genetic optimizers are more often distributed in the form of class libraries or soft-
ware components, add-ons to various application packages, or stand-alone research
instruments. As an example of a class library written with the component paradigm
in mind, consider OptEvolve, the C++ genetic optimizer from Scientific Consultant
Services (516-696-3333): This general-purpose genetic optimizer implements sev-
eral algorithms, including differential evolution, and is sold in the form of highly
portable C++ code that can be used in UNIX/LINUX, DOS, and Windows envi-
ronments. TS-Evolve, available from Ruggiero Associates (800-211-9785), gives
users of TradeStation the ability to perform full-blown genetic optimizations. The
Evolver, which can be purchased from Palisade Corporation (800-432-7475), is a
general-purpose genetic optimizer for Microsoft's Excel spreadsheet; it comes with
a dynamic link library (DLL) that can provide genetic optimization services to user
programs written in any language able to call DLL functions. GENESIS, a stand-
alone instrument aimed at the research community, was written by John Grefenstette
of the Naval Research Laboratory; the product is available in the form of generic C
source code. While genetic optimizers can occasionally be found in modeling tools
for chemists and in other specialized products, they do not yet form a native part of
popular software packages designed for traders.
Information about genetic optimization is readily available. Genetic algo-
rithms are discussed in many books, magazines, and journals and on Internet
newsgroups. A good overview of the field of genetic optimization can be found in
the Handbook of Genetic Algorithms (Davis, 1991). Price and Storn (1997)
described an algorithm for “differential evolution,” which has been shown to be an
exceptionally powerful technique for optimization problems involving real-valued
parameters. Genetic algorithms are currently the focus of many academic journals
and conference proceedings. Lively discussions on all aspects of genetic opti-
mization take place in several Internet newsgroups, of which comp.ai.genetic is the
most noteworthy.
A basic exposition of simulated annealing can be found in Numerical
Recipes in C (Press et al., 1992), as can C functions implementing optimizers for
both combinatorial and real-valued problems. Neural, Novel & Hybrid Algorithms
for Time Series Prediction (Masters, 1995) also discusses annealing-based opti-
mization and contains relevant C++ code on the included CD-ROM. Like genet-
ic optimization, simulated annealing is the focus of many research studies,
conference presentations, journal articles, and Internet newsgroup discussions.
Algorithms and code for conjugate gradient and variable metric optimiza-
tion, two fairly sophisticated analytic methods, can be found in Numerical Recipes
in C (Press et al., 1992) and Numerical Recipes (Press et al., 1986). Masters (1995)
provides an assortment of analytic optimization procedures in C++ (on the CD-
ROM that comes with his book), as well as a good discussion of the subject.
Additional procedures for analytic optimization are available in the IMSL and the
NAG library (from Visual Numerics, Inc., and Numerical Algorithms Group,
respectively) and in the optimization toolbox for MATLAB (a general-purpose
mathematical package from The MathWorks, 508-647-7000, that has gained pop-
ularity in the financial engineering community). Finally, Microsoft's Excel spread-
sheet contains a built-in analytic optimizer (the Solver) that employs conjugate
gradient or Newtonian methods.
As a source of general information about optimization applied to trading sys-
tem development, consult Design, Testing and Optimization of Trading Systems by
Robert Pardo (1992). Among other things, this book shows the reader how to opti-
mize profitably, how to avoid undesirable curve-fitting, and how to carry out walk-
forward tests.


WHICH OPTIMIZER IS FOR YOU?
At the very least, you should have available an optimizer that is designed to make
both brute force and user-guided optimization easy to carry out. Such an optimiz-
er is already at hand if you use either TradeStation or Excalibur for system devel-
opment tasks. On the other hand, if you develop your systems in Excel, Visual
Basic, C++, or Delphi, you will have to create your own brute force optimizer.
As demonstrated earlier, a brute force optimizer is simple to implement. For many
problems, brute force or user-guided optimization is the best approach.
If your system development efforts require something beyond brute force, a
genetic optimizer is a great second choice. Armed with both brute force and genet-
ic optimizers, you will be able to solve virtually any problem imaginable. In our
own efforts, we hardly ever reach for any other kind of optimization tool!
TradeStation users will probably want TS-Evolve from Ruggiero Associates. The
Evolver product from Palisade Corporation is a good choice for Excel and Visual
Basic users. If you develop systems in C++ or Delphi, select the C++ Genetic
Optimizer from Scientific Consultant Services, Inc. A genetic optimizer is the
Swiss Army knife of the optimizer world: Even problems more efficiently solved
using such other techniques as analytic optimization will yield, albeit more slowly,
to a good genetic optimizer.
Finally, if you want to explore analytic optimization or simulated annealing,
we suggest Numerical Recipes in C (Press et al., 1992) and Masters (1995) as
good sources of both information and code. Excel users can try out the built-in
Solver tool.
CHAPTER 4


Statistics




Many trading system developers have little familiarity with inferential statistics.
This is a rather perplexing state of affairs since statistics are essential to assessing
the behavior of trading systems. How, for example, can one judge whether an
apparent edge in the trades produced by a system is real or an artifact of sampling
or chance? Think of it: the next sample may not merely be another test, but an
actual trading exercise. If the system's "edge" was due to chance, trading capital
could quickly be depleted. Consider optimization: Has the system been tweaked
into great profitability, or has the developer only succeeded in the nasty art of
curve-fitting? We have encountered many system developers who refuse to use
any optimization strategy whatsoever because of their irrational fear of curve-fit-
ting, not knowing that the right statistics can help detect such phenomena. In short,
inferential statistics can help a trader evaluate the likelihood that a system is cap-
turing a real inefficiency and will perform as profitably in the future as it has in
the past. In this book, we have presented the results of statistical analyses when-
ever doing so seemed useful and appropriate.
Among the kinds of inferential statistics that are most useful to traders are
t-tests, correlational statistics, and such nonparametric statistics as the runs test.
T-tests are useful for determining the probability that the mean or sum of any
series of independent values (derived from a sampling process) is greater or less
than some other such mean, is a fixed number, or falls within a certain band. For
example, t-tests can reveal the probability that the total profits from a series of
trades, each with its individual profit/loss figure, could be greater than some thresh-
old as a result of chance or sampling. These tests are also useful for evaluating sam-
pies of returns, e.g., the daily or monthly returns of a portfolio over a period of
years. Finally, t-tests can help to set the boundaries of likely future performance
(assuming no structural change in the market), making possible such statements as
“the probability that the average profit will be between x and y in the future is
greater than 95%."
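As a sketch of the computation involved, the short C++ program below works out the one-sample t-statistic for a series of trade profits; the profit figures are invented for illustration, and the resulting t would then be referred to Student's t distribution with n - 1 degrees of freedom to obtain a probability.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // One-sample t-statistic for a series of trade profits, testing whether the mean
    // profit per trade could plausibly be zero.
    int main()
    {
        std::vector<double> profits = { 120, -80, 45, 210, -60, 95, 30, -15, 150, 70 };

        double n = static_cast<double>(profits.size());
        double mean = 0.0;
        for (double p : profits) mean += p;
        mean /= n;

        double ss = 0.0;                               // sum of squared deviations
        for (double p : profits) ss += (p - mean) * (p - mean);
        double sd = std::sqrt(ss / (n - 1.0));         // sample standard deviation
        double t  = mean / (sd / std::sqrt(n));        // t with n - 1 degrees of freedom

        std::printf("mean = %.2f  sd = %.2f  t(%d) = %.3f\n",
                    mean, sd, static_cast<int>(n) - 1, t);
        return 0;
    }
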
Correlational statistics help determine the degree of relationship between
different variables. When applied inferentially, they may also be used to assess
whether any relationships found are “statistically significant,” and not merely due
to chance. Such statistics aid in setting confidence intervals or boundaries on the
“true” (population) correlation, given the observed correlation for a specific sam-
ple. ,Correlational statistics are essential when searching for predictive variables to
include in a neural network or regression-based trading model.
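The basic calculation is easy to sketch. The C++ example below computes the Pearson correlation between a hypothetical predictor and a target, along with the usual t-statistic (n - 2 degrees of freedom) for judging whether the observed correlation differs from zero; the data values are invented for illustration.

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Pearson correlation between a candidate predictor and a target, with a
    // t-statistic for testing whether the correlation differs from zero.
    int main()
    {
        std::vector<double> x = { 1.2, 0.8, -0.5, 2.1, 1.7, -1.0, 0.3, 1.1 };  // predictor
        std::vector<double> y = { 0.9, 0.4, -0.2, 1.8, 1.2, -0.7, 0.1, 0.8 };  // target

        std::size_t n = x.size();
        double mx = 0.0, my = 0.0;
        for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        double r  = sxy / std::sqrt(sxx * syy);
        double df = static_cast<double>(n) - 2.0;
        double t  = r * std::sqrt(df / (1.0 - r * r));   // significance test for r

        std::printf("r = %.3f  t(%d) = %.3f\n", r, static_cast<int>(df), t);
        return 0;
    }
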
Correlational statistics, as well as such nonparametric statistics as the runs test,
are useful in assessing serial dependence or serial correlation. For instance, do prof-
itable trades come in streaks or runs that are then followed by periods of unprofitable
trading? The runs test can help determine whether this is actually occurring. If there
is serial dependence in a system, it is useful to know it because the system can then
be revised to make use of the serial dependence. For example, if a system has clear-
ly defined streaks of winning and losing, a metasystem can be developed. The meta-
system would take every trade after a winning trade until the first losing trade comes
along, then stop trading until a winning trade is hit, at which point it would again
begin taking trades. If there really are runs, this strategy, or something similar, could
greatly improve a system's behavior.
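A simple form of the runs test can be coded directly, as in the C++ sketch below. The win/loss sequence is invented for illustration; the program counts the runs actually observed and compares the count with the number expected under independence, using the usual normal approximation.

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Runs test for serial dependence in a trade sequence: 1 = winning trade, 0 = losing.
    int main()
    {
        std::vector<int> winLoss = { 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0 };

        double n1 = 0.0, n2 = 0.0;                     // number of wins, number of losses
        for (int w : winLoss) { if (w) n1 += 1.0; else n2 += 1.0; }

        double runs = 1.0;                             // count maximal streaks of like outcomes
        for (std::size_t i = 1; i < winLoss.size(); ++i)
            if (winLoss[i] != winLoss[i - 1]) runs += 1.0;

        double n = n1 + n2;
        double expRuns = 2.0 * n1 * n2 / n + 1.0;                      // expected number of runs
        double varRuns = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n)
                         / (n * n * (n - 1.0));                        // variance of run count
        double z = (runs - expRuns) / std::sqrt(varRuns);              // approximately normal

        std::printf("runs = %.0f  expected = %.2f  z = %.3f\n", runs, expRuns, z);
        // A strongly negative z means fewer runs than chance predicts, i.e., winning
        // and losing trades cluster in streaks.
        return 0;
    }
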

WHY USE STATISTICS TO EVALUATE TRADING
SYSTEMS?
It is very important to determine whether any observed profits are real (not arti-
facts of testing), and what the likelihood is that the system producing them will
continue to yield profits in the future when it is used in actual trading. While out-
of-sample testing can provide some indication of whether a system will hold up on
new (future) data, statistical methods can provide additional information and esti-
mates of probability. Statistics can help determine whether a system™s perfor-
mance is due to chance alone or if the trading model has some real validity.
Statistical calculations can even be adjusted for a known degree of curve-fitting,
thereby providing estimates of whether a chance pattern, present in the data sam-
ple being used to develop the system, has been curve-fitted or whether a pattern
present in the population (and hence one that would probably be present in future
samples drawn from the market being examined) has been modeled.
It should be noted that statistics generally make certain theoretical assumptions
about the data samples and populations to which they may be appropriately applied.
These assumptions are often violated when dealing with trading models. Some vio-
lations have little practical effect and may be ignored, while others may be worked
around. By using additional statistics, the more serious violations can sometimes be




detected, avoided, or compensated for; at the very least, they can be understood. In
short, we are fully aware of these violations and will discuss our acts of hubris and
their ramifications after a foundation for understanding the issues has been laid.


SAMPLING
Fundamental to statistics and, therefore, important to understand, is the act of
sampling, which is the extraction of a number of data points or trades (a sample)
from a larger, abstractly defined set of data points or trades (a population). The
central idea behind statistical analysis is the use of samples to make inferences
about the populations from which they are drawn. When dealing with trading
models, the populations will most often be defined as all raw data (past, present,
and future) for a given tradable (e.g., all 5-minute bars on all futures on the S&P
500), all trades (past, present, and future) taken by a specified system on a given
tradable, or all yearly, monthly, or even daily returns. All quarterly earnings
(past, present, and future) of IBM is another example of a population. A sample
could be the specific historical data used in developing or testing a system, the
simulated trades taken, or monthly returns generated by the system on that data.
When creating a trading system, the developer usually draws a sample of
data from the population being modeled. For example, to develop an S&P 500 sys-
tem based on the hypothesis "If yesterday's close is greater than the close three
days ago, then the market will rise tomorrow,” the developer draws a sample of
end-of-day price data from the S&P 500 that extends back, e.g., 5 years. The hope
is that the data sample drawn from the S&P 500 is representative of that market,
i.e., will accurately reflect the actual, typical behavior of that market (the popula-
tion from which the sample was drawn), so that the system being developed will
perform as well in the future (on a previously unseen sample of population data)
as it did in the past (on the sample used as development data). To help determine
whether the system will hold up, developers sometimes test systems on one or
more out-of-sample periods, i.e., on additional samples of data that have not been
used to develop or optimize the trading model. In our example, the S&P 500 devel-
oper might use 5 years of data, e.g., 1991 through 1995, to develop and tweak
the system, and reserve the data from 1996 as the out-of-sample period on which
to test the system. Reserving one or more sets of out-of-sample data is strongly
recommended.
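A minimal sketch of this kind of partitioning appears below; the Bar structure, the made-up prices, and the 1995/1996 boundary are assumptions used only for illustration.

    #include <cstdio>
    #include <vector>

    // Partition dated bars into an in-sample period used for development and a
    // reserved out-of-sample period touched only for final testing.
    struct Bar { int year; double close; };

    int main()
    {
        std::vector<Bar> bars = {                          // tiny made-up history
            {1991, 350.0}, {1993, 450.0}, {1995, 615.0}, {1996, 740.0}
        };

        std::vector<Bar> inSample, outOfSample;
        for (const Bar& b : bars) {
            if (b.year <= 1995) inSample.push_back(b);     // 1991-1995: develop and tweak here
            else                outOfSample.push_back(b);  // 1996 on: reserve for the final test
        }

        std::printf("in-sample bars: %zu  out-of-sample bars: %zu\n",
                    inSample.size(), outOfSample.size());
        return 0;
    }
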
One problem with drawing data samples from financial populations arises
from the complex and variable nature of the markets: today's market may not be
tomorrow's. Sometimes the variations are very noticeable and their causes are
easily discerned, e.g., when the S&P 500 changed in 1983 as a result of the intro-
duction of futures and options. In such instances, the change may be construed as
having created two distinct populations: the S&P 500 prior to 1983 and the S&P
500 after 1983. A sample drawn from the earlier period would almost certainly
not be representative of the population defined by the later period because it was
drawn from a different population! This is, of course, an extreme case. More
often, structural market variations are due to subtle influences that are sometimes
impossible to identify, especially before the fact. In some cases, the market may
still be fundamentally the same, but it may be going through different phases;
each sample drawn might inadvertently be taken from a different phase and be
representative of that phase alone, not of the market as a whole. How can it be
determined that the population from which a sample is drawn for the purpose of
system development is the same as the population on which the system will be
traded? Short of hopping into a time machine and sampling the future, there is no
reliable way to tell if tomorrow will be the day the market undergoes a system-
killing metamorphosis! Multiple out-of-sample tests, conducted over a long peri-
od of time, may provide some assurance that a system will hold up, since they
may show that the market has not changed substantially across several sampling
periods. Given a representative sample, statistics can help make accurate infer-
ences about the population from which the sample was drawn. Statistics cannot,
however, reveal whether tomorrow's market will have changed in some funda-
mental manner.


OPTIMIZATION AND CURVE-FITTING
Another issue found in trading system development is optimization, i.e., improv-
ing the performance of a system by adjusting its parameters until the system per-
forms its best on what the developer hopes is a representative sample. When the
system fails to hold up in the future (or on out-of-sample data), the optimization
process is pejoratively called curve-fitting. However, there is good curve-fitting
and bad curve-fitting. Good curve-fitting is when a model can be fit to the entire
relevant population (or, at least, to a sufficiently large sample thereof), suggesting
that valid characteristics of the entire population have been captured in the model.
Bad curve-fitting occurs when the system only fits chance characteristics, those
that are not necessarily representative of the population from which the sample
was drawn.
Developers are correct to fear bad curve-fitting, i.e., the situation in which
parameter values are adapted to the particular sample on which the system was
optimized, not to the population as a whole. If the sample was small or was not
representative of the population from which it was drawn, it is likely that the sys-
tem will look good on that one sample but fail miserably on another, or worse, lose
money in real-time trading. However, as the sample gets larger, the chance of this
happening becomes smaller: Bad curve-fitting declines and good curve-fitting
increases. All the statistics discussed reflect this, even the ones that specifically
concern optimization. It is true that the more combinations of things optimized,
the greater the likelihood good performance may be obtained by chance alone.
However, if the statistical result was sufficiently good, or the sample on which it
was based large enough to reduce the probability that the outcome was due to
chance, the result might still be very real and significant, even if many parameters
were optimized.
Some have argued that size does not matter, i.e., that sample size and the
number of trades studied have little or nothing to do with the risk of overopti-
mization, and that a large sample does not mitigate curve-fitting. This is patently
untrue, both intuitively and mathematically. Anyone would have less confidence in
a system that took only three or four trades over a 10-year period than in one that
took over 1,000 reasonably profitable trades. Think of a linear regression model in
which a straight line is being fit to a number of points. If there are only two points,
it is easy to fit the line perfectly every time, regardless of where the points are
located. If there are three points, it is harder. If there is a scatterplot of points, it is
going to be harder still, unless those points reveal some real characteristic of the
population that involves a linear relationship.
The linear regression example demonstrates that bad curve-fitting does
become more difficult as the sample size gets larger. Consider two trading sys-
tems: One system had a profit per trade of $100, took 2 trades, and had a stan-
dard deviation of $100 per trade; the other system took 1,000 trades, with
similar means and standard deviations. When evaluated statistically, the system
with 1,000 trades will be a lot more “statistically significant” than the one with
the 2 trades.
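To make the comparison concrete, use the one-sample t-statistic t = mean / (sd / sqrt(n)): with a mean of $100, a standard deviation of $100, and only 2 trades, t = 100 / (100 / 1.41), or roughly 1.4, nowhere near significance; with 1,000 trades and the same mean and standard deviation, t = 100 / (100 / 31.6), or roughly 31.6, an overwhelmingly significant result.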
In multiple linear regression models, as the number of regression parameters
(beta weights) being estimated is increased relative to the sample size, the amount
of curve-fitting increases and statistical significance lessens for the same
degree of model fit. In other words, the greater the degree of curve-fitting, the
harder it is to get statistical significance. The exception is if the improvement in fit
when adding regressors is sufficient to compensate for the loss in significance due
to the additional parameters being estimated. In fact, an estimate of shrinkage (the
degree to which the multiple correlation can be expected to shrink when computed
using out-of-sample data) can even be calculated given sample size and number of
regressors: Shrinkage increases with regressors and decreases with sample size. In
short, there is mathematical evidence that curve-fitting to chance characteristics of
a sample, with concomitant poor generalization, is more likely if the sample is
small relative to the number of parameters being fit by the model. In fact, as n (the
sample size) goes to infinity, the probability that the curve-fitting (achieved by
optimizing a set of parameters) is nonrepresentative of the population goes to zero.
The larger the number of parameters being optimized, the larger the sample
required. In the language of statistics, the parameters being estimated use up the
available “degrees of freedom.”
All this leads to the conclusion that the larger the sample, the more likely its
“curves” are representative of characteristics of the market as a whole. A small
sample almost certainly will be nonrepresentative of the market: It is unlikely that
its curves will reflect those of the entire market that persist over time. Any model
built using a small sample will be capitalizing purely on the chance of sampling.
Whether curve-fitting is "good" or "bad" depends on whether it was done to chance or
to real market patterns, which, in turn, largely depends on the size and representa-
tiveness of the sample. Statistics are useful because they make it possible to take
curve-fitting into account when evaluating a system.
When dealing with neural networks, concerns about overtraining or general-
ization are tantamount to concerns about bad curve-fitting. If the sample is large
enough and representative, curve-fitting some real characteristic of the market is
more likely, which may be good because the model should fit the market. On the
other hand, if the sample is small, the model will almost certainly be fit to pecu-
liar characteristics of the sample and not to the behavior of the market generally.
In neural networks, the concern about whether the neural network will generalize
is the same as the concern about whether other kinds of systems will hold up in
the future. To a great extent, generalization depends on the size of the sample on
which the neural network is trained. The larger the sample, or the smaller the num-
ber of connection weights (parameters) being estimated, the more likely the net-
work will generalize. Again, this can be demonstrated mathematically by
examining simple cases.
As was the case with regression, an estimate of shrinkage (the opposite of
generalization) may be computed when developing neural networks. In a very real
sense, a neural network is actually a multiple regression, albeit a nonlinear one, and the
correlation of a neural net™s output with the target may be construed as a multiple
correlation coefficient. The multiple correlation obtained between a net™s output
and the target may be corrected for shrinkage to obtain some idea of how the net
might perform on out-of-sample data. Such shrinkage-corrected multiple correla-
tions should routinely be computed as a means of determining whether a network
has merely curve-fit the data or has discovered something useful. The formula for
correcting a multiple correlation for shrinkage is as follows:

    RC = SQRT(1.0 - (1.0 - R*R) * (N - 1) / (N - P))

A FORTRAN-style expression was used for reasons of typesetting. In this for-
mula, SQRT represents the square root operator; N is the number of data points
or, in the case of neural networks, facts; P is the number of regression coeffi-
cients or, in the case of neural networks, connection weights; R represents the
uncorrected multiple correlation; and RC is the multiple correlation corrected
for shrinkage. Although this formula is strictly applicable only to linear multi-
ple regression (for which it was originally developed), it works well with neur-
al networks and may be used to estimate how much performance was inflated on
the in-sample data due to curve-fitting. The formula expresses a relationship
between sample size, number of parameters, and deterioration of results. The
statistical correction embodied in the shrinkage formula is used in the chapter on
neural network entry models.
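As a cross-check on the arithmetic, the correction is easy to code directly. The C++ sketch below assumes the adjusted-correlation form of the formula given above; the sample inputs (500 facts, 40 connection weights, an in-sample correlation of 0.60) are invented for illustration.

    #include <cmath>
    #include <cstdio>

    // Shrinkage-corrected multiple correlation, following the formula shown earlier:
    // N = number of data points (facts), P = number of coefficients (connection weights),
    // R = uncorrected multiple correlation.
    double correctedR(double N, double P, double R)
    {
        double adj = 1.0 - (1.0 - R * R) * (N - 1.0) / (N - P);
        return adj > 0.0 ? std::sqrt(adj) : 0.0;   // clamp at zero if shrinkage wipes out the fit
    }

    int main()
    {
        // A net with 40 connection weights trained on 500 facts, in-sample R = 0.60.
        std::printf("corrected R = %.3f\n", correctedR(500.0, 40.0, 0.60));
        return 0;
    }
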

SAMPLE SIZE AND REPRESENTATIVENESS
Although, for statistical reasons, the system developer should seek the largest sam-
ple possible, there is a trade-off between sample size and representativeness when
dealing with the financial markets. Larger samples mean samples that go farther
back in time, which is a problem because the market of years ago may be funda-
mentally different from the market of today (remember the S&P 500 in 1983?).
This means that a larger sample may sometimes be a less representative sample,
or one that confounds several distinct populations of data! Therefore, keep in mind
that, although the goal is to have the largest sample possible, it is equally impor-
tant to try to make sure the period from which the sample is drawn is still repre-
sentative of the market being predicted.
