
lution. Genetic optimizers endeavor to harness some of that incredible prob-
lem-solving power through a crude simulation of the evolutionary process. In
terms of overall performance and the variety of problems that may be solved,
there is no general-purpose optimizer more powerful than a properly crafted
genetic one.
Genetic optimizers are stochastic optimizers in the sense that they take
advantage of random chance in their operation. It may not seem believable that
tossing dice can be a great way to solve problems, but, done correctly, it can be!
In addition to randomness, genetic optimizers employ selection and recombina-
tion. The clever integration of random chance, selection, and recombination is
responsible for the genetic optimizer's great power. A full discussion of genetic
algorithms, which are the basis for genetic optimizers, appears in Part II.
Genetic optimizers have many highly desirable characteristics. One such
characteristic is speed, especially when faced with combinatorial explosion. A
genetic optimizer can easily be many orders of magnitude faster than a brute force
optimizer when there are a multiplicity of rules, or parameters that have many pos-
sible values, to manipulate. This is because, like user-guided optimization, genetic
optimization can focus on important regions of solution space while mostly ignor-
ing blind alleys. In contrast to user-guided optimization, the benefit of a selective
search is achieved without the need for human intervention.
Genetic optimizers can swiftly solve complex problems, and they are also
more immune than other kinds of optimizers to the effects of local maxima in the

fitness surface or, equivalently, local minima in the cost surface. Analytic methods
are worst in that they almost always walk right to the top of the nearest hill or bot-
tom of the nearest valley, without regard to whether higher hills or lower valleys
exist elsewhere. In contrast, a good genetic optimizer often locates the globally
best solution: quite an impressive feat when accomplished for cantankerous fit-
ness surfaces, such as those associated with matrices of neural connection weights.
Another characteristic of genetic optimization is that it works well with fit-
ness surfaces marked by discontinuities, flat regions, and other troublesome irreg-
ularities. Genetic optimization shares this characteristic with brute force,
user-guided, annealing-based, and other nonanalytic optimization methods.
Solutions that maximize such items as net profit, return on investment, the Sharpe
Ratio, and others that define difficult, nonanalytic fitness landscapes can be found
using a genetic optimizer. Genetic optimizers shine with difficult fitness functions
that lie beyond the purview of analytic methods. This does not mean that they can-
not be used to solve problems having more tractable fitness surfaces: Perhaps
slower than the analytic methods, they have the virtue of being more resistant to
the traps set by local optima.
Overall, genetic optimizers are the optimizers of choice when there are many
parameters or rules to adapt, when a global solution is desired, or when arbitrarily
complex (and not necessarily differentiable or continuous) fitness or cost functions
must be handled. Although special-purpose optimizers can outperform genetic opti-
mizers on specific kinds of problems, for general-purpose optimization, genetic
optimizers are among the most powerful tools available.
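The interplay of random chance, selection, and recombination described above can be made concrete in a few lines of code. The following is a minimal, illustrative genetic optimizer in Python; it is not the C-Trader toolkit's algorithm, and the population size, mutation rate, and toy fitness surface are arbitrary choices for demonstration:

```python
import random

def genetic_optimize(fitness, lo, hi, pop_size=30, generations=40, seed=1):
    """Minimal genetic optimizer over a pair of integer parameters in
    [lo, hi], combining selection, recombination, and random mutation."""
    rng = random.Random(seed)
    pop = [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            # Recombination: each gene is drawn from a random parent.
            child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(2)]
            # Mutation: occasionally perturb one gene by a small random step.
            if rng.random() < 0.2:
                i = rng.randrange(2)
                child[i] = min(hi, max(lo, child[i] + rng.randint(-3, 3)))
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness surface with a single known peak at (30, 12).
toy_fitness = lambda p: -(p[0] - 30) ** 2 - (p[1] - 12) ** 2
best = genetic_optimize(toy_fitness, 2, 50)
```

Notice that only a few hundred fitness evaluations are spent, far fewer than the 2,401 a brute force sweep of a 49-by-49 grid would require, because selection concentrates the search in the promising regions of the solution space.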
What does a genetic optimizer look like in action? The dual moving-average
crossover system discussed earlier was translated to C++ so that the genetic opti-
mizer in the C-Trader toolkit could be used to solve for the two system parame-
ters, LenA and LenB. LenA, the period of the first moving average, was examined
over the range of 2 through 50, as was LenB, the period of the second moving aver-
age. Optimization was for net profit so that the results would be directly compa-
rable with those produced earlier by brute force optimization. Below is the C++
code for the crossover system:
CHAPTER 3 Optimizers and Optimization

// take no trades in lookback period
if(dt[cb] < 910302) { eqcls[cb] = 0.0; continue; }
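Only a fragment of the original C++ listing survives in reproduction above. For illustration, the core logic of a dual moving-average crossover system can be sketched as follows; this is a Python rendering with invented function names and test prices, not the C-Trader code:

```python
def sma(prices, length):
    """Simple moving average; None until enough bars have accumulated."""
    return [None if i + 1 < length
            else sum(prices[i + 1 - length : i + 1]) / length
            for i in range(len(prices))]

def crossover_signals(prices, len_a, len_b):
    """Dual moving-average crossover: +1 (go long) when the LenA average
    crosses above the LenB average, -1 (go short) when it crosses below,
    0 otherwise."""
    fast, slow = sma(prices, len_a), sma(prices, len_b)
    signals = []
    for i in range(1, len(prices)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            signals.append(0)          # not enough history yet
        elif fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals.append(1)          # upside crossover
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals.append(-1)         # downside crossover
        else:
            signals.append(0)
    return signals

# A price jump forces exactly one upside crossover of the 2-bar over
# the 4-bar average.
sigs = crossover_signals([1, 1, 1, 1, 10, 10, 10, 10], len_a=2, len_b=4)
```

An optimizer, genetic or otherwise, would repeatedly call a routine like this with candidate values of LenA and LenB, simulate the resulting trades, and score each pair by net profit.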

To solve for the best parameters, brute force optimization would require that 2,401
tests be performed; in TradeStation, that works out to about 56 minutes of com-
puting time, extrapolating from the earlier illustration in which a small subset of
the current solution space was examined. Only 1 minute of running time was
required by the genetic optimizer; in an attempt to put it at a significant disadvan-
tage, it was prematurely stopped after performing only 133 tests.
The output from the genetic optimizer appears in Table 3-2. In this table, P1 rep-
resents the period of the faster moving average, P2 the period of the slower moving
average, NET the total net profit, NETLNG the net profit for long positions, NETSHT
the net profit for short positions, PFAC the profit factor, ROA% the annualized return
on account, DRAW the maximum drawdown, TRDS the number of trades taken by the
system, WIN% the percentage of winning trades, AVGT the profit or loss resulting
from the average trade, and FIT the fitness of the solution (which, in this instance, is
merely the total net profit). As with the brute force data in Table 3-1, the genetic data
have been sorted by net profit (fitness) and only the 25 best solutions are presented.
TABLE 3-2  Top 25 Solutions Found Using Genetic Optimization in C-Trader Toolkit

Comparison of the brute force and genetic optimization results (Tables 3-1 and 3-2,
respectively) reveals that the genetic optimizer isolated a solution with a greater net
profit ($172,725) than did the brute force optimizer ($145,125). This is no surprise
since a larger solution space, not decimated by increments, was explored. The sur-
prise is that the better solution was found so quickly, despite the handicap of a pre-
maturely stopped evolutionary process. Results like these demonstrate the incredible
effectiveness of genetic optimization.

Optimization by Simulated Annealing
Optimizers based on annealing mimic the thermodynamic process by which liq-
uids freeze and metals anneal. Starting out at a high temperature, the atoms of a
liquid or molten metal bounce rapidly about in a random fashion. Slowly cooled,
they arrange themselves into an orderly configuration, a crystal, that represents
a minimal energy state for the system. Simulated in software, this thermodynamic
process readily solves large-scale optimization problems.
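A minimal version of the annealing loop makes the analogy concrete. The sketch below assumes a simple one-parameter cost function; the starting temperature, cooling rate, and move distribution are arbitrary illustrative choices, not a production schedule:

```python
import math
import random

def simulated_annealing(cost, x0, temp=10.0, cooling=0.95, steps=2000, seed=7):
    """Minimal simulated annealing: random moves are always accepted when
    they lower cost, and accepted with probability exp(-delta/temp) when
    they raise it; slowly lowering the temperature 'freezes' the search
    into a low-cost configuration."""
    rng = random.Random(seed)
    x, c = x0, cost(x0)
    best_x, best_c = x, c
    for _ in range(steps):
        candidate = x + rng.uniform(-1.0, 1.0)   # random "thermal" move
        delta = cost(candidate) - c
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, c = candidate, c + delta
            if c < best_c:
                best_x, best_c = x, c
        temp *= cooling                          # slow cooling schedule
    return best_x, best_c

# A one-dimensional cost "valley" with its minimum at x = 3.
best_x, best_c = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=-20.0)
```

The occasional acceptance of uphill (cost-increasing) moves at high temperature is what lets the method escape local minima that would trap a purely greedy search.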
As with genetic optimization, optimization by simulated annealing is a very
powerful stochastic technique, modeled upon a natural phenomenon, that can find
globally optimal solutions and handle ill-behaved fitness functions. Simulated
annealing has effectively solved significant combinatorial problems, including

the famous “traveling salesman problem,” and the problem of how best to arrange the
millions of circuit elements found on modern integrated circuit chips, such as
those that power computers. Methods based on simulated annealing should not be
construed as limited to combinatorial optimization; they can readily be adapted to
the optimization of real-valued parameters. Consequently, optimizers based on
simulated annealing are applicable to a wide variety of problems, including those
faced by traders.
Since genetic optimizers perform so well, we have experienced little need to
explore optimizers based on simulated annealing. In addition, there have been a
few reports suggesting that, in many cases, annealing algorithms do not perform
as well as genetic algorithms. Because of these reasons, we have not provided
examples of simulated annealing and have little more to say about the method.

Analytic Optimizers
Analysis (as in “real analysis” or “complex analysis”) is an extension of classical
college calculus. Analytic optimizers involve the well-developed machinery of
analysis, specifically differential calculus and the study of analytic functions, in
the solution of practical problems. In some instances, analytic methods can yield
a direct (noniterative) solution to an optimization problem. This happens to be the
case for multiple regression, where solutions can be obtained with a few matrix
calculations. In multiple regression, the goal is to find a set of regression weights
that minimize the sum of the squared prediction errors. In other cases, iterative
techniques must be used. The connection weights in a neural network, for exam-
ple, cannot be directly determined. They must be estimated using an iterative pro-
cedure, such as back-propagation.
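The distinction between direct and iterative solutions is easiest to see in the simplest case. For a two-parameter linear model, the regression weights that minimize the sum of squared prediction errors fall out of the normal equations in closed form, with no iteration at all. This is an illustrative sketch; full multiple regression generalizes the same idea to matrix calculations:

```python
def regress_direct(xs, ys):
    """Direct (noniterative) least-squares solution for the two-parameter
    linear model y = a + b*x, via the closed-form normal equations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    # Intercept follows immediately from the means.
    a = my - b * mx
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
a, b = regress_direct(xs, ys)
```

No search is involved: the optimal weights are computed in one pass, which is why analytic methods are so fast when a fitness surface admits them.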
Many iterative techniques used to solve multivariate optimization prob-
lems (those involving several variables or parameters) employ some variation
on the theme of steepest ascent. In its most basic form, optimization by steep-
est ascent works as follows: A point in the domain of the fitness function (that
is, a set of parameter values) is chosen by some means. The gradient vector at
that point is evaluated by computing the derivatives of the fitness function with
respect to each of the variables or parameters; this defines the direction in n-
dimensional parameter space for which a fixed amount of movement will pro-
duce the greatest increase in fitness. A small step is taken up the hill in fitness
space, along the direction of the gradient. The gradient is then recomputed at
this new point, and another, perhaps smaller, step is taken. The process is
repeated until convergence occurs.
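The basic loop just described can be sketched as follows. This illustrative version estimates the gradient by finite differences and uses a fixed step size; as the next paragraph notes, real implementations adjust both:

```python
def steepest_ascent(fitness, start, step=0.1, eps=1e-5, iters=500):
    """Basic steepest ascent: estimate the gradient numerically, take a
    small step uphill along it, and repeat until the point stops moving."""
    point = list(start)
    for _ in range(iters):
        # Gradient vector: partial derivative along each parameter.
        grad = []
        for i in range(len(point)):
            bumped = point[:]
            bumped[i] += eps
            grad.append((fitness(bumped) - fitness(point)) / eps)
        # Step uphill in the direction of the gradient.
        new_point = [p + step * g for p, g in zip(point, grad)]
        if all(abs(a - b) < 1e-8 for a, b in zip(point, new_point)):
            break   # convergence: the gradient has (nearly) vanished
        point = new_point
    return point

# A smooth, single-hill fitness surface with its peak at (2, -1).
peak = steepest_ascent(lambda p: -(p[0] - 2) ** 2 - (p[1] + 1) ** 2, [0.0, 0.0])
```

On a surface with a single smooth hill, this walks straight to the top; on a surface with many hills, it walks to the top of whichever hill it started on, which is precisely the local-maximum problem discussed below.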
A real-world implementation of steepest ascent optimization has to specify
how the step size will be determined at each iteration, and how the direction
defined by the gradient will be adjusted for better overall convergence of the opti-
mization process. Naive implementations assume that there is an analytic fitness
surface (one that can be approximated locally by a convergent power series) hav-
ing hills that must be climbed. More sophisticated implementations go further,
commonly assuming that the fitness function can be well approximated locally by
a quadratic form. If a fitness function satisfies this assumption, then much faster
convergence to a solution can be achieved. However, when the fitness surface has
many irregularly shaped hills and valleys, quadratic forms often fail to provide a
good approximation. In such cases, the more sophisticated methods break down
entirely or their performance seriously degrades.
Worse than degraded performance is the problem of local solutions. Almost
all analytic methods, whether elementary or sophisticated, are easily trapped by
local maxima: they generally fail to locate the globally best solution when there
are many hills and valleys in the fitness surface. Least-squares, neural network
predictive modeling gives rise to fitness surfaces that, although clearly analytic,
are full of bumps, troughs, and other irregularities that lead standard analytic tech-
niques (including back-propagation, a variant on steepest ascent) astray. Local
maxima and other hazards that accompany such fitness surfaces can, however, be
sidestepped by cleverly marrying a genetic algorithm with an analytic one. For fit-
ness surfaces amenable to analytic optimization, such a combined algorithm can
provide the best of both worlds: fast, accurate solutions that are also likely to be
globally optimal.
Some fitness surfaces are simply not amenable to analytic optimization.
More specifically, analytic methods cannot be used when the fitness surface has
flat areas or discontinuities in the region of parameter space where a solution is to
be sought. Flat areas imply null gradients, hence the absence of a preferred direc-
tion in which to take a step. At points of discontinuity, the gradient is not defined;
again, a stepping direction cannot be determined. Even if a method does not
explicitly use gradient information, such information is employed implicitly by
the optimization algorithm. Unfortunately, many fitness functions of interest to
traders (including, for instance, all functions that involve net profit, drawdown,
percentage of winning trades, risk-to-reward ratios, and other like items) have
plateaus and discontinuities. They are, therefore, not tractable using analytic methods.
Although the discussion has centered on the maximization of fitness, every-
thing said applies as well to the minimization of cost. Any maximization technique
can be used for minimization, and vice versa: Multiply a fitness function by -1 to
obtain an equivalent cost function; multiply a cost function by -1 and a fitness func-
tion is the result. If a minimization algorithm takes your fancy, but a maximization
is required, use this trick to avoid having to recode the optimization algorithm.
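The trick is trivial to verify in code. In the illustrative sketch below, maximizing a fitness function over a grid of candidates and minimizing its negation pick out the same solution:

```python
def argmax(f, candidates):
    """Candidate with the highest fitness."""
    return max(candidates, key=f)

def argmin(f, candidates):
    """Candidate with the lowest cost."""
    return min(candidates, key=f)

fitness = lambda x: -(x - 4) ** 2    # fitness peak at x = 4
cost = lambda x: -fitness(x)         # multiply by -1: cost valley at x = 4
grid = range(10)

best_max = argmax(fitness, grid)
best_min = argmin(cost, grid)
```

Because the negation flips every hill into a valley (and vice versa) without moving it, the optimizer's answer is unchanged; only the sign convention differs.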

Linear Programming
The techniques of linear programming are designed for optimization problems
involving linear cost or fitness functions, and linear constraints on the parameters
or input variables. Linear programming is typically used to solve resource allo-
cation problems. In the world of trading, one use of linear programming might be
to allocate capital among a set of investments to maximize net profit. If risk-
adjusted profit is to be optimized, linear programming methods cannot be used:
Risk-adjusted profit is not a linear function of the amount of capital allocated to
each of the investments; in such instances, other techniques (e.g., genetic algo-
rithms) must be employed. Linear programming methods are rarely useful in the
development of trading systems. They are mentioned here only to inform readers
of their existence.

How to Fail with Optimization

Most traders do not seek failure, at least not consciously. However, knowledge of
the way failure is achieved can be of great benefit when seeking to avoid it. Failure
with an optimizer is easy to accomplish by following a few key rules. First, be sure
to use a small data sample when running simulations: The smaller the sample, the
greater the likelihood it will poorly represent the data on which the trading model
will actually be traded. Next, make sure the trading system has a large number of
parameters and rules to optimize: For a given data sample, the greater the number
of variables that must be estimated, the easier it will be to obtain spurious results.
It would also be beneficial to employ only a single sample on which to run tests;
annoying out-of-sample data sets have no place in the rose-colored world of the
ardent loser. Finally, do avoid the headache of inferential statistics. Follow these
rules and failure is guaranteed.
What shape will failure take? Most likely, system performance will look
great in tests, but terrible in real-time trading. Neural network developers call this
phenomenon “poor generalization”; traders are acquainted with it through the
experience of margin calls and a serious loss of trading capital. One consequence
of such a failure-laden outcome is the formation of a popular misconception: that
all optimization is dangerous and to be feared.
In actual fact, optimizers are not dangerous and not all optimization should be
feared. Only bad optimization is dangerous and frightening. Optimization of large
parameter sets on small samples, without out-of-sample tests or inferential statis-
tics, is simply a bad practice that invites unhappy results for a variety of reasons.

Small Samples
Consider the impact of small samples on the optimization process. Small samples
of market data are unlikely to be representative of the universe from which they
are drawn: consequently, they will probably differ significantly from other sam-
ples obtained from the same universe. Applied to a small development sample, an
optimizer will faithfully discover the best possible solution. The best solution for

the development sample, however, may turn out to be a dreadful solution for the
later sample on which genuine trades will be taken. Failure ensues, not because
optimization has found a bad solution, but because it has found a good solution to
the wrong problem!
Optimization on inadequate samples is also good at spawning solutions that
represent only mathematical artifact. As the number of data points declines to the
number of free (adjustable) parameters, most models (trading, regression, or other-
wise) will attain a perfect fit to even random data. The principle involved is the
same one responsible for the fact that a line, which is a two-parameter model, can
always be drawn through any two distinct points, but cannot always be made to
intersect three arbitrary points. In statistics, this is known as the degrees-of-freedom
issue; there are as many degrees of freedom as there are data points beyond that
which can be fitted perfectly for purely mathematical reasons. Even when there are
enough data points to avoid a totally artifact-determined solution, some part of the
model fitness obtained through optimization will be of an artifact-determined
nature, a by-product of the process.
For multiple regression models, a formula is available that can be used to
estimate how much “shrinkage” would occur in the multiple correlation coeffi-
cient (a measure of model fitness) if the artifact-determined component were
removed. The shrinkage correction formula, which shows the relationship
between the number of parameters (regression coefficients) being optimized, sam-
ple size, and decreased levels of apparent fitness (correlation) in tests on new sam-
ples, is shown below in FORTRAN-style notation:

RC = SQRT(1.0 - (1.0 - R**2) * (N - 1) / (N - P - 1))
In this equation, N represents the number of data points, P the number of model
parameters, R the multiple correlation coefficient determined for the sample by the
regression (optimization) procedure, and RC the shrinkage-corrected multiple cor-
relation coefficient. The inverse formula, one that estimates the optimization-
inflated correlation (R) given the true correlation (RC) existing in the population
from which the data were sampled, appears below:

R = SQRT(1.0 - (1.0 - RC**2) * (N - P - 1) / (N - 1))
These formulas, although legitimate only for linear regression, are not bad
for estimating how well a fully trained neural network model (which is nothing
more than a particular kind of nonlinear regression) will generalize. When work-
ing with neural networks, let P represent the total number of connection weights
in the model. In addition, make sure that simple correlations are used when work-
ing with these formulas; if a neural network or regression package reports the
squared multiple correlation, take the square root.
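In code, the shrinkage adjustment is a one-liner. The sketch below uses the standard Wherry-style adjusted-correlation formula, which matches the description above in form (the exact constants in other published variants differ slightly), and the sample numbers are invented for demonstration:

```python
import math

def shrinkage_corrected_r(r, n, p):
    """Shrinkage-corrected multiple correlation (Wherry-style adjustment):
    estimates the correlation expected on new samples, given the in-sample
    correlation r, n data points, and p fitted parameters."""
    adj = 1.0 - (1.0 - r * r) * (n - 1) / (n - p - 1)
    # Heavy overfitting can push the adjusted value below zero; clamp it.
    return math.sqrt(max(adj, 0.0))

# A seemingly good fit on a small sample shrinks dramatically...
small = shrinkage_corrected_r(r=0.80, n=30, p=10)
# ...but holds up when the sample is large relative to the parameter count.
large = shrinkage_corrected_r(r=0.80, n=3000, p=10)
```

With 10 parameters fitted to only 30 data points, an in-sample correlation of 0.80 shrinks to roughly 0.67; with 3,000 data points, it barely moves. This is the degrees-of-freedom issue made tangible.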

Large Parameter Sets
An excessive number of free parameters or rules will impact an optimization effort
in a manner similar to an insufficient number of data points. As the number of ele-
ments undergoing optimization rises, a model's ability to capitalize on idiosyn-
crasies in the development sample increases along with the proportion of the
model's fitness that can be attributed to mathematical artifact. The result of opti-
mizing a large number of variables, whether rules, parameters, or both, will be
a model that performs well on the development data, but poorly on out-of-sample
test data and in actual trading.
It is not the absolute number of free parameters that should be of concern,
but the number of parameters relative to the number of data points. The shrinkage
formula discussed in the context of small samples is also heuristically relevant
here: It illustrates how the relationship between the number of data points and the
number of parameters affects the outcome. When there are too many parameters,
given the number of data points, mathematical artifacts and capitalization on
chance (curve-fitting, in the bad sense) become reasons for failure.

No Verification
One of the better ways to get into trouble is by failing to verify model performance
using out-of-sample tests or inferential statistics. Without such tests, the spurious
solutions resulting from small samples and large parameter sets, not to mention
other less obvious causes, will go undetected. The trading system that appears to
be ideal on the development sample will be put “on-line,” and devastating losses
will follow. Developing systems without subjecting them to out-of-sample and sta-
tistical tests is like flying blind, without a safety belt, in an uninspected aircraft.

How to Succeed with Optimization

Four steps can be taken to avoid failure and increase the odds of achieving suc-
cessful optimization. As a first step, optimize on the largest possible representative
sample and make sure many simulated trades are available for analysis. The sec-
ond step is to keep the number of free parameters or rules small, especially in rela-
tion to sample size. A third step involves running tests on out-of-sample data, that
is, data not used or even seen during the optimization process. As a fourth and final
step, it may be worthwhile to statistically assess the results.

Large, Representative Samples
As suggested earlier, failure is often a consequence of presenting an optimizer with
the wrong problem to solve. Conversely, success is likely when the optimizer is

presented with the right problem. The conclusion is that trading models should be
optimized on data from the near future, the data that will actually be traded; do that
and watch the profits roll in. The catch is where to find tomorrow's data today.
Since the future has not yet happened, it is impossible to present the opti-
mizer with precisely the problem that needs to be solved. Consequently, it is nec-
essary to attempt the next-best alternative: to present the optimizer with a broader
problem, the solution to which should be as applicable as possible to the actual,
but impossible-to-solve, problem. One way to accomplish this is with a data sam-
ple that, even though not drawn from the future, embodies many characteristics
that might appear in future samples. Such a data sample should include bull and
bear markets, trending and nontrending periods, and even crashes. In addition, the
data in the sample should be as recent as possible so that it will reflect current pat-
terns of market behavior. This is what is meant by a representative sample.
As well as representative, the sample should be large. Large samples make it
harder for optimizers to uncover spurious or artifact-determined solutions.
Shrinkage, the expected decline in performance on unoptimized data, is reduced
when large samples are employed in the optimization process.
Sometimes, however, a trade-off must be made between the sample's size
and the extent to which it is representative. As one goes farther back in history to
bolster a sample, the data may become less representative of current market con-
ditions. In some instances, there is a clear transition point beyond which the data
become much less representative: For example, the S&P 500 futures began trad-
ing in 1983, effecting a structural change in the general market. Trade-offs become
much less of an issue when working with intraday data on short time frames,
where tens of thousands or even hundreds of thousands of bars of data can be gath-
ered without going back beyond the recent past.
Finally, when running simulations and optimizations, pay attention to the
number of trades a system takes. Like large data samples, it is highly desirable that
simulations and tests involve numerous trades. Chance or artifact can easily be
responsible for any profits produced by a system that takes only a few trades,
regardless of the number of data points used in the test!

Few Rules and Parameters
To achieve success, limit the number of free rules and parameters, especially when
working with small data samples. For a given sample size, the fewer the rules or
parameters to optimize, the greater the likelihood that a trading system will main-
tain its performance in out-of-sample tests and real-time trading. Although sever-
al dozen parameters may be acceptable when working with several thousand
trades taken on 100,000 1-minute bars (about 1 year for the S&P 500 futures),
even two or three parameters may be excessive when developing a system using a
few years of end-of-day data. If a particular model requires many parameters, then
significant effort should be put into assembling a mammoth sample (the legendary
Gann supposedly went back over 1,000 years in his study of wheat prices). An
alternative that sometimes works is optimizing a trading model on a whole port-
folio, using the same rules and parameters across all markets-a technique used
extensively in this book.

Verification of Results
After optimizing the rules and parameters of a trading system to obtain good
behavior on the development or in-sample data, but before risking any real money,
it is essential to verify the system's performance in some manner. Verification of
system performance is important because it gives the trader a chance to veto fail-
ure and embrace success: Systems that fail the test of verification can be discard-
ed, ones that pass can be traded with confidence. Verification is the single most
critical step on the road to success with optimization or, in fact, with any other
method of discovering a trading model that really works.
To ensure success, verify any trading solution using out-of-sample tests or
inferential statistics, preferably both. Discard any solution that fails to be profitable
in an out-of-sample test: It is likely to fail again when the rubber hits the road.
Compute inferential statistics on all tests, both in-sample and out-of-sample. These
statistics reveal the probability that the performance observed in a sample reflects
something real that will hold up in other samples and in real-time trading. Inferential
statistics work by making probability inferences based on the distribution of prof-
itability in a system™s trades or returns. Be sure to use statistics that are corrected for
multiple tests when analyzing in-sample optimization results. Out-of-sample tests
should be analyzed with standard, uncorrected statistics. Such statistics appear in
some of the performance reports that are displayed in the chapter on simulators. The
use of statistics to evaluate trading systems is covered in depth in the following chap-
ter. Develop a working knowledge of statistics; it will make you a better trader.
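The simplest correction for multiple tests is Bonferroni's: divide the desired overall significance level by the number of tests performed, so that each individual test must clear a much stricter per-test threshold. An illustrative sketch follows; the p-values and the count of 100 parameter combinations are invented for demonstration:

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance cutoff that keeps the overall chance of at
    least one false positive at (approximately) alpha."""
    return alpha / num_tests

def significant_after_correction(p_values, alpha=0.05):
    """Flag which results remain significant once we account for the
    number of parameter combinations that were tried."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p <= cutoff for p in p_values]

# Three promising in-sample p-values out of 100 combinations tested:
flags = significant_after_correction([0.0001, 0.01, 0.04] + [1.0] * 97)
```

A p-value of 0.04 looks significant in isolation, but after trying 100 parameter combinations the per-test cutoff drops to 0.0005, and only the strongest result survives. This is exactly why in-sample optimization results demand corrected statistics while out-of-sample tests do not.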
Some suggest checking a model for sensitivity to small changes in parame-
ter values. A model highly tolerant of such changes is more “robust” than a model
not as tolerant, it is said. Do not pay too much attention to these claims. In truth,
parameter tolerance cannot be relied upon as a gauge of model robustness. Many
extremely robust models are highly sensitive to the values of certain parameters.
The only true arbiters of system robustness are statistical and, especially, out-of-
sample tests.

Alternatives to Traditional Optimization

There are two major alternatives to traditional optimization: walk-forward opti-
mization and self-adaptive systems. Both of these techniques have the advantage
