
lem-solving power through a crude simulation of the evolutionary process. In

terms of overall performance and the variety of problems that may be solved,

there is no general-purpose optimizer more powerful than a properly crafted

genetic one.

Genetic optimizers are stochastic optimizers in the sense that they take

advantage of random chance in their operation. It may not seem believable that

tossing dice can be a great way to solve problems, but, done correctly, it can be!

In addition to randomness, genetic optimizers employ selection and recombina-

tion. The clever integration of random chance, selection, and recombination is

responsible for the genetic optimizer's great power. A full discussion of genetic

algorithms, which are the basis for genetic optimizers, appears in Part II.
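The interplay of random chance, selection, and recombination can be illustrated with a toy real-valued genetic optimizer. Everything below (the elitist truncation selection, population size, Gaussian mutation rate, and sample fitness function) is an invented sketch, not the algorithm of any toolkit discussed in this book:

```python
import random

def genetic_maximize(fitness, bounds, pop_size=30, generations=60,
                     mutation_rate=0.1, seed=42):
    """Toy real-valued genetic optimizer: truncation selection (keep the
    fitter half), uniform crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]            # selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            child = [x if rng.random() < 0.5 else y
                     for x, y in zip(a, b)]        # recombination
            for i, (lo, hi) in enumerate(bounds):  # random mutation
                if rng.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] +
                                           rng.gauss(0, (hi - lo) * 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Usage: maximize a simple one-dimensional fitness function peaked at 3.
best = genetic_maximize(lambda p: -(p[0] - 3.0) ** 2, [(-10.0, 10.0)])
```

Because the elite half survives each generation unchanged, the best solution found can never be lost, while crossover and mutation keep exploring around it.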

Genetic optimizers have many highly desirable characteristics. One such

characteristic is speed, especially when faced with combinatorial explosion. A

genetic optimizer can easily be many orders of magnitude faster than a brute force

optimizer when there is a multiplicity of rules, or parameters that have many pos-

sible values, to manipulate. This is because, like user-guided optimization, genetic

optimization can focus on important regions of solution space while mostly ignor-

ing blind alleys. In contrast to user-guided optimization, the benefit of a selective

search is achieved without the need for human intervention.

Genetic optimizers can swiftly solve complex problems, and they are also

more immune than other kinds of optimizers to the effects of local maxima in the


fitness surface or, equivalently, local minima in the cost surface. Analytic methods

are worst in that they almost always walk right to the top of the nearest hill or bot-

tom of the nearest valley, without regard to whether higher hills or lower valleys

exist elsewhere. In contrast, a good genetic optimizer often locates the globally

best solution, quite an impressive feat when accomplished for cantankerous fit-

ness surfaces, such as those associated with matrices of neural connection weights.

Another characteristic of genetic optimization is that it works well with fit-

ness surfaces marked by discontinuities, flat regions, and other troublesome irreg-

ularities. Genetic optimization shares this characteristic with brute force,

user-guided, annealing-based, and other nonanalytic optimization methods.

Solutions that maximize such items as net profit, return on investment, the Sharpe

Ratio, and others that define difficult, nonanalytic fitness landscapes can be found

using a genetic optimizer. Genetic optimizers shine with difficult fitness functions

that lie beyond the purview of analytic methods. This does not mean that they can-

not be used to solve problems having more tractable fitness surfaces: Perhaps

slower than the analytic methods, they have the virtue of being more resistant to

the traps set by local optima.

Overall, genetic optimizers are the optimizers of choice when there are many

parameters or rules to adapt, when a global solution is desired, or when arbitrarily

complex (and not necessarily differentiable or continuous) fitness or cost functions

must be handled. Although special-purpose optimizers can outperform genetic opti-

mizers on specific kinds of problems, for general-purpose optimization, genetic

optimizers are among the most powerful tools available.

What does a genetic optimizer look like in action? The dual moving-average

crossover system discussed earlier was translated to C++ so that the genetic opti-

mizer in the C-Trader toolkit could be used to solve for the two system parame-

ters, LenA and LenB. LenA, the period of the first moving average, was examined

over the range of 2 through 50, as was LenB, the period of the second moving aver-

age. Optimization was for net profit so that the results would be directly compa-

rable with those produced earlier by brute force optimization. Below is the C++

code for the crossover system:

CHAPTER 3 Optimizers and Optimization

// take no trades in look-back period

if (dt[cb] < 910302) { eqcls[cb] = 0.0; continue; }
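Only a fragment of that code survives here. As a hedged sketch of the underlying dual moving-average logic (in Python rather than C++, with an invented price series; the function name and signal format are illustrative, not the toolkit's):

```python
def crossover_signals(prices, len_a, len_b):
    """Dual moving-average crossover: signal long when the fast average
    crosses above the slow one, short on the reverse cross."""
    def sma(series, period, i):
        # Simple moving average of the `period` bars ending at bar i.
        return sum(series[i - period + 1 : i + 1]) / period
    signals = []
    start = max(len_a, len_b)            # need a full look-back first
    for i in range(start, len(prices)):
        fast_now, slow_now = sma(prices, len_a, i), sma(prices, len_b, i)
        fast_prev, slow_prev = sma(prices, len_a, i - 1), sma(prices, len_b, i - 1)
        if fast_prev <= slow_prev and fast_now > slow_now:
            signals.append((i, "long"))
        elif fast_prev >= slow_prev and fast_now < slow_now:
            signals.append((i, "short"))
    return signals

# Usage with a made-up series: a dip followed by a rally produces one
# long entry when the 2-bar average crosses above the 4-bar average.
prices = [10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10, 11, 12]
entries = crossover_signals(prices, len_a=2, len_b=4)
```

An optimizer's job is then simply to search over len_a and len_b for the pair whose signals produce the highest net profit.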

To solve for the best parameters, brute force optimization would require that 2,401

tests be performed; in TradeStation, that works out to about 56 minutes of com-

puting time, extrapolating from the earlier illustration in which a small subset of

the current solution space was examined. Only 1 minute of running time was

required by the genetic optimizer; in an attempt to put it at a significant disadvan-

tage, it was prematurely stopped after performing only 133 tests.

The output from the genetic optimizer appears in Table 3-2. In this table, P1 rep-

resents the period of the faster moving average, P2 the period of the slower moving

average, NET the total net profit, NETLNG the net profit for long positions, NETSHT

the net profit for short positions, PFAC the profit factor, ROA% the annualized return

on account, DRAW the maximum drawdown, TRDS the number of trades taken by the

system, WIN% the percentage of winning trades, AVGT the profit or loss resulting

from the average trade, and FIT the fitness of the solution (which, in this instance, is

merely the total net profit). As with the brute force data in Table 3-1, the genetic data

have been sorted by net profit (fitness) and only the 25 best solutions are presented.

TABLE 3-2 Top 25 Solutions Found Using Genetic Optimization in C-Trader Toolkit

Comparison of the brute force and genetic optimization results (Tables 3-1 and 3-2,

respectively) reveals that the genetic optimizer isolated a solution with a greater net

profit ($172,725) than did the brute force optimizer ($145,125). This is no surprise

since a larger solution space, not decimated by increments, was explored. The sur-

prise is that the better solution was found so quickly, despite the handicap of a pre-

maturely stopped evolutionary process. Results like these demonstrate the incredible

effectiveness of genetic optimization.

Optimization by Simulated Annealing

Optimizers based on annealing mimic the thermodynamic process by which liq-

uids freeze and metals anneal. Starting out at a high temperature, the atoms of a

liquid or molten metal bounce rapidly about in a random fashion. Slowly cooled,

they arrange themselves into an orderly configuration, a crystal, that represents

a minimal energy state for the system. Simulated in software, this thermodynamic

process readily solves large-scale optimization problems.
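A minimal sketch of that simulated cooling process follows; the geometric cooling schedule, step size, and one-dimensional cost function are arbitrary illustrative choices:

```python
import math, random

def anneal_minimize(cost, x0, step=1.0, t_start=10.0, t_end=1e-3,
                    cooling=0.95, seed=7):
    """Toy simulated annealing: random moves are always accepted when
    they lower cost, and accepted with probability exp(-delta/T) when
    they raise it; the temperature T is slowly lowered toward zero."""
    rng = random.Random(seed)
    x, t = x0, t_start
    best = x
    while t > t_end:
        candidate = x + rng.uniform(-step, step)
        delta = cost(candidate) - cost(x)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = candidate                # accept the move
        if cost(x) < cost(best):
            best = x                     # remember the best point seen
        t *= cooling                     # slow cooling toward "frozen"
    return best

# Usage: find the minimum of a simple one-dimensional cost surface.
best = anneal_minimize(lambda v: (v - 2.0) ** 2, x0=-8.0)
```

Early on, the high temperature lets the search accept uphill moves and escape local minima; as the system cools, the search settles into the deepest valley it has found.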

As with genetic optimization, optimization by simulated annealing is a very

powerful stochastic technique, modeled upon a natural phenomenon, that can find

globally optimal solutions and handle ill-behaved fitness functions. Simulated

annealing has effectively solved significant combinatorial problems, including


the famous "traveling salesman problem," and the problem of how best to arrange the

millions of circuit elements found on modern integrated circuit chips, such as

those that power computers. Methods based on simulated annealing should not be

construed as limited to combinatorial optimization; they can readily be adapted to

the optimization of real-valued parameters. Consequently, optimizers based on

simulated annealing are applicable to a wide variety of problems, including those

faced by traders.

Since genetic optimizers perform so well, we have experienced little need to

explore optimizers based on simulated annealing. In addition, there have been a

few reports suggesting that, in many cases, annealing algorithms do not perform

as well as genetic algorithms. For these reasons, we have not provided

examples of simulated annealing and have little more to say about the method.

Analytic Optimizers

Analysis (as in вЂњreal analysisвЂќ or вЂњcomplex analysisвЂќ) is an extension of classical

college calculus. Analytic optimizers involve the well-developed machinery of

analysis, specifically differential calculus and the study of analytic functions, in

the solution of practical problems. In some instances, analytic methods can yield

a direct (noniterative) solution to an optimization problem. This happens to be the

case for multiple regression, where solutions can be obtained with a few matrix

calculations. In multiple regression, the goal is to find a set of regression weights

that minimize the sum of the squared prediction errors. In other cases, iterative

techniques must be used. The connection weights in a neural network, for exam-

ple, cannot be directly determined. They must be estimated using an iterative pro-

cedure, such as back-propagation.
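For the simplest case (one predictor plus an intercept) the normal equations yield the regression weights directly, with no iteration. The sketch below, with invented data, shows the idea:

```python
def regress_line(xs, ys):
    """Direct least-squares solution of y = a + b*x via the normal
    equations -- the weights fall out of a little arithmetic on sums,
    with no iterative search required."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

# Usage: points generated from y = 1 + 2x are recovered exactly.
a, b = regress_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The multivariate case is the same in spirit: a few matrix calculations replace the sums, but the solution is still obtained in one step rather than by repeated refinement.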

Many iterative techniques used to solve multivariate optimization prob-

lems (those involving several variables or parameters) employ some variation

on the theme of steepest ascent. In its most basic form, optimization by steep-

est ascent works as follows: A point in the domain of the fitness function (that

is, a set of parameter values) is chosen by some means. The gradient vector at

that point is evaluated by computing the derivatives of the fitness function with

respect to each of the variables or parameters; this defines the direction in n-

dimensional parameter space for which a fixed amount of movement will pro-

duce the greatest increase in fitness. A small step is taken up the hill in fitness

space, along the direction of the gradient. The gradient is then recomputed at

this new point, and another, perhaps smaller, step is taken. The process is

repeated until convergence occurs.
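The loop just described can be sketched as follows; the numerically estimated gradient, fixed step size, and sample fitness surface are this sketch's own choices:

```python
def steepest_ascent(fitness, x0, step=0.1, iters=200, eps=1e-6):
    """Basic steepest ascent with a finite-difference gradient."""
    x = list(x0)
    for _ in range(iters):
        # Estimate the gradient of the fitness function at x.
        grad = []
        for i in range(len(x)):
            bumped = list(x)
            bumped[i] += eps
            grad.append((fitness(bumped) - fitness(x)) / eps)
        norm = sum(g * g for g in grad) ** 0.5
        if norm < 1e-8:        # flat spot: no preferred direction
            break
        # Take a small step uphill along the gradient direction.
        x = [xi + step * g / norm for xi, g in zip(x, grad)]
    return x

# Usage: climb a smooth two-dimensional hill whose peak is at (1, -2).
peak = steepest_ascent(lambda p: -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2),
                       [5.0, 5.0])
```

Practical implementations shrink the step as the peak nears and adjust the direction between iterations; this fixed-step version simply oscillates within one step of the summit.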

A real-world implementation of steepest ascent optimization has to specify

how the step size will be determined at each iteration, and how the direction

defined by the gradient will be adjusted for better overall convergence of the opti-

mization process. Naive implementations assume that there is an analytic fitness

surface (one that can be approximated locally by a convergent power series) hav-

ing hills that must be climbed. More sophisticated implementations go further,

commonly assuming that the fitness function can be well approximated locally by

a quadratic form. If a fitness function satisfies this assumption, then much faster

convergence to a solution can be achieved. However, when the fitness surface has

many irregularly shaped hills and valleys, quadratic forms often fail to provide a

good approximation. In such cases, the more sophisticated methods break down

entirely or their performance seriously degrades.

Worse than degraded performance is the problem of local solutions. Almost

all analytic methods, whether elementary or sophisticated, are easily trapped by

local maxima: they generally fail to locate the globally best solution when there

are many hills and valleys in the fitness surface. Least-squares neural network

predictive modeling gives rise to fitness surfaces that, although clearly analytic,

are full of bumps, troughs, and other irregularities that lead standard analytic tech-

niques (including back-propagation, a variant on steepest ascent) astray. Local

maxima and other hazards that accompany such fitness surfaces can, however, be

sidestepped by cleverly marrying a genetic algorithm with an analytic one. For fit-

ness surfaces amenable to analytic optimization, such a combined algorithm can

provide the best of both worlds: fast, accurate solutions that are also likely to be

globally optimal.

Some fitness surfaces are simply not amenable to analytic optimization.

More specifically, analytic methods cannot be used when the fitness surface has

flat areas or discontinuities in the region of parameter space where a solution is to

be sought. Flat areas imply null gradients, hence the absence of a preferred direc-

tion in which to take a step. At points of discontinuity, the gradient is not defined;

again, a stepping direction cannot be determined. Even if a method does not

explicitly use gradient information, such information is employed implicitly by

the optimization algorithm. Unfortunately, many fitness functions of interest to

traders-including, for instance, all functions that involve net profit, drawdown,

percentage of winning trades, risk-to-reward ratios, and other like items-have

plateaus and discontinuities. They are, therefore, not tractable using analytic methods.

Although the discussion has centered on the maximization of fitness, every-

thing said applies as well to the minimization of cost. Any maximization technique

can be used for minimization, and vice versa: Multiply a fitness function by -1 to

obtain an equivalent cost function; multiply a cost function by -1 and a fitness func-

tion is the result. If a minimization algorithm takes your fancy, but a maximization

is required, use this trick to avoid having to recode the optimization algorithm.
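A sketch of the trick, using a crude scanning minimizer as a stand-in for whatever algorithm happens to be at hand:

```python
def minimize_scan(cost, lo, hi, steps=10000):
    """Stand-in minimizer: a crude scan over one parameter."""
    best_x, best_c = lo, cost(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        c = cost(x)
        if c < best_c:
            best_x, best_c = x, c
    return best_x

def maximize(fitness, lo, hi):
    """Maximize by minimizing the negated fitness function."""
    return minimize_scan(lambda x: -fitness(x), lo, hi)

# Usage: the fitness peak of -(x - 4)^2 at x = 4 is found by a routine
# that only knows how to minimize.
x_best = maximize(lambda x: -(x - 4.0) ** 2, 0.0, 10.0)
```

The minimizer itself is untouched; only the sign of the objective changes.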

Linear Programming

The techniques of linear programming are designed for optimization problems

involving linear cost or fitness functions, and linear constraints on the parameters

or input variables. Linear programming is typically used to solve resource allo-

cation problems. In the world of trading, one use of linear programming might be

to allocate capital among a set of investments to maximize net profit. If risk-

adjusted profit is to be optimized, linear programming methods cannot be used:

Risk-adjusted profit is not a linear function of the amount of capital allocated to

each of the investments; in such instances, other techniques (e.g., genetic algo-

rithms) must be employed. Linear programming methods are rarely useful in the

development of trading systems. They are mentioned here only to inform readers

of their existence.
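A toy two-investment allocation shows the shape of such a problem; the returns and capital limits below are invented. Because a linear program attains its optimum at a vertex of the feasible region, this miniature version can be solved by checking the corner points directly:

```python
def allocate_two(return_a, return_b, capital, cap_a):
    """Maximize return_a*x + return_b*y subject to the linear
    constraints x + y <= capital, x <= cap_a, x >= 0, y >= 0.
    A linear program's optimum lies at a vertex of the feasible
    region, so for this toy case we simply enumerate the corners."""
    vertices = [
        (0.0, 0.0),
        (cap_a, 0.0),
        (0.0, capital),
        (cap_a, capital - cap_a),
    ]
    profit = lambda v: return_a * v[0] + return_b * v[1]
    return max(vertices, key=profit)

# Usage: $100,000 to split, at most $60,000 allowed in the
# higher-yield investment.
x, y = allocate_two(0.08, 0.05, 100_000.0, 60_000.0)
```

Real problems with many investments use the simplex method or an interior-point solver, but the structure (a linear objective over linear constraints) is exactly the same.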

HOW TO FAIL WITH OPTIMIZATION

Most traders do not seek failure, at least not consciously. However, knowledge of

the way failure is achieved can be of great benefit when seeking to avoid it. Failure

with an optimizer is easy to accomplish by following a few key rules. First, be sure

to use a small data sample when running simulations: The smaller the sample, the

greater the likelihood it will poorly represent the data on which the trading model

will actually be traded. Next, make sure the trading system has a large number of

parameters and rules to optimize: For a given data sample, the greater the number

of variables that must be estimated, the easier it will be to obtain spurious results.

It would also be beneficial to employ only a single sample on which to run tests;

annoying out-of-sample data sets have no place in the rose-colored world of the

ardent loser. Finally, do avoid the headache of inferential statistics. Follow these

rules and failure is guaranteed.

What shape will failure take? Most likely, system performance will look

great in tests, but terrible in real-time trading. Neural network developers call this

phenomenon "poor generalization"; traders are acquainted with it through the

experience of margin calls and a serious loss of trading capital. One consequence

of such a failure-laden outcome is the formation of a popular misconception: that

all optimization is dangerous and to be feared.

In actual fact, optimizers are not dangerous and not all optimization should be

feared. Only bad optimization is dangerous and frightening. Optimization of large

parameter sets on small samples, without out-of-sample tests or inferential statis-

tics, is simply a bad practice that invites unhappy results for a variety of reasons.

Small Samples

Consider the impact of small samples on the optimization process. Small samples

of market data are unlikely to be representative of the universe from which they

are drawn; consequently, they will probably differ significantly from other sam-

ples obtained from the same universe. Applied to a small development sample, an

optimizer will faithfully discover the best possible solution. The best solution for


the development sample, however, may turn out to be a dreadful solution for the

later sample on which genuine trades will be taken. Failure ensues, not because

optimization has found a bad solution, but because it has found a good solution to

the wrong problem!

Optimization on inadequate samples is also good at spawning solutions that

represent only mathematical artifact. As the number of data points declines to the

number of free (adjustable) parameters, most models (trading, regression, or other-

wise) will attain a perfect fit to even random data. The principle involved is the

same one responsible for the fact that a line, which is a two-parameter model, can

always be drawn through any two distinct points, but cannot always be made to

intersect three arbitrary points. In statistics, this is known as the degrees-of-freedom

issue; there are as many degrees of freedom as there are data points beyond that

which can be fitted perfectly for purely mathematical reasons. Even when there are

enough data points to avoid a totally artifact-determined solution, some part of the

model fitness obtained through optimization will be of an artifact-determined

nature, a by-product of the process.
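The artifact is easy to demonstrate: a polynomial with as many coefficients as data points fits any data, even pure noise, exactly. The sketch below interpolates three random points with a three-coefficient (quadratic) model; the points themselves are arbitrary:

```python
import random

def lagrange_fit(points):
    """Return the unique polynomial through the given points, built
    from the Lagrange basis -- as many coefficients as data points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Usage: three random y-values (pure noise) are fitted perfectly by a
# three-coefficient model -- a "perfect" fit that means nothing.
rng = random.Random(0)
pts = [(0.0, rng.random()), (1.0, rng.random()), (2.0, rng.random())]
model = lagrange_fit(pts)
residual = max(abs(model(x) - y) for x, y in pts)
```

The zero residual reflects only the matching of parameter count to data count, not any structure in the data.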

For multiple regression models, a formula is available that can be used to

estimate how much "shrinkage" would occur in the multiple correlation coeffi-

cient (a measure of model fitness) if the artifact-determined component were

removed. The shrinkage correction formula, which shows the relationship

between the number of parameters (regression coefficients) being optimized, sam-

ple size, and decreased levels of apparent fitness (correlation) in tests on new sam-

ples, is shown below in FORTRAN-style notation:

RC = SQRT(1.0 - (1.0 - R*R) * (N - 1.0) / (N - P))

In this equation, N represents the number of data points, P the number of model

parameters, R the multiple correlation coefficient determined for the sample by the

regression (optimization) procedure, and RC the shrinkage-corrected multiple cor-

relation coefficient. The inverse formula, one that estimates the optimization-

inflated correlation (R) given the true correlation (RC) existing in the population

from which the data were sampled, appears below:

R = SQRT(1.0 - (1.0 - RC*RC) * (N - P) / (N - 1.0))

These formulas, although legitimate only for linear regression, are not bad

for estimating how well a fully trained neural network model (which is nothing

more than a particular kind of nonlinear regression) will generalize. When work-

ing with neural networks, let P represent the total number of connection weights

in the model. In addition, make sure that simple correlations are used when work-

ing with these formulas; if a neural network or regression package reports the

squared multiple correlation, take the square root.
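As a sketch of the standard linear-regression shrinkage correction described above, the two estimates can be coded directly; the function names and the example figures (50 data points, 10 parameters, an in-sample R of 0.60) are invented for illustration:

```python
import math

def shrunken_r(r, n, p):
    """Shrinkage-corrected multiple correlation: an estimate of how
    much of the in-sample correlation R survives on new data, given
    N data points and P optimized parameters."""
    adjusted = 1.0 - (1.0 - r * r) * (n - 1.0) / (n - p)
    return math.sqrt(max(0.0, adjusted))

def inflated_r(rc, n, p):
    """Inverse formula: the optimization-inflated in-sample correlation
    expected when the true population correlation is RC."""
    return math.sqrt(1.0 - (1.0 - rc * rc) * (n - p) / (n - 1.0))

# Usage: with 50 data points and 10 parameters, an in-sample R of 0.60
# shrinks substantially; the two formulas invert one another.
r_true = shrunken_r(0.60, 50, 10)
r_back = inflated_r(r_true, 50, 10)
```

Note how quickly the corrected correlation falls as P approaches N, which is the degrees-of-freedom issue in numeric form.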


Large Parameter Sets

An excessive number of free parameters or rules will impact an optimization effort

in a manner similar to an insufficient number of data points. As the number of ele-

ments undergoing optimization rises, a model's ability to capitalize on idiosyn-

crasies in the development sample increases along with the proportion of the

model's fitness that can be attributed to mathematical artifact. The result of opti-

mizing a large number of variables-whether rules, parameters, or both-will be

a model that performs well on the development data, but poorly on out-of-sample

test data and in actual trading.

It is not the absolute number of free parameters that should be of concern,

but the number of parameters relative to the number of data points. The shrinkage

formula discussed in the context of small samples is also heuristically relevant

here: It illustrates how the relationship between the number of data points and the

number of parameters affects the outcome. When there are too many parameters,

given the number of data points, mathematical artifacts and capitalization on

chance (curve-fitting, in the bad sense) become reasons for failure.

No Verification

One of the better ways to get into trouble is by failing to verify model performance

using out-of-sample tests or inferential statistics. Without such tests, the spurious

solutions resulting from small samples and large parameter sets, not to mention

other less obvious causes, will go undetected. The trading system that appears to

be ideal on the development sample will be put "on-line," and devastating losses

will follow. Developing systems without subjecting them to out-of-sample and sta-

tistical tests is like flying blind, without a safety belt, in an uninspected aircraft.

HOW TO SUCCEED WITH OPTIMIZATION

Four steps can be taken to avoid failure and increase the odds of achieving suc-

cessful optimization. As a first step, optimize on the largest possible representative

sample and make sure many simulated trades are available for analysis. The sec-

ond step is to keep the number of free parameters or rules small, especially in rela-

tion to sample size. A third step involves running tests on out-of-sample data, that

is, data not used or even seen during the optimization process. As a fourth and final

step, it may be worthwhile to statistically assess the results.

Large, Representative Samples

As suggested earlier, failure is often a consequence of presenting an optimizer with

the wrong problem to solve. Conversely, success is likely when the optimizer is


presented with the right problem. The conclusion is that trading models should be

optimized on data from the near future, the data that will actually be traded; do that

and watch the profits roll in. The catch is where to find tomorrow's data today.

Since the future has not yet happened, it is impossible to present the opti-

mizer with precisely the problem that needs to be solved. Consequently, it is nec-

essary to attempt the next-best alternative: to present the optimizer with a broader

problem, the solution to which should be as applicable as possible to the actual,

but impossible-to-solve, problem. One way to accomplish this is with a data sam-

ple that, even though not drawn from the future, embodies many characteristics

that might appear in future samples. Such a data sample should include bull and

bear markets, trending and nontrending periods, and even crashes. In addition, the

data in the sample should be as recent as possible so that it will reflect current pat-

terns of market behavior. This is what is meant by a representative sample.

As well as representative, the sample should be large. Large samples make it

harder for optimizers to uncover spurious or artifact-determined solutions.

Shrinkage, the expected decline in performance on unoptimized data, is reduced

when large samples are employed in the optimization process.

Sometimes, however, a trade-off must be made between the sampleвЂ™s size

and the extent to which it is representative. As one goes farther back in history to

bolster a sample, the data may become less representative of current market con-

ditions. In some instances, there is a clear transition point beyond which the data

become much less representative: For example, the S&P 500 futures began trad-

ing in 1983, effecting a structural change in the general market. Trade-offs become

much less of an issue when working with intraday data on short time frames,

where tens of thousands or even hundreds of thousands of bars of data can be gath-

ered without going back beyond the recent past.

Finally, when running simulations and optimizations, pay attention to the

number of trades a system takes. Like large data samples, it is highly desirable that

simulations and tests involve numerous trades. Chance or artifact can easily be

responsible for any profits produced by a system that takes only a few trades,

regardless of the number of data points used in the test!

Few Rules and Parameters

To achieve success, limit the number of free rules and parameters, especially when

working with small data samples. For a given sample size, the fewer the rules or

parameters to optimize, the greater the likelihood that a trading system will main-

tain its performance in out-of-sample tests and real-time trading. Although sever-

al dozen parameters may be acceptable when working with several thousand

trades taken on 100,000 1-minute bars (about 1 year for the S&P 500 futures),

even two or three parameters may be excessive when developing a system using a

few years of end-of-day data. If a particular model requires many parameters, then

significant effort should be put into assembling a mammoth sample (the legendary

Gann supposedly went back over 1,000 years in his study of wheat prices). An

alternative that sometimes works is optimizing a trading model on a whole port-

folio, using the same rules and parameters across all markets-a technique used

extensively in this book.

Verification of Results

After optimizing the rules and parameters of a trading system to obtain good

behavior on the development or in-sample data, but before risking any real money,

it is essential to verify the system's performance in some manner. Verification of

system performance is important because it gives the trader a chance to veto fail-

ure and embrace success: Systems that fail the test of verification can be discard-

ed, ones that pass can be traded with confidence. Verification is the single most

critical step on the road to success with optimization or, in fact, with any other

method of discovering a trading model that really works.

To ensure success, verify any trading solution using out-of-sample tests or

inferential statistics, preferably both. Discard any solution that fails to be profitable

in an out-of-sample test: It is likely to fail again when the rubber hits the road.

Compute inferential statistics on all tests, both in-sample and out-of-sample. These

statistics reveal the probability that the performance observed in a sample reflects

something real that will hold up in other samples and in real-time trading. Inferential

statistics work by making probability inferences based on the distribution of prof-

itability in a systemвЂ™s trades or returns. Be sure to use statistics that are corrected for

multiple tests when analyzing in-sample optimization results. Out-of-sample tests

should be analyzed with standard, uncorrected statistics. Such statistics appear in

some of the performance reports that are displayed in the chapter on simulators. The

use of statistics to evaluate trading systems is covered in depth in the following chap-

ter. Develop a working knowledge of statistics; it will make you a better trader.
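As an illustrative sketch only: the test below uses a large-sample normal approximation in place of the exact t-distribution, and a Bonferroni adjustment as one simple multiple-test correction; the trade figures and the count of 133 optimization tests are invented for the example:

```python
import math

def profit_p_value(trade_profits, n_tests=1):
    """One-sided test of 'mean trade profit > 0' using a large-sample
    normal approximation to the t-statistic, with a Bonferroni
    correction for the number of optimization tests performed."""
    n = len(trade_profits)
    mean = sum(trade_profits) / n
    var = sum((x - mean) ** 2 for x in trade_profits) / (n - 1)
    t = mean / math.sqrt(var / n)
    # One-sided p-value from the standard normal upper tail.
    p = 0.5 * (1.0 - math.erf(t / math.sqrt(2.0)))
    return min(1.0, p * n_tests)       # Bonferroni adjustment

# Usage: the same trades look much less significant once we admit the
# result was the best of many optimization tests.
trades = [120, -80, 95, 60, -40, 150, 30, -20, 85, 70] * 5
p_raw = profit_p_value(trades, n_tests=1)
p_corrected = profit_p_value(trades, n_tests=133)
```

The corrected p-value answers the right question: not "could these trades be luck?" but "could the best of all the tested parameter sets be luck?"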

Some suggest checking a model for sensitivity to small changes in parame-

ter values. A model highly tolerant of such changes is said to be more "robust" than

a model that is less tolerant. Do not pay too much attention to these claims. In truth,

parameter tolerance cannot be relied upon as a gauge of model robustness. Many

extremely robust models are highly sensitive to the values of certain parameters.

The only true arbiters of system robustness are statistical and, especially, out-of-

sample tests.

ALTERNATIVES TO TRADITIONAL OPTIMIZATION

There are two major alternatives to traditional optimization: walk-forward opti-

mization and self-adaptive systems. Both of these techniques have the advantage
