$1,000 per trade. (For information about the various markets and their symbols,

see Table II-1 in the “Introduction” to Part II.)

Only the 10-Year Notes and Cotton showed strong profits across all three

entry orders in-sample. Out-of-sample, performance on these markets was miser-

able. The S&P 500, a market that, in our experience, has many clear and tradable

cycles, demonstrated strong profitability on the in-sample data when entry was at

open or on limit. This market was strongly profitable out-of-sample with entry on

limit and on stop, but somewhat less profitable with entry at open. Interestingly,

the NYFE, although evidencing strong in-sample profits with entry at open and on

limit, had losses out-of-sample across all three orders. There are a few other
profitable in-sample market-order combinations, as well as out-of-sample
market-order combinations. However, very little correspondence between the two
was observed. Perhaps markets that have not had cycles in the past (in-sample)
have cycles in the present (out-of-sample), and vice versa. At least the S&P 500
behaved as expected on the basis of prior research and may be one of the few
markets consistently amenable to cycle trading in this crude form.

TABLE 10-2

Performance Data Broken Down by Market and Test

Figure 10-4 depicts the equity for the portfolio with entry at open. Equity

declined slowly and then became rather flat until about August 1992, at which

time it began a steady and rapid decline.

CONCLUSION

In our May 1997 study, the filter bank method appeared to have potential as the

basis for an effective trading strategy. At times it worked incredibly well, and was

almost completely insensitive to large variations in its parameters, whereas at

other times it performed poorly. The results may simply have been due to the fact

that the implementation was “quick and dirty.” Back then, the focus was on the

S&P 500, a market that continued to trade well in the present study.

The results of the current study are disappointing, all the more so given the the-

oretical elegance of the filters. It may be that other approaches to the analysis of

cycles, e.g., the use of maximum entropy, might have provided better perfor-

mance; then again, maybe not. Other traders have also experienced similar disap-

pointments using a variety of techniques when trading cycles in a simple,

buy-the-bottom/sell-the-top manner. It may be that cycles are too obvious and

detectable by any of a number of methods, and may be traded away very quickly

whenever they develop in the market. This especially seems the case in recent

years with the proliferation of cycle analysis software. The suggestion is not that

cycles should be abandoned as a concept, but that a more sophisticated use of

detected cycles must be made. Perhaps better results would ensue if cycles were

combined with other kinds of entry criteria, e.g., taking trades only if a cycle top

corresponds to an expected seasonal turning-point top.

Further studies are needed to determine whether the cycle model does indeed

have the characteristic of giving precise entries when it works, but failing miser-

ably when it does not work. Looking over a chart of the S&P 500 suggests this is

the case. There are frequently strings of four or five trades in a row, with entries

that occur precisely at market tops and bottoms, as if predicted with perfect hind-

sight. At other times, entries occur exactly where they should not. Our experience
indicates that, when a system that behaves this way is combined with a proper exit,
great profits can sometimes be achieved. More specifically, losses have to be cut

very quickly when the model fails, but trades should not be prematurely terminat-

ed when the model is correct in its predictions. Because of the precision of the

model when the predictions are correct, an extremely tight stop could perhaps
accomplish the goal. When an exact cycle top or bottom is caught, the market
begins to move immediately in the favored direction, with hardly any adverse
excursion, and the stop is never hit. When the model fails, the stop is hit very
quickly, resulting in only a small loss. Given the fairly loose stop of the standard
exit, the benefits of sophisticated cycle trading may not have been realized.

FIGURE 10-4

Portfolio Equity Growth for Countertrend Cycle Trading

WHAT HAVE WE LEARNED?

■ Models that are theoretically sound, elegant, and appealing do not
necessarily work well when trading real markets.

■ Exception to Rule 1: The S&P 500 may respond to such methods; it did
so both in our earlier study and in the current one.

■ When the model does work, it does so remarkably well. As stated earlier,
when examining its behavior on the S&P 500 and several other markets,
one can quickly and easily find strings of signals that pick off tops and
bottoms with the precision of hindsight.

■ The previous point suggests that exits specifically designed for a system
that yields high precision when correct, but fails badly when incorrect,
may be required.

■ The markets appear to have become more efficient relative to cycle
models, as they have to breakout models. Obvious market behaviors
(such as clear, tradable cycles) are traded away before most traders can
capitalize on them. The lesson: Anything too theoretically appealing or
obvious will tend not to work.

Neural Networks

Neural network technology, a form of artificial intelligence (or AI), arose from

endeavors to emulate the kind of information processing and decision making

that occurs in living organisms. The goal was to model the behavior of neural tis-

sue in living systems by using a computer to implement structures composed of

simulated neurons and neural interconnections (synapses). Research on neural

networks began in the 1940s on a theoretical level. When computer technology

became sophisticated enough to accommodate such research, the study of neural

networks and their applications began in earnest. It was not, however, until the

mid-to-late 1980s that neural network technology became of interest to the finan-

cial community. By 1989, a few vendors of neural network development tools

were available, and there was one commercial S&P 500 forecasting system based

on this technology (Scientific Consultant Services' NexTurn). In the early 1990s,
interest peaked and more development tools appeared, but the fervor then waned for

reasons discussed later.

While it is not within the scope of this book to present a full tutorial on neur-

al network technology, below is a brief discussion to provide basic understanding.

Those interested in exploring this subject in greater depth should read our contri-

butions to the books Virtual Trading (Lederman and Klein, 1995) and

Computerized Trading (Jurik, 1999), in which we also present detailed informa-

tion on system development using neural networks, as well as our articles in

Technical Analysis of Stocks and Commodities (Katz, April 1992; Katz and

McCormick, November 1996, November 1997). Neural Networks in Finance and

Investing (Trippi and Turban, 1993) should also be of interest.

WHAT ARE NEURAL NETWORKS?

Neural networks (or “nets”) are basically building blocks that learn and are useful

for pattern recognition, classification, and prediction. They hold special appeal to

traders because nets are capable of coping both with probability estimates in

uncertain situations and with “fuzzy” patterns, i.e., those recognizable by eye but

difficult to define in software using precise rules; and they have the potential to

recognize almost any pattern that exists. Nets can also integrate large amounts of

information without becoming stifled by detail and can be made to adapt to chang-

ing markets and market conditions.

A variety of neural networks are available, differing in terms of their “archi-

tecture,” i.e., the ways in which the simulated neurons are interconnected, the

details of how these neurons behave (signal processing behavior or “transfer func-

tions”), and the process through which learning takes place. There are a number of

popular kinds of neural networks that are of some use to traders: the Kohonen and

the Learning Vector Quantization (LVQ) networks, various adaptive resonance net-

works, and recurrent networks. In this chapter, the most popular and, in many

respects, the most useful kind of network is discussed: the “feed-forward” network.

As mentioned above, nets differ in the ways they learn. The system develop-

er plays the role of the neural network's teacher, providing the net with examples

to learn from. Some nets employ “supervised learning” and others “unsupervised

learning.” Supervised learning occurs when the network is taught to produce a cor-

rect solution by being shown instances of correct solutions. This is a form of

paired-associate learning: The network is presented with pairs of inputs and a

desired output; for every set of inputs, it is the task of the net to learn to produce

the desired output. Unsupervised learning, on the other hand, involves nets that

take the sets of inputs they are given and organize them as they see fit, according
to patterns they find therein. Regardless of the form of learning employed, the

main difficulty in developing successful neural network models is in finding and

“massaging” historical data into training examples or “facts” that highlight rele-

vant patterns so that the nets can learn efficiently and not be led astray or con-

fused; “preprocessing” the data is an art in itself.

The actual process of learning usually involves some mechanism for updat-

ing the neural connection weights in response to the training examples. With feed-

forward architectures, back-propagation, a form of steepest-descent optimization,

is often used. Genetic algorithms are also effective. These are very computation-

ally intensive and time-consuming, but generally produce better final results.
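As a rough illustration of the steepest-descent idea behind back-propagation, the
following sketch updates the weights of a single logistic neuron to reduce squared
error; the names and the single-neuron scope are simplifying assumptions for
clarity, not the training code used in this chapter.

// Illustrative sketch only: one steepest-descent update for a single neuron
// with a logistic transfer function, minimizing squared error. Names and
// scope are assumptions, not the actual implementation.
#include <cmath>

void UpdateWeights(double *w, const double *in, int n,
                   double target, double rate) {
    double sum = 0.0;                            // weighted sum of inputs
    for (int i = 0; i < n; i++) sum += w[i] * in[i];
    double out = 1.0 / (1.0 + std::exp(-sum));   // logistic transfer function
    double delta = (target - out) * out * (1.0 - out); // error gradient term
    for (int i = 0; i < n; i++)                  // step down the error surface
        w[i] += rate * delta * in[i];
}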

Feed-Forward Neural Networks

A feed-forward network consists of layers of neurons. The input layer, the first

layer, receives data or inputs from the outside world. The inputs consist of inde-

pendent variables (e.g., market or indicator variables upon which the system is to

be based) from which some inference is to be drawn or a prediction is to be made.

The input layer is massively connected to the next layer, which is often called the
hidden layer because it has no connections to the outside world. The outputs of the

hidden layer are fed to the next layer, which may be another hidden layer (if it is,

the process repeats), or it may be the output layer. Each neuron in the output layer

produces an output composed of the predictions, classifications, or decisions made

by the network. Networks are usually identified by the number of neurons in each

layer: For example, a 10-3-1 network is one that has 10 neurons in its first or input

layer, 3 neurons in its middle layer, and 1 neuron in its output layer. Networks vary

in size, from only a few neurons to thousands, from only three layers to dozens;

the size depends on the complexity of the problem. Almost always, a three- or

four-layer network suffices.
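To make the layer-by-layer arithmetic concrete, here is a minimal sketch of a
forward pass through a 10-3-1 network; the logistic hidden units and linear output
are illustrative assumptions, not necessarily the transfer functions used in this
chapter.

// Minimal sketch of a forward (input-to-output) pass through a 10-3-1
// feed-forward network. Transfer functions are illustrative assumptions.
#include <cmath>

double Forward10_3_1(const double in[10],
                     const double wh[3][10], const double bh[3],
                     const double wo[3], double bo) {
    double hid[3];
    for (int j = 0; j < 3; j++) {                // hidden layer: 3 neurons
        double sum = bh[j];
        for (int i = 0; i < 10; i++) sum += wh[j][i] * in[i];
        hid[j] = 1.0 / (1.0 + std::exp(-sum));   // logistic transfer
    }
    double out = bo;                             // output layer: 1 neuron
    for (int j = 0; j < 3; j++) out += wo[j] * hid[j];
    return out;                                  // the network's prediction
}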

Feed-forward networks (the kind being used in this chapter) implement a par-

ticular form of nonlinear multiple regression. The net takes a number of input vari-

ables and uses them to predict a target, exactly as in regression. In a standard linear

multiple regression, if the goal is to predict cholesterol (the dependent variable or

target) on the basis of dietary fat intake and exercise (the independent variables or

inputs), the data would be modeled as follows: predicted cholesterol = a + b * fat

intake + c * exercise, where a, b, and c represent parameters that would be deter-

mined by a statistical procedure. In a least-squares sense, a line, plane, or hyper-

plane (depending on the number of independent variables) is being fitted to the

points in a data space. In the example above, a plane is being fit: The x-axis repre-

sents fat intake, the y-axis is exercise, and the height of the plane at each xy coor-

dinate pair represents predicted cholesterol.

When using neural network technology, the two-dimensional plane or n-

dimensional hyperplane of linear multiple regression is replaced by a smooth n-

dimensional curved surface characterized by peaks and valleys, ridges and troughs.

As an example, let us say there is a given number of input variables and a goal of

finding a nonlinear mapping that will provide an output from the network that best

fits the target. In the neural network, the goal is achieved via the “neurons,” the non-

linear elements that are connected to one another. The weights of the connections are

adjusted to fit the surface to the data. The learning algorithm adjusts the weights to

get a particular curved surface that best fits the data points. As in a standard multi-

ple regression model, in which the coefficients of the regression are needed to define

the slope of the plane or hyperplane, a neural model requires that parameters, in the

form of connection weights, be determined so that the particular surface generated

(in this case a curved surface with hills and dales) will best fit the data.

NEURAL NETWORKS IN TRADING

Neural networks had their heyday in the late 1980s and early 1990s. Then the hon-

eymoon ended. What happened? Basically, disillusionment set in among traders


who believed that this new technology could, with little or no effort on the trader's

part, magically provide the needed edge. System developers would “train” their

nets on raw or mildly preprocessed data, hoping the neural networks themselves

would discover something useful. This approach was naive; nothing is ever so sim-

ple, especially when trading the markets. Not only was this “neural newbie”

approach an ineffective way to use neural networks, but so many people were

attempting to use nets that whatever edge was originally gained was nullified by

the response of the markets, which was to become more efficient with regard to

the technology. The technology itself was blamed and discarded with little con-

sideration to the thought that it was being inappropriately applied. A more sophis-

ticated, reasoned approach was needed if success was going to be achieved.

Most attempts to develop neural network forecasting models, whether in a

simplistic manner or more elaborately, have focused on individual markets. A seri-

ous problem with the use of individual markets, however, is the limited number of

data points available on which to train the net. This situation leads to grand oppor-

tunities for curve-fitting (the bad kind)--something that can contribute greatly to

the likelihood of failure with a neural network, especially with less than ideal data

preprocessing and targets. In this chapter, however, neural networks will be trained

on a whole portfolio of tradables, resulting in the availability of many tens of thou-

sands of data points (facts), and a reduction in curve-fitting for small to moderate-

sized networks. Perhaps, in this context, a fairly straightforward attempt to have a

neural network predict current or near-future market behavior might be success-

ful. In essence, such a network could be considered a universal market forecaster,

in that, trained across an entire portfolio of tradables, it might be able to predict

on all markets, in a non-market-specific fashion.

FORECASTING WITH NEURAL NETWORKS

Neural networks will be developed to predict (1) where the market is in terms of

its near-future range and (2) whether tomorrow's open represents a turning point.

Consider, first, the goal of predicting where the market is relative to its near-future

range. An attempt will be made to build a network to predict a time-reversed

Stochastic, specifically the time-reversed Slow %K. This is the usual Stochastic,

except that it is computed with time running backward. The time-reversed Slow

%K reflects where the current close lies with respect to the price range over the

next several bars. If something could predict this, it would be useful to the trader:

Knowing that today's close, and probably tomorrow's open, lies near the bottom of
the range of the next several days' prices would suggest a good buy point; and
knowing that today's close, or tomorrow's open, lies near the top of the range
would be useful in deciding to sell. Consider, second, the goal of predicting whether

tomorrow's open is a top, a bottom, or neither. Two neural networks will be trained.
One will predict whether tomorrow's open represents a bottom turning point, i.e.,
has a price that is lower than the prices on earlier and later bars. The other will pre-
dict whether tomorrow's open represents a top turning point, i.e., has a price that is
higher than the prices on earlier and later bars. Being able to predict whether a bot-
tom or a top will occur at tomorrow's open is also useful for the trader trying to

decide when to enter the market and whether to go long or short. The goal in this

study is to achieve such predictions in any market to which the model is applied.
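Since the time-reversed Slow %K is central to what follows, a minimal sketch of
its computation may help. It assumes the usual definition, in which Slow %K is a
3-bar average of Fast %K, here computed with time running backward; the inclusion
of the current bar in the future range is our assumption.

// Minimal sketch of a 10-bar time-reversed Slow %K (a training target that
// looks into the future). Assumes Fast %K is the close's position within
// the high-low range of the next n bars (current bar included), and Slow %K
// is a 3-bar average of Fast %K computed with time reversed.
#include <algorithm>

double ReversedFastK(const double *hi, const double *lo, const double *cls,
                     int cb, int n) {
    double hh = hi[cb], ll = lo[cb];          // range of the next n bars
    for (int k = 1; k < n; k++) {
        hh = std::max(hh, hi[cb + k]);
        ll = std::min(ll, lo[cb + k]);
    }
    double rng = hh - ll;
    return (rng > 0.0) ? 100.0 * (cls[cb] - ll) / rng : 50.0;
}

double ReversedSlowK(const double *hi, const double *lo, const double *cls,
                     int cb, int n) {
    double sum = 0.0;                         // smooth over the next 3 bars
    for (int k = 0; k < 3; k++)
        sum += ReversedFastK(hi, lo, cls, cb + k, n);
    return sum / 3.0;
}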

GENERATING ENTRIES WITH NEURAL PREDICTIONS

Three nets will be trained, yielding three entry models. Two models will be con-
structed for turning points. One model will be designed to detect bottoms, the other

model to detect tops. For the bottom detection model, if the neural net indicates that

the probability that tomorrow's open will be a bottom is greater than some thresh-

old, then a buy order will be posted. For the top detection model, if the neural net

indicates that the probability that tomorrow's open will be a top is greater than

some other threshold, then a sell order will be posted. Neither model will post an

order under any other circumstances. These rules amount to nothing more than a

simple strategy of selling predicted tops and buying predicted bottoms. If, with bet-

ter than chance accuracy, the locations of bottoms and tops can be detected in time

to trade them, trading should be profitable. The detection system does not have to

be perfect, just sufficiently better than chance so as to overcome transaction costs.

For the model that predicts the time-reversed Slow %K, a similar strategy will

be used. If the prediction indicates that the time-reversed Slow %K is likely to be

less than some lower threshold, a buy will be posted; the market is near the bottom

of its near-future range and so a profit should quickly develop. Likewise, if the pre-

dicted reverse Slow %K is high, above an upper threshold, a sell will be posted.
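A minimal sketch of these rules follows; the threshold values and the names of
the signal variables are illustrative assumptions, since the actual thresholds
would be chosen during testing.

// Minimal sketch of the threshold entry rules described above. Threshold
// values and names are illustrative assumptions.
enum Signal { NONE, BUY, SELL };

Signal TurningPointEntry(double pBottom, double pTop,
                         double buyThresh, double sellThresh) {
    if (pBottom > buyThresh) return BUY;    // predicted bottom at tomorrow's open
    if (pTop > sellThresh)   return SELL;   // predicted top at tomorrow's open
    return NONE;                            // otherwise stand aside
}

Signal ReversedKEntry(double predictedK, double loThresh, double hiThresh) {
    if (predictedK < loThresh) return BUY;  // near bottom of near-future range
    if (predictedK > hiThresh) return SELL; // near top of near-future range
    return NONE;
}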

These entries share the characteristics of other entries based on predictive, rather

than responsive, analysis. The entries lend themselves to countertrend trading and, if

the predictions are accurate, can dramatically limit transaction costs in the form of slip-

page, and provide good fills since the trader will be buying when others are selling and

vice versa. A good predictive model is the trader's Holy Grail, providing the ability to

sell near tops and buy near bottoms. As with other predictive-based entries, if the pre-

dictions are not sufficiently accurate, the benefits will be outweighed by the costs of

bad trades when the predictions go wrong, as they often do.

TIME-REVERSED SLOW %K MODEL

The first step in developing a neural forecasting model is to prepare a training fact

set, which is the sample of data consisting of examples from which the net learns;

i.e., it is the data used to train the network and to estimate certain statistics. In this

case, the fact set is generated using the in-sample data from all commodities in the

portfolio. The number of facts in the fact set is, therefore, large: 88,092 data

points. A fact set is only generated for training, not for testing, for reasons that will

be explained later.

To generate the facts that make up the fact set for this model, the initial step

of computing the time-reversed Slow %K, which is to serve as the target, must be

taken. Each fact is then created and written to a file by stepping through the in-

sample bars for each commodity in the portfolio. For each current bar (the one cur-

rently being stepped through), the process of creating a fact begins with computing

each input variable in the fact. This is done by calculating a difference between a

pair of prices, and then dividing that difference by the square-root of the number

of bars that separate the two prices. The square-root correction is used because, in

a random market, the standard deviation of a price difference between a pair of

bars is roughly proportional to the square-root of the number of bars separating the

two prices. The correction will force each price difference to contribute about

equally to the fact. In this experiment, each fact contains 18 price changes that are

computed using the square-root correction. These 18 price change scores will

serve as the 18 inputs to the neural network after some additional processing.
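Stated as a formula (the notation here is ours): if C(t) denotes the close of bar t,
and a given input pairs the prices at lookbacks a and b bars before the current bar
(with b > a), that input is computed as

x = ( C(t - a) - C(t - b) ) / sqrt( b - a )

so that, in a random market, every x contributes with roughly the same standard
deviation regardless of the spacing b - a.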

The pairs of prices (used when computing the changes) are sampled with

increasing distance between them: i.e., the further back in time, the greater the dis-

tance between the pairs. For the first few bars prior to the current bar, the spacing

between the prices differenced is only 1 bar; i.e., the price of the bar prior to the

current bar is subtracted from the price of the current bar; the price 2 bars

before the current bar is subtracted from the price 1 bar ago, etc. After several such

price change scores, the sampling is increased to every 2 bars, then every 4, then

every 8, etc. The exact spacings are in a table inside the code. The rationale behind

this procedure is to obtain more detailed information on very recent price behav-

ior. The further back the prices are in time, the more likely only longer-term move-

ments will be significant. Therefore, less resolution should be required. Sampling

the bars in this way ought to provide sufficient resolution to detect cycles and other

phenomena that range from a period of 1 or 2 bars through 50 bars or more. This

approach is in accord with a suggestion made by Mark Jurik (jurikres.com).

After assembling the 18 input variables consisting of the square-root-cor-

rected price differences for a fact, a normalization procedure is applied. The inten-

tion is to preserve wave shape while discarding amplitude information. By treating

the 18 input variables as a vector, the normalization consists of scaling the vector

to unit length. The calculations involve squaring each vector element or price dif-

ference, summing the squares, taking the square-root, and then dividing each ele-

ment by the resultant number. These are the input variables for the neural network.

In actual fact, the neural network software will further scale these inputs to an

appropriate range for the input neurons.
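Stated compactly (again, the notation is ours): if x1 through x18 are the
square-root-corrected differences, the normalized inputs are

xi' = xi / L, where L = sqrt( x1^2 + x2^2 + . . . + x18^2 )

which leaves a vector of unit length that encodes wave shape but not amplitude.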

The target (dependent variable in regression terms) for each fact is simply the

time-reversed Slow %K for the current bar. The input variables and target for each

fact are written in simple ASCII format to a file that can be analyzed with a good

neural network development package.

The resultant fact set is used to train a net to predict the time-reversed Slow

%K, i.e., to predict the relative position of today's close, and, it is hoped, tomor-
row's open, with respect to the range of prices over the next 10 bars (a 10-bar time-

reversed Slow %K).

The next step in developing the neural forecaster is to actually train some

neural networks using the just-computed fact set. A series of neural networks,

varying in size, are trained. The method used to select the most appropriately sized

and trained network is not the usual one of examining behavior on a test set con-

sisting of out-of-sample data. Instead, the correlation coefficients, which reflect

the predictive capabilities of each of the networks, are corrected for shrinkage

based on the sample size and the number of parameters or connection weights

being estimated in the corresponding network. The equation employed to correct

for shrinkage is the same one used to correct the multiple correlations derived

from a multivariate regression (see the chapters on optimization and statistics).

Shrinkage is greater for larger networks, and reflects curve-fitting of the undesir-

able kind. For a larger network to be selected over a smaller network, i.e., to over-

come its greater shrinkage, the correlation it produces must be sufficiently greater

than that produced by the smaller network. This technique enables networks to be

selected without the usual reference to out-of-sample data. All networks are fully

trained; i.e., no attempt is being made to compensate for loss of degrees of free-

dom by undertraining.
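For reference, the standard shrinkage correction for a multiple correlation takes
the form

Rc = sqrt( 1 - (1 - R^2) * (N - 1) / (N - P - 1) )

where R is the multiple correlation observed on the training data, N is the number
of facts, and P is the number of free parameters (here, connection weights). As P
grows relative to N, the corrected correlation Rc shrinks, penalizing larger
networks exactly as described above.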

The best networks, selected on the basis of the shrinkage-corrected correla-

tions, are then tested using the actual entry model, together with the standard exit,

on both in- and out-of-sample data and across all markets. Because shrinkage results

from curve-fitting, excessively curve-fit networks should have very poor shrinkage-

corrected correlations. The large number of facts in the fact set (88,092) should help

reduce the extent of undesirable curve-fitting for moderately sized networks.

Code for the Reverse Slow %K Model

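In place of the full listing, a minimal sketch of the input-preparation routine
appears below; the Model function is omitted, and the pbars lookbacks shown are
illustrative assumptions that follow the doubling pattern described in the text,
not necessarily the exact values used in the original tests.

// Minimal sketch of PrepareNeuralInputs, reconstructed from the description
// in the text. The pbars lookbacks are illustrative assumptions; the first
// element of the list is ignored, as described below.
#include <cmath>

#define NINPUTS 18

static double netinputs[NINPUTS + 1];     // prepared inputs, elements 1..18

void PrepareNeuralInputs(int cb, const double *cls) {
    // lookbacks relative to the current bar; element 0 is ignored
    static const int pbars[NINPUTS + 2] =
        { 0, 0, 1, 2, 3, 4, 5, 6, 8, 10, 14, 18,
          26, 34, 50, 66, 98, 130, 194, 258 };
    static double adjust[NINPUTS + 1];    // square-root correction factors
    static int firstpass = 1;

    if (firstpass) {                      // initialize the table on first pass
        for (int i = 1; i <= NINPUTS; i++)
            adjust[i] = std::sqrt((double)(pbars[i + 1] - pbars[i]));
        firstpass = 0;
    }

    double sumsq = 0.0;                   // sum of squared differences
    for (int i = 1; i <= NINPUTS; i++) {  // adjusted price differences
        double d = (cls[cb - pbars[i]] - cls[cb - pbars[i + 1]]) / adjust[i];
        netinputs[i] = d;
        sumsq += d * d;
    }

    double len = std::sqrt(sumsq);        // scale the vector to unit length
    if (len > 0.0)
        for (int i = 1; i <= NINPUTS; i++) netinputs[i] /= len;
}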

The code consists of two functions: the usual function that implements the

trading model (Model), and a procedure to prepare the neural network inputs

(PrepareNeuralInputs). The procedure that prepares the inputs requires the index

of the current bar (cb) and a series of closing prices (cls) on which to operate.

The PrepareNeuralInputs function, given the index to the current bar and a

series of closing prices, calculates all inputs for a given fact that are required for the

neural network. In the list, pbars, the numbers after the first zero (which is ignored),

are the lookbacks, relative to the current bar, which are used to calculate the price

differences discussed earlier. The first block of code, after the declarations, initial-

izes a price adjustment factor table. The table is initialized on the first pass through

the function and contains the square-roots of the number of bars between each pair

of prices from which a difference is computed. The next block of code calculates the

adjusted price differences, as well as the sum of the squares of these differences, i.e.,
the quantities needed to normalize the input vector to unit length.