
means that the loss was between $0 and $1,999 or the profit was between $0 and
$1,000 per trade. (For information about the various markets and their symbols,
see Table II-1 in the “Introduction” to Part II.)
Only the 10-Year Notes and Cotton showed strong profits across all three
entry orders in-sample. Out-of-sample, performance on these markets was miser-
able. The S&P 500, a market that, in our experience, has many clear and tradable
cycles, demonstrated strong profitability on the in-sample data when entry was at
open or on limit. This market was strongly profitable out-of-sample with entry on
limit and on stop, but somewhat less profitable with entry at open. Interestingly,
the NYFE, although evidencing strong in-sample profits with entry at open and on
limit, had losses out-of-sample across all three orders.

TABLE 10-2

Performance Data Broken Down by Market and Test

There are a few other profitable in-sample market-order combinations, as well as
out-of-sample market-order combinations. However, very little correspondence between the two was
observed. Perhaps markets that have not had cycles in the past (in-sample) have
cycles in the present (out-of-sample), and vice versa. At least the S&P 500
behaved as expected on the basis of prior research and may be one of the few mar-
kets consistently amenable to cycle trading in this crude form.
Figure 10-4 depicts the equity for the portfolio with entry at open. Equity
declined slowly and then became rather flat until about August 1992, at which
time it began a steady and rapid decline.

FIGURE 10-4

Portfolio Equity Growth for Countertrend Cycle Trading


CONCLUSION
In our May 1997 study, the filter bank method appeared to have potential as the
basis for an effective trading strategy. At times it worked incredibly well, and was
almost completely insensitive to large variations in its parameters, whereas at
other times it performed poorly. The results may simply have been due to the fact
that the implementation was “quick and dirty.” Back then, the focus was on the
S&P 500, a market that continued to trade well in the present study.
The results of the current study are disappointing, all the more so given the theo-
retical elegance of the filters. It may be that other approaches to the analysis of
cycles, e.g., the use of maximum entropy, might have provided better perfor-
mance; then again, maybe not. Other traders have also experienced similar disap-
pointments using a variety of techniques when trading cycles in a simple,
buy-the-bottom/sell-the-top manner. It may be that cycles are too obvious and
detectable by any of a number of methods, and may be traded away very quickly
whenever they develop in the market. This especially seems the case in recent
years with the proliferation of cycle analysis software. The suggestion is not that
cycles should be abandoned as a concept, but that a more sophisticated use of
detected cycles must be made. Perhaps better results would ensue if cycles were
combined with other kinds of entry criteria, e.g., taking trades only if a cycle top
corresponds to an expected seasonal turning-point top.
Further studies are needed to determine whether the cycle model does indeed
have the characteristic of giving precise entries when it works, but failing miser-
ably when it does not work. Looking over a chart of the S&P 500 suggests this is
the case. There are frequently strings of four or five trades in a row, with entries
that occur precisely at market tops and bottoms, as if predicted with perfect hind-
sight. At other times, entries occur exactly where they should not. With a system
that behaves this way, our experience indicates that, combined with a proper exit,
sometimes great profits can be achieved. More specifically, losses have to be cut
very quickly when the model fails, but trades should not be prematurely terminat-
ed when the model is correct in its predictions. Because of the precision of the
model when the predictions are correct, an extremely tight stop could perhaps
accomplish the goal. When an exact cycle top or bottom is caught, the market
begins to move immediately in the favored direction, with hardly any adverse
excursion, and the stop is never hit. When the model fails, the stop is hit very
quickly, resulting in only a small loss. Given the fairly loose stop of the standard
exit, the benefits of sophisticated cycle trading may not have been realized.

WHAT HAVE WE LEARNED?
- Models that are theoretically sound, elegant, and appealing do not necessarily
work well when trading real markets.
- Exception to Rule 1: The S&P 500 may respond to such methods; it did so
both in our earlier study and in the current one.
- When the model does work, it does so remarkably well. As stated earlier,
when examining its behavior on the S&P 500 and several other markets,
one can quickly and easily find strings of signals that pick off tops and
bottoms with the precision of hindsight.
- The previous point suggests that exits specifically designed for a system
that yields high precision when correct, but fails badly when incorrect,
may be required.
- The markets appear to have become more efficient relative to cycle
models, as they have to breakout models. Obvious market behavior
(such as clear, tradable cycles) is traded away before most traders can
capitalize on it. The lesson: Anything too theoretically appealing or
obvious will tend not to work.

Neural Networks




Neural network technology, a form of artificial intelligence (or AI), arose from
endeavors to emulate the kind of information processing and decision making
that occurs in living organisms. The goal was to model the behavior of neural tis-
sue in living systems by using a computer to implement structures composed of
simulated neurons and neural interconnections (synapses). Research on neural
networks began in the 1940s on a theoretical level. When computer technology
became sophisticated enough to accommodate such research, the study of neural
networks and their applications began in earnest. It was not, however, until the
mid-to-late 1980s that neural network technology became of interest to the finan-
cial community. By 1989, a few vendors of neural network development tools
were available, and there was one commercial S&P 500 forecasting system based
on this technology (Scientific Consultant Services' NexTurn). In the early 1990s,
interest peaked and more development tools appeared, but the fervor then waned for
reasons discussed later.
While it is not within the scope of this book to present a full tutorial on neural
network technology, below is a brief discussion to provide basic understanding.
Those interested in exploring this subject in greater depth should read our contri-
butions to the books Virtual Trading (Lederman and Klein, 1995) and
Computerized Trading (Jurik, 1999), in which we also present detailed informa-
tion on system development using neural networks, as well as our articles in
Technical Analysis of Stocks and Commodities (Katz, April 1992; Katz and
McCormick, November 1996, November 1997). Neural Networks in Finance and
Investing (Trippi and Turban, 1993) should also be of interest.
WHAT ARE NEURAL NETWORKS?
Neural networks (or “nets”) are basically building blocks that learn and are useful
for pattern recognition, classification, and prediction. They hold special appeal to
traders because nets are capable of coping both with probability estimates in
uncertain situations and with “fuzzy” patterns, i.e., those recognizable by eye but
difficult to define in software using precise rules; and they have the potential to
recognize almost any pattern that exists. Nets can also integrate large amounts of
information without becoming stifled by detail and can be made to adapt to chang-
ing markets and market conditions.
A variety of neural networks are available, differing in terms of their “archi-
tecture,” i.e., the ways in which the simulated neurons are interconnected, the
details of how these neurons behave (signal processing behavior or “transfer func-
tions”), and the process through which learning takes place. There are a number of
popular kinds of neural networks that are of some use to traders: the Kohonen and
the Learning Vector Quantization (LVQ) networks, various adaptive resonance net-
works, and recurrent networks. In this chapter, the most popular and, in many
respects, the most useful kind of network is discussed: the “feed-forward” network.
As mentioned above, nets differ in the ways they learn. The system develop-
er plays the role of the neural network's teacher, providing the net with examples
to learn from. Some nets employ “supervised learning” and others “unsupervised
learning.” Supervised learning occurs when the network is taught to produce a cor-
rect solution by being shown instances of correct solutions. This is a form of
paired-associate learning: The network is presented with pairs of inputs and a
desired output; for every set of inputs, it is the task of the net to learn to produce
the desired output. Unsupervised learning, on the other hand, involves nets that
take the sets of inputs they are given and organize them as they see fit, according
to patterns they find therein. Regardless of the form of learning employed, the
main difficulty in developing successful neural network models is in finding and
“massaging” historical data into training examples or “facts” that highlight rele-
vant patterns so that the nets can learn efficiently and not be put astray or con-
fused; “preprocessing” the data is an art in itself.
The actual process of learning usually involves some mechanism for updat-
ing the neural connection weights in response to the training examples. With feed-
forward architectures, back-propagation, a form of steepest-descent optimization,
is often used. Genetic algorithms are also effective. These are very computation-
ally intensive and time-consuming, but generally produce better final results.
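As a rough illustration of what such weight updating involves (a generic gradient-descent rule, not the exact update used by any particular package): each connection weight w is repeatedly nudged in the direction that reduces the prediction error E over the training facts, i.e.,

    w(new) = w(old) - r * (dE / dw)

where r is a small learning-rate constant. Back-propagation is simply an efficient way of computing the dE/dw terms layer by layer, working backward from the output layer.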

Feed-Forward Neural Networks
A feed-forward network consists of layers of neurons. The input layer, the first
layer, receives data or inputs from the outside world. The inputs consist of inde-
pendent variables (e.g., market or indicator variables upon which the system is to
be based) from which some inference is to be drawn or a prediction is to be made.
The input layer is massively connected to the next layer, which is often called the
hidden layer because it has no connections to the outside world. The outputs of the
hidden layer are fed to the next layer, which may be another hidden layer (if it is,
the process repeats), or it may be the output layer. Each neuron in the output layer
produces an output composed of the predictions, classifications, or decisions made
by the network. Networks are usually identified by the number of neurons in each
layer: For example, a 10-3-1 network is one that has 10 neurons in its first or input
layer, 3 neurons in its middle layer, and 1 neuron in its output layer. Networks vary
in size, from only a few neurons to thousands, from only three layers to dozens;
the size depends on the complexity of the problem. Almost always, a three- or
four-layer network suffices.
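To make the layered arithmetic concrete, below is a minimal sketch, in generic C++, of a forward pass through a small fully connected network such as the 10-3-1 example just mentioned. The logistic transfer function and the array layout are assumptions for illustration; commercial packages differ in such details.

#include <cmath>
#include <cstddef>
#include <vector>

// One fully connected layer: out[j] = f( bias[j] + sum_i w[j][i] * in[i] ),
// where f is a logistic (sigmoid) transfer function.
static std::vector<double> layer(const std::vector<double>& in,
                                 const std::vector<std::vector<double>>& w,
                                 const std::vector<double>& bias)
{
    std::vector<double> out(bias.size());
    for (std::size_t j = 0; j < bias.size(); j++) {
        double sum = bias[j];
        for (std::size_t i = 0; i < in.size(); i++)
            sum += w[j][i] * in[i];
        out[j] = 1.0 / (1.0 + std::exp(-sum));   // logistic transfer function
    }
    return out;
}

// Forward pass of a 10-3-1 network: 10 inputs -> 3 hidden neurons -> 1 output.
double forwardPass(const std::vector<double>& inputs,
                   const std::vector<std::vector<double>>& wHidden,  // 3 x 10
                   const std::vector<double>& bHidden,               // 3
                   const std::vector<std::vector<double>>& wOutput,  // 1 x 3
                   const std::vector<double>& bOutput)               // 1
{
    std::vector<double> hidden = layer(inputs, wHidden, bHidden);
    return layer(hidden, wOutput, bOutput)[0];
}

Training then amounts to adjusting the weight and bias arrays so that the output of such a pass matches the target values over the training facts as closely as possible.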
Feed-forward networks (the kind being used in this chapter) implement a par-
ticular form of nonlinear multiple regression. The net takes a number of input vari-
ables and uses them to predict a target, exactly as in regression. In a standard linear
multiple regression, if the goal is to predict cholesterol (the dependent variable or
target) on the basis of dietary fat intake and exercise (the independent variables or
inputs), the data would be modeled as follows: predicted cholesterol = a + b * fat
intake + c * exercise, where a, b, and c represent parameters that would be deter-
mined by a statistical procedure. In a least-squares sense, a line, plane, or hyper-
plane (depending on the number of independent variables) is being fitted to the
points in a data space. In the example above, a plane is being fit: The x-axis repre-
sents fat intake, the y-axis is exercise, and the height of the plane at each xy coor-
dinate pair represents predicted cholesterol.
When using neural network technology, the two-dimensional plane or n-
dimensional hyperplane of linear multiple regression is replaced by a smooth n-
dimensional curved surface characterized by peaks and valleys, ridges and troughs.
As an example, let us say there is a given number of input variables and a goal of
finding a nonlinear mapping that will provide an output from the network that best
fits the target. In the neural network, the goal is achieved via the “neurons,” the non-
linear elements that are connected to one another. The weights of the connections are
adjusted to fit the surface to the data. The learning algorithm adjusts the weights to
get a particular curved surface that best fits the data points. As in a standard multi-
ple regression model, in which the coefficients of the regression are needed to define
the slope of the plane or hyperplane, a neural model requires that parameters, in the
form of connection weights, be determined so that the particular surface generated
(in this case a curved surface with hills and dales) will best fit the data.

NEURAL NETWORKS IN TRADING
Neural networks had their heyday in the late 1980s and early 1990s. Then the hon-
eymoon ended. What happened? Basically, disillusionment set in among traders
who believed that this new technology could, with little or no effort on the trader's
part, magically provide the needed edge. System developers would “train” their
nets on raw or mildly preprocessed data, hoping the neural networks themselves
would discover something useful. This approach was naive; nothing is ever so sim-
ple, especially when trading the markets. Not only was this “neural newbie”
approach an ineffective way to use neural networks, but so many people were
attempting to use nets that whatever edge was originally gained was nullified by
the response of the markets, which was to become more efficient with regard to
the technology. The technology itself was blamed and discarded with little con-
sideration to the thought that it was being inappropriately applied. A more sophis-
ticated, reasoned approach was needed if success was going to be achieved.
Most attempts to develop neural network forecasting models, whether in a
simplistic manner or more elaborately, have focused on individual markets. A seri-
ous problem with the use of individual markets, however, is the limited number of
data points available on which to train the net. This situation leads to grand oppor-
tunities for curve-fitting (the bad kind), something that can contribute greatly to
the likelihood of failure with a neural network, especially with less than ideal data
preprocessing and targets. In this chapter, however, neural networks will be trained
on a whole portfolio of tradables, resulting in the availability of many tens of thou-
sands of data points (facts), and a reduction in curve-fitting for small to moderate-
sized networks. Perhaps, in this context, a fairly straightforward attempt to have a
neural network predict current or near-future market behavior might be success-
ful. In essence, such a network could be considered a universal market forecaster,
in that, trained across an entire portfolio of tradables, it might be able to predict
on all markets, in a non-market-specific fashion.

FORECASTING WITH NEURAL NETWORKS
Neural networks will be developed to predict (1) where the market is in terms of
its near-future range and (2) whether tomorrow's open represents a turning point.
Consider, first, the goal of predicting where the market is relative to its near-future
range. An attempt will be made to build a network to predict a time-reversed
Stochastic, specifically the time-reversed Slow %K. This is the usual Stochastic,
except that it is computed with time running backward. The time-reversed Slow
%K reflects where the current close lies with respect to the price range over the
next several bars. If something could predict this, it would be useful to the trader:
Knowing that today's close, and probably tomorrow's open, lies near the bottom of
the range of the next several days' prices would suggest a good buy point; and
knowing that today's close, or tomorrow's open, lies near the top of the range
would be useful in deciding to sell. Consider, second, the goal of predicting whether
tomorrow's open is a top, a bottom, or neither. Two neural networks will be trained.
One will predict whether tomorrow's open represents a bottom turning point, i.e.,
has a price that is lower than the prices on earlier and later bars. The other will pre-
dict whether tomorrow's open represents a top turning point, i.e., has a price that is
higher than the prices on earlier and later bars. Being able to predict whether a bot-
tom or a top will occur at tomorrow's open is also useful for the trader trying to
decide when to enter the market and whether to go long or short. The goal in this
study is to achieve such predictions in any market to which the model is applied.
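To make the target itself concrete, the following is a minimal sketch, in generic C++, of how a 10-bar time-reversed Slow %K might be computed. The exact placement of the window (whether the current bar is included in the range) and the 3-bar smoothing used for the Slow version are assumptions for illustration; implementations of the Stochastic differ on such details.

#include <algorithm>
#include <vector>

// Time-reversed Fast %K: where today's close lies within the near-future range
// (0 = at the bottom of the coming range, 100 = at the top). The range here
// spans the current bar and the next n-1 bars; the caller must ensure that
// enough future bars exist.
double reversedFastK(const std::vector<double>& high,
                     const std::vector<double>& low,
                     const std::vector<double>& close,
                     int cb, int n)
{
    double hi = high[cb], lo = low[cb];
    for (int k = 1; k < n; k++) {
        hi = std::max(hi, high[cb + k]);
        lo = std::min(lo, low[cb + k]);
    }
    if (hi <= lo) return 50.0;                  // degenerate (flat) range
    return 100.0 * (close[cb] - lo) / (hi - lo);
}

// Time-reversed Slow %K: a short moving average of the reversed Fast %K,
// taken here over the current bar and the next two bars.
double reversedSlowK(const std::vector<double>& high,
                     const std::vector<double>& low,
                     const std::vector<double>& close,
                     int cb, int n)
{
    double sum = 0.0;
    for (int j = 0; j < 3; j++)
        sum += reversedFastK(high, low, close, cb + j, n);
    return sum / 3.0;
}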


GENERATING ENTRIES WITH NEURAL
PREDICTIONS
Three nets will be trained, yielding three entry models. Two models will be con-
structed for turning points. One model will be designed to detect bottoms, the other
model to detect tops. For the bottom detection model, if the neural net indicates that
the probability that tomorrow's open will be a bottom is greater than some thresh-
old, then a buy order will be posted. For the top detection model, if the neural net
indicates that the probability that tomorrow's open will be a top is greater than
some other threshold, then a sell order will be posted. Neither model will post an
order under any other circumstances. These rules amount to nothing more than a
simple strategy of selling predicted tops and buying predicted bottoms. If, with bet-
ter than chance accuracy, the locations of bottoms and tops can be detected in time
to trade them, trading should be profitable. The detection system does not have to
be perfect, just sufficiently better than chance so as to overcome transaction costs.
For the model that predicts the time-reversed Slow %K, a similar strategy will
be used. If the prediction indicates that the time-reversed Slow %K is likely to be
less than some lower threshold, a buy will be posted; the market is near the bottom
of its near-future range and so a profit should quickly develop. Likewise, if the pre-
dicted reverse Slow %K is high, above an upper threshold, a sell will be posted.
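In outline, then, all three entries reduce to simple threshold tests on the network outputs, roughly as sketched below in generic C++. The function names and the fixed numeric thresholds are placeholders for illustration, not the values actually tested.

// Schematic entry logic driven by the neural predictions.
// netBottom, netTop: estimated probabilities that tomorrow's open is a bottom
// or a top; netRevK: predicted 10-bar time-reversed Slow %K (0..100).
enum Signal { NONE, BUY, SELL };

Signal turningPointEntry(double netBottom, double netTop,
                         double buyThresh, double sellThresh)
{
    if (netBottom > buyThresh)  return BUY;    // predicted bottom: go long
    if (netTop    > sellThresh) return SELL;   // predicted top: go short
    return NONE;
}

Signal reversedKEntry(double netRevK, double lowerThresh, double upperThresh)
{
    if (netRevK < lowerThresh) return BUY;     // close near bottom of coming range
    if (netRevK > upperThresh) return SELL;    // close near top of coming range
    return NONE;
}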
These entries share the characteristics of other entries based on predictive, rather
than responsive, analysis. The entries lend themselves to countertrend trading and, if
the predictions are accurate, can dramatically limit transaction costs in the form of slip-
page, and provide good fills since the trader will be buying when others are selling and
vice versa. A good predictive model is the trader's Holy Grail, providing the ability to
sell near tops and buy near bottoms. As with other predictive-based entries, if the pre-
dictions are not sufficiently accurate, the benefits will be outweighed by the costs of
bad trades when the predictions go wrong, as they often do.


TIME-REVERSED SLOW %K MODEL
The first step in developing a neural forecasting model is to prepare a training fact
set, which is the sample of data consisting of examples from which the net learns;
i.e., it is the data used to train the network and to estimate certain statistics. In this
case, the fact set is generated using the in-sample data from all commodities in the
portfolio. The number of facts in the fact set is, therefore, large: 88,092 data
points. A fact set is only generated for training, not for testing, for reasons that will
be explained later.
To generate the facts that make up the fact set for this model, the initial step
of computing the time-reversed Slow %K, which is to serve as the target, must be
taken. Each fact is then created and written to a file by stepping through the in-
sample bars for each commodity in the portfolio. For each current bar (the one cur-
rently being stepped through), the process of creating a fact begins with computing
each input variable in the fact. This is done by calculating a difference between a
pair of prices, and then dividing that difference by the square-root of the number
of bars that separate the two prices. The square-root correction is used because, in
a random market, the standard deviation of a price difference between a pair of
bars is roughly proportional to the square-root of the number of bars separating the
two prices. The correction will force each price difference to contribute about
equally to the fact. In this experiment, each fact contains 18 price changes that are
computed using the square-root correction. These 18 price change scores will
serve as the 18 inputs to the neural network after some additional processing.
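To illustrate the correction with a small worked example: a difference taken between prices 16 bars apart would be divided by 4 (the square root of 16), whereas a difference between adjacent bars would be divided by 1; after this scaling, the two differences have roughly comparable expected magnitudes in a random market and so contribute about equally to the input vector.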
The pairs of prices (used when computing the changes) are sampled with
increasing distance between them: i.e., the further back in time, the greater the dis-
tance between the pairs. For the first few bars prior to the current bar, the spacing
between the prices differenced is only 1 bar; i.e., the price of the bar prior to the
current bar is subtracted from the price of the current bar; the price 2 bars
before the current bar is subtracted from the price 1 bar ago, etc. After several such
price change scores, the sampling is increased to every 2 bars, then every 4, then
every 8, etc. The exact spacings are in a table inside the code. The rationale behind
this procedure is to obtain more detailed information on very recent price behav-
ior. The further back the prices are in time, the more likely only longer-term move-
ments will be significant. Therefore, less resolution should be required. Sampling
the bars in this way ought to provide sufficient resolution to detect cycles and other
phenomena that range from a period of 1 or 2 bars through 50 bars or more. This
approach is in accord with a suggestion made by Mark Jurik (jurikres.com).
After assembling the 18 input variables consisting of the square-root-cor-
rected price differences for a fact, a normalization procedure is applied. The inten-
tion is to preserve wave shape while discarding amplitude information. By treating
the 18 input variables as a vector, the normalization consists of scaling the vector
to unit length. The calculations involve squaring each vector element or price dif-
ference, summing the squares, taking the square-root, and then dividing each ele-
ment by the resultant number. These are the input variables for the neural network.
In actual fact, the neural network software will further scale these inputs to an
appropriate range for the input neurons.
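As a small worked example of the normalization (with made-up numbers): if a fact had only three corrected differences, 0.3, -0.4, and 0.0, the sum of their squares would be 0.25 and its square root 0.5; dividing each element by 0.5 yields 0.6, -0.8, and 0.0, a vector of unit length that preserves the shape of the original price changes while discarding their overall amplitude.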
The target (dependent variable in regression terms) for each fact is simply the
time-reversed Slow %K for the current bar. The input variables and target for each
fact are written in simple ASCII format to a file that can be analyzed with a good
neural network development package.
The resultant fact set is used to train a net to predict the time-reversed Slow
%K, i.e., to predict the relative position of today's close, and, it is hoped, tomor-
row's open, with respect to the range of prices over the next 10 bars (a 10-bar time-
reversed Slow %K).
The next step in developing the neural forecaster is to actually train some
neural networks using the just-computed fact set. A series of neural networks,
varying in size, are trained. The method used to select the most appropriately sized
and trained network is not the usual one of examining behavior on a test set con-
sisting of out-of-sample data. Instead, the correlation coefficients, which reflect
the predictive capabilities of each of the networks, are corrected for shrinkage
based on the sample size and the number of parameters or connection weights
being estimated in the corresponding network. The equation employed to correct
for shrinkage is the same one used to correct the multiple correlations derived
from a multivariate regression (see the chapters on optimization and statistics).
Shrinkage is greater for larger networks, and reflects curve-fitting of the undesir-
able kind. For a larger network to be selected over a smaller network, i.e., to over-
come its greater shrinkage, the correlation it produces must be sufficiently greater
than that produced by the smaller network. This technique enables networks to be
selected without the usual reference to out-of-sample data. All networks are fully
trained: i.e., no attempt is being made to compensate for loss of degrees of free-
dom by undertraining.
The best networks, selected on the basis of the shrinkage-corrected correla-
tions, are then tested using the actual entry model, together with the standard exit,
on both in- and out-of-sample data and across all markets. Because shrinkage results
from curve-fitting, excessively curve-fit networks should have very poor shrinkage-
corrected correlations. The large number of facts in the fact set (88,092) should help
reduce the extent of undesirable curve-fitting for moderately sized networks.
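For reference, the familiar adjusted-R-squared formula is one common form of such a correction (the chapters on optimization and statistics give the exact equation used):

    corrected R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1)

where n is the number of facts and p is the number of free parameters (connection weights and biases) in the network; the corrected correlation is the square root of this quantity. With n fixed at 88,092, a larger p directly lowers the corrected figure, which is why a larger network must produce a substantially higher raw correlation than a smaller one before it will be selected.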


Code for the Reverse Slow %K Model
The code comprises two functions: the usual function that implements the
trading model (Model), and a procedure to prepare the neural network inputs
(PrepareNeuralInputs). The procedure that prepares the inputs requires the index
of the current bar (cb) and a series of closing prices (cls) on which to operate.
The PrepareNeuralInputs function, given the index of the current bar and a
series of closing prices, calculates all the inputs for a given fact that are required for
the neural network. In the list pbars, the numbers after the first zero (which is ignored)
are the lookbacks, relative to the current bar, which are used to calculate the price
differences discussed earlier. The first block of code, after the declarations, initial-
izes a price adjustment factor table. The table is initialized on the first pass through
the function and contains the square-roots of the number of bars between each pair
of prices from which a difference is computed. The next block of code calculates the
adjusted price differences, as well as the sum of the squares of these differences, i.e.,
the squared length of the vector of adjusted differences, which is then used to scale
the inputs to unit length.
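For readers who want to see the flavor of such a routine, the following is a minimal, generic C++ sketch along the lines just described. The lookback spacings in pbars below are illustrative placeholders, not the table from the original code, and error checking is omitted.

#include <cmath>

// Illustrative lookbacks (bars back from the current bar) from which the 18
// price differences are formed; spacing widens with distance, as described in
// the text. These particular values are placeholders.
static const int pbars[19] = { 0, 1, 2, 3, 4, 5, 6, 8, 10, 12,
                               16, 20, 24, 32, 40, 48, 64, 80, 96 };

// Compute the 18 square-root-corrected, unit-normalized price differences for
// the fact whose current bar is cb, given a series of closing prices cls[].
// The results are written to inputs[0..17].
void prepareNeuralInputs(double* inputs, const double* cls, int cb)
{
    double sumsq = 0.0;

    for (int i = 0; i < 18; i++) {
        double diff = cls[cb - pbars[i]] - cls[cb - pbars[i + 1]];
        // Divide by the square root of the number of bars separating the pair,
        // so each difference contributes about equally in a random market.
        diff /= std::sqrt((double)(pbars[i + 1] - pbars[i]));
        inputs[i] = diff;
        sumsq += diff * diff;
    }

    // Scale the 18-element vector to unit length, preserving wave shape while
    // discarding amplitude information.
    double len = std::sqrt(sumsq);
    if (len > 0.0)
        for (int i = 0; i < 18; i++)
            inputs[i] /= len;
}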