
sometimes referred to by statisticians. Missing, extra, and logically inconsistent
data points are also occasionally seen; they should be noted and corrected. As an
example of data checking, two data sets were run through a utility program that
scans for missing data points, outliers, and logical inconsistencies. The results
appear in Tables 1-1 and 1-2, respectively.
Table 1-1 shows the output produced by the data-checking program when it was
used on Pinnacle Data Corporation's (800-724-4903) end-of-day, continuous-contract
data for the S&P 500 futures. The utility found no illogical prices or volumes in
this data set; there were no observed instances of a high that was less than the close,
a low that was greater than the open, a volume that was less than zero, or of any
cognate data faux pas. Two data points (bars) with suspiciously high ranges, however,
were noted by the software: One bar with unusual range occurred on 10/19/87 (or
871019 in the report). The other was dated 10/13/89. The abnormal range observed
on 10/19/87 does not reflect an error, just the normal volatility associated with a major
crash like that of Black Monday; nor is a data error responsible for the aberrant range
seen on 10/13/89, which appeared due to the so-called anniversary effect. Since these
statistically aberrant data points were not errors, corrections were unnecessary.

Nonetheless, the presence of such data points should emphasize the fact that market
events involving exceptional ranges do occur and must be managed adequately by a
trading system. All ranges shown in Table 1-1 are standardized ranges, computed by
dividing a bar's range by the average range over the last 20 bars. As is common with
market data, the distribution of the standardized range had a longer tail than would be
expected given a normally distributed underlying process. Nevertheless, the events of
10/19/87 and 10/13/89 appear to be statistically exceptional: The distribution of all
other range data declined, in an orderly fashion, to zero at a standardized value of 7,
well below the range of 10 seen for the critical bars.

Table 1-1. Output from Data-Checking Utility for End-of-Day S&P 500
Continuous-Contract Futures Data from Pinnacle

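The range standardization just described can be sketched in a few lines of Python. This is only an illustration, not the utility that produced the report: the function names are hypothetical, the average is assumed to cover the 20 bars preceding the current one (the text does not say whether the current bar is included), and the threshold of 7 is borrowed from the point at which the distribution was observed to reach zero.

```python
from typing import List, Optional

def standardized_ranges(highs: List[float], lows: List[float],
                        lookback: int = 20) -> List[Optional[float]]:
    """Divide each bar's range by the average range of the preceding bars.

    Assumption: the average covers the `lookback` bars before the current
    bar. Bars without enough history, or with a zero average, yield None.
    """
    ranges = [h - l for h, l in zip(highs, lows)]
    result: List[Optional[float]] = []
    for i, r in enumerate(ranges):
        if i < lookback:
            result.append(None)
            continue
        avg = sum(ranges[i - lookback:i]) / lookback
        result.append(r / avg if avg > 0 else None)
    return result

def flag_outliers(scores: List[Optional[float]],
                  threshold: float = 7.0) -> List[int]:
    """Return the indexes of bars whose standardized score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s is not None and s > threshold]
```

A flagged bar is only a candidate for inspection; as the two dates above show, an extreme score may reflect a genuine market event rather than a data error.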
The data-checking utility also flagged 5 bars as having exceptionally deviant
closing prices. As with range, deviance has been defined in terms of a distribution,
using a standardized close-to-close price measure. In this instance, the standard-
ized measure was computed by dividing the absolute value of the difference
between each closing price and its predecessor by the average of the preceding 20
such absolute values. When the 5 flagged (and most deviant) bars were omitted,
the same distributional behavior that characterized the range was observed: a long-
tailed distribution of close-to-close price change that fell off, in an orderly fashion,
to zero at 7 standardized units. Standardized close-to-close deviance scores
(DEV) of 8 were noted for 3 of the aberrant bars, and scores of 10 were observed
for the remaining 2 bars. Examination of the flagged data points again suggests
that unusual market activity, rather than data error, was responsible for their sta-
tistical deviance. It is not surprising that the 2 most deviant data points were the
same ones noted earlier for their abnormally high range. Finally, the data-check-
ing software did not find any missing bars, bars falling on weekends, or bars with
duplicate or out-of-order dates. The only outliers detected appear to be the result
of bizarre market conditions, not corrupted data. Overall, the S&P 500 data series
appears to be squeaky-clean. This was expected: In our experience, Pinnacle Data
Corporation (the source of the data) supplies data of very high quality.
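The close-to-close deviance measure defined above (the absolute change in the close divided by the average of the preceding 20 absolute changes) might be computed as follows. The function name and the handling of bars with too little history are assumptions made for this sketch.

```python
from typing import List, Optional

def close_deviance(closes: List[float],
                   lookback: int = 20) -> List[Optional[float]]:
    """Standardized close-to-close change (DEV) for each bar.

    DEV = |close[i] - close[i-1]| divided by the mean of the preceding
    `lookback` absolute changes. Bars with too little history, or with a
    zero average, yield None.
    """
    # abs_changes[k] is the absolute change from bar k to bar k + 1.
    abs_changes = [abs(b - a) for a, b in zip(closes, closes[1:])]
    dev: List[Optional[float]] = [None] * len(closes)
    for i in range(lookback + 1, len(closes)):
        current = abs_changes[i - 1]
        prior = abs_changes[i - 1 - lookback:i - 1]
        avg = sum(prior) / lookback
        dev[i] = current / avg if avg > 0 else None
    return dev
```

Because the current change is excluded from its own average, a single extreme bar cannot dilute its own score.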
As an example of how bad data quality can get, and the kinds of errors that
can be expected when dealing with low-quality data, another data set was ana-
lyzed with the same data-checking utility. This data, obtained from an acquain-
tance, was for Apple Computer (AAPL). The data-checking results appear in
Table 1-2.
In this data set, unlike in the previous one, 2 bars were flagged for having
outright logical inconsistencies. One logically invalid data point had an opening
price of zero, which was also lower than the low, while the other bar had a high
price that was lower than the closing price. Another data point was detected as
having an excessive range, which may or may not be a data error. In addition, several
bars evidenced extreme closing price deviance, perhaps reflecting uncorrected
stock splits. There were no duplicate or out-of-order dates, but quite a few data
points were missing. In this instance, the missing data points were holidays and,
therefore, only reflect differences in data handling: for a variety of reasons, we
usually fill holidays with data from previous bars. Considering that the data series
extended only from 1/2/97 through 11/6/98 (in contrast to the S&P 500, which ran
from 1/3/83 to 5/21/98), it is distressing that several serious errors, including logical
violations, were detected by a rather simple scan.
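A "rather simple scan" of this kind can be sketched in Python. The checks below cover the logical violations and date problems mentioned in this section; the `Bar` layout, the function names, and the wording of the messages are assumptions for illustration, not the format used by the utility. The non-positive price test suits stock data such as the AAPL series; it may need to be relaxed for back-adjusted continuous contracts.

```python
from datetime import date
from typing import Dict, List, NamedTuple

class Bar(NamedTuple):
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: float

def logical_errors(bar: Bar) -> List[str]:
    """List the logical inconsistencies found in a single bar."""
    problems = []
    if min(bar.open, bar.high, bar.low, bar.close) <= 0:
        problems.append("non-positive price")
    if bar.high < max(bar.open, bar.close, bar.low):
        problems.append("high below open, close, or low")
    if bar.low > min(bar.open, bar.close, bar.high):
        problems.append("low above open, close, or high")
    if bar.volume < 0:
        problems.append("negative volume")
    return problems

def date_errors(bars: List[Bar]) -> Dict[str, List[date]]:
    """Report weekend bars and duplicate or out-of-order dates."""
    found: Dict[str, List[date]] = {
        "weekend": [], "duplicate": [], "out_of_order": []}
    previous = None
    for bar in bars:
        if bar.day.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
            found["weekend"].append(bar.day)
        if previous is not None:
            if bar.day == previous:
                found["duplicate"].append(bar.day)
            elif bar.day < previous:
                found["out_of_order"].append(bar.day)
        previous = bar.day
    return found
```

A bar with an opening price of zero, for example, is reported both as a non-positive price and as having a low above the open, which mirrors the first invalid data point described above.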
The implication of this exercise is that data should be purchased only from a
reputable vendor who takes data quality seriously; this will save time and ensure
reliable, error-free data for system development, testing, and trading. In addition,
all data should be scanned for errors to avoid disturbing surprises. For an in-depth
discussion of data quality, which includes coverage of how data is produced, transmitted,
received, and stored, see Jurik (1999).

Table 1-2. Output from Data-Checking Utility for Apple Computer, Symbol AAPL

Today there are a great many sources from which data may be acquired. Data may
be purchased from value-added vendors, downloaded from any of several
exchanges, and extracted from a wide variety of databases accessible over the
Internet and on compact discs.
Value-added vendors, such as Tick Data and Pinnacle, whose data have been
used extensively in this work, can supply the trader with relatively clean data in
easy-to-use form. They also provide convenient update services and, at least in the
case of Pinnacle, error corrections that are handled automatically by the downloading
software, which makes the task of maintaining a reliable, up-to-date database
very straightforward. Popular suppliers of end-of-day commodities data
include Pinnacle Data Corporation (800-724-4903), Prophet Financial Systems
(650-322-4183), Commodities Systems Incorporated (CSI, 800-274-4727), and
Technical Tools (800-231-8005). Intraday historical data, which are needed for
testing short time frame systems, may be purchased from Tick Data (800-822-8425)
and Genesis Financial Data Services (800-621-2628). Day traders should
also look into Data Transmission Network (DTN, 800-485-4000), Data
Broadcasting Corporation (DBC, 800-367-4670), Bonneville Market Information
(BMI, 800-532-3400), and FutureSource-Bridge (800-621-2628); these data distributors
can provide the fast, real-time data feeds necessary for successful day
trading. For additional information on data sources, consult Marder (1999). For a
comparative review of end-of-day data, see Knight (1999).
Data need not always be acquired from a commercial vendor. Sometimes it
can be obtained directly from the originator. For instance, various exchanges occa-
sionally furnish data directly to the public. Options data can currently be down-
loaded over the Internet from the Chicago Board of Trade (CBOT). When a new
contract is introduced and the exchange wants to encourage traders, it will often
release a kit containing data and other information of interest. Sometimes this is
the only way to acquire certain kinds of data cheaply and easily.
Finally, a vast, mind-boggling array of databases may be accessed using an
Internet web browser or ftp client. These days almost everything is on-line. For exam-
ple, the Federal Reserve maintains files containing all kinds of economic time series
and business cycle indicators. NASA is a great source for solar and astronomical data.
Climate and geophysical data may be downloaded from the National Climatic Data
Center (NCDC) and the National Geophysical Data Center (NGDC), respectively.
For the ardent net-surfer, there is an overwhelming abundance of data in a staggering
variety of formats. Therein, however, lies another problem: A certain level of skill is
required in the art of the search, as is perhaps some basic programming or scripting
experience, as well as the time and effort to find, tidy up, and reformat the data. Since
“time is money,” it is generally best to rely on a reputable, value-added data vendor
for basic pricing data, and to employ the Internet and other sources for data that is
more specialized or difficult to acquire.
Additional sources of data also include databases available through libraries
and on compact discs. ProQuest and other periodical databases offer full text
retrieval capabilities and can frequently be found at the public library. Bring a
floppy disk along and copy any data of interest. Finally, do not forget newspapers
such as Investor's Business Daily, Barron's, and the Wall Street Journal; these can
be excellent sources for certain kinds of information and are available on micro-
film from many libraries.
In general, it is best to maintain data in a standard text-based (ASCII) for-
mat. Such a format has the virtue of being simple, portable across most operating
systems and hardware platforms, and easily read by all types of software, from text
editors to charting packages.
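As a small illustration of how little is needed to maintain data in text form, the following Python sketch writes and reads bars as comma-separated ASCII using only the standard library. The column layout (date, open, high, low, close, volume), the function names, and the sample values are assumptions for the example; no particular vendor format is implied.

```python
import csv
import io

FIELDS = ["date", "open", "high", "low", "close", "volume"]  # assumed layout

def write_bars(bars, stream):
    """Write bars (dicts keyed by FIELDS) as comma-separated ASCII text."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(bars)

def read_bars(stream):
    """Read bars back, converting prices and volume to numbers."""
    bars = []
    for row in csv.DictReader(stream):
        bar = {"date": row["date"]}
        for name in FIELDS[1:]:
            bar[name] = float(row[name])
        bars.append(bar)
    return bars

# Round trip through an in-memory text buffer (hypothetical values).
sample = [{"date": "990104", "open": 100.0, "high": 101.5,
           "low": 99.25, "close": 101.0, "volume": 12000.0}]
buffer = io.StringIO()
write_bars(sample, buffer)
buffer.seek(0)
restored = read_bars(buffer)
```

The same two functions work unchanged on a file opened in text mode, and the resulting file can be inspected in any text editor.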


No savvy trader would trade a system with a real account and risk real money
without first observing its behavior on paper. A trading simulator is a software
application or component that allows the user to simulate, using historical data, a
trading account that is traded with a user-specified set of trading rules. The user's
trading rules are written into a small program that automates a rigorous “paper-trading”
process on a substantial amount of historical data. In this way, the trading
simulator allows the trader to gain insight into how the system might perform
when traded in a real account. The raison d'être of a trading simulator is that it
makes it possible to efficiently back-test, or paper-trade, a system to determine
whether the system works and, if so, how well.

There are two major forms of trading simulators. One form is the integrated, easy-
to-use software application that provides some basic historical analysis and simu-
lation along with data collection and charting. The other form is the specialized
software component or class library that can be incorporated into user-written
software to provide system testing and evaluation functionality. Software compo-
nents and class libraries offer open architecture, advanced features, and high lev-
els of performance, but require programming expertise and such additional
elements as graphics, report generation, and data management to be useful.
Integrated applications packages, although generally offering less powerful simu-
lation and testing capabilities, are much more accessible to the novice.
Regardless of whether an integrated or component-based simulator is employed,
the trading logic of the user's system must be programmed into it using some computer
language. The language used may be either a generic programming language,
such as C++ or FORTRAN, or a proprietary scripting language. Without
the aid of a formal language, it would be impossible to express a system's trading
rules with the precision required for an accurate simulation. The need for programming
of some kind should not be looked upon as a necessary evil.
Programming can actually benefit the trader by encouraging an explicit and disciplined
expression of trading ideas.
For an example of how trading logic is programmed into a simulator, consider
TradeStation, a popular integrated product from Omega Research that contains
an interpreter for a basic system writing language (called Easy Language) with historical
simulation capabilities. Omega's Easy Language is a proprietary, trading-specific
language based on Pascal (a generic programming language). What does a
simple trading system look like when programmed in Easy Language? The following
code implements a simple moving-average crossover system:

    { Simple moving average crossover system in Easy Language }
    Inputs: Len(4);    { length parameter }
    If (Close > Average(Close, Len)) And
       (Close[1] <= Average(Close, Len)[1]) Then
        Buy("A") 1 Contract At Market;     { buys at open of next bar }
    If (Close <= Average(Close, Len)) And
       (Close[1] > Average(Close, Len)[1]) Then
        Sell("B") 1 Contract At Market;    { sells at open of next bar }

This system goes long one contract at tomorrow's open when the close crosses
above its moving average, and goes short one contract when the close crosses
below the moving average. Each order is given a name or identifier: A for the buy,
B for the sell. The length of the moving average (Len) may be set by the user or
optimized by the software.
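For readers without access to a simulator, the crossover rule can also be expressed in plain Python. This sketch only generates the signals; it does not simulate fills, an account, or trading costs. The function names are hypothetical, and the average is assumed to include the current bar's close.

```python
from typing import List, Optional

def sma(values: List[float], length: int, i: int) -> Optional[float]:
    """Simple moving average of the `length` values ending at index i."""
    if i + 1 < length:
        return None
    return sum(values[i + 1 - length:i + 1]) / length

def crossover_signals(closes: List[float], length: int = 4) -> List[int]:
    """Return +1 (buy), -1 (sell), or 0 for each bar.

    A signal on bar i is meant to be acted on at the open of bar i + 1,
    as with the market orders in the Easy Language version.
    """
    signals = [0] * len(closes)
    for i in range(length, len(closes)):
        avg_now = sma(closes, length, i)
        avg_prev = sma(closes, length, i - 1)
        if closes[i] > avg_now and closes[i - 1] <= avg_prev:
            signals[i] = 1       # close crossed above its average
        elif closes[i] <= avg_now and closes[i - 1] > avg_prev:
            signals[i] = -1      # close crossed below its average
    return signals
```

With closes of 10, 10, 10, 10, 12, 13, 9, 8 and a length of 4, the sketch produces a buy signal on the fifth bar and a sell signal on the seventh.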
Below is the same system programmed in C++ using Scientific Consultant
Services' component-based C-Trader toolkit, which includes the C++ Trading

Except for syntax and naming conventions, the differences between the C++ and
Easy Language implementations are small. Most significant are the explicit references
to the current bar (cb) and to a particular simulated trading account or simulator
class instance (ts) in the C++ implementation. In C++, it is possible to
explicitly declare and reference any number of simulated accounts; this becomes
important when working with portfolios and metasystems (systems that trade the
accounts of other systems), and when developing models that incorporate an
implicit walk-forward adaptation.

All good trading simulators generate output containing a wealth of information
about the performance of the user's simulated account. Expect to obtain data on
gross and net profit, number of winning and losing trades, worst-case draw-
down, and related system characteristics, from even the most basic simulators.
Better simulators provide figures for maximum run-up, average favorable and
adverse excursion, inferential statistics, and more, not to mention highly detailed
analyses of individual trades. An extraordinary simulator might also include in
its output some measure of risk relative to reward, such as the annualized risk-to-reward
ratio (ARRR) or the Sharpe Ratio, an important and well-known measure
used to compare the performances of different portfolios, systems, or funds
(Sharpe, 1994).
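Two of the figures mentioned above, worst-case drawdown and maximum run-up, are easily derived from a series of account equity values. The Python sketch below assumes closed-trade or bar-by-bar equity supplied as a simple list; the function name and the convention of reporting both figures as non-negative amounts are choices made for the example, and individual simulators may define these quantities differently.

```python
from typing import List, Tuple

def drawdown_and_runup(equity: List[float]) -> Tuple[float, float]:
    """Return (maximum drawdown, maximum run-up) of an equity curve.

    Drawdown is measured from the highest equity seen so far down to the
    current value; run-up from the lowest equity seen so far up to it.
    Both are returned as non-negative amounts.
    """
    if not equity:
        return 0.0, 0.0
    peak = trough = equity[0]
    max_dd = max_ru = 0.0
    for value in equity:
        peak = max(peak, value)
        trough = min(trough, value)
        max_dd = max(max_dd, peak - value)
        max_ru = max(max_ru, value - trough)
    return max_dd, max_ru
```

For an equity series of 100, 120, 90, 95, 150, 110, the largest decline is from 150 to 110 and the largest rise is from 90 to 150.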
The output from a trading simulator is typically presented to the user in the
form of one or more reports. Two basic kinds of reports are available from most trad-
ing simulators: the performance summary and the trade-by-trade, or “detail,” report.
The information contained in these reports can help the trader evaluate a system's
“trading style” and determine whether the system is worthy of real-money trading.
Other kinds of reports may also be generated, and the information from the
simulator may be formatted in a way that can easily be run into a spreadsheet for
further analysis. Almost all the tables and charts that appear in this book were pro-
duced in this manner: The output from the simulator was written to a file that
would be read by Excel, where the information was further processed and format-
ted for presentation.

Performance Summary Reports
As an illustration of the appearance of performance summary reports, two have
been prepared using the same moving-average crossover system employed to
illustrate simulator programming. Both the TradeStation (Table 2-1) and C-Trader
(Table 2-2) implementations of this system were run using their respective target
software applications. In each instance, the length parameter (which controls the period
of the moving average) was set to 4.
Such style factors as the total number of trades, the number of winning trades, the
number of losing trades, the percentage of profitable trades, the maximum numbers
of consecutive winners and losers, and the average numbers of bars in winners and
losers also appear in performance summary reports. Reward, risk, and style are crit-
ical aspects of system performance that these reports address.
Although all address the issues of reward, risk and trading style, there are a
number of differences between various performance summary reports. Least sig-
nificant are differences in formatting. Some reports, in an effort to cram as much
information as possible into a limited amount of space, round dollar values to the
nearest whole integer, scale up certain values by some factor of 10 to avoid the
need for decimals, and arrange their output in a tabular, spreadsheet-like format.
Other reports use less cryptic descriptors, do not round dollar values or rescale
numbers, and format their output to resemble more traditional reports.
Somewhat more significant than differences in formatting are the variations
between performance summary reports that result from the definitions and
assumptions made in various calculations. For instance, the number of winning
trades may differ slightly between reports because of how winners are defined.
Some simulators count as a winner any trade in which the P/L (profit/loss) figure
is greater than or equal to zero, whereas others count as winners only trades for
which the P/L is strictly greater than zero. This difference in calculation also
affects figures for the average winning trade and for the ratio of the average win-
ner to the average loser. Likewise, the average number of bars in a trade may be
greater or fewer, depending on how they are counted. Some simulators include the
entry bar in all bar counts; others do not. Return-on-account figures may also dif-
fer, depending, for instance, on whether or not they are annualized.
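The effect of the two winner definitions can be seen in a short Python sketch. The function name and the returned keys are hypothetical, and break-even trades are assumed to count as neither winners nor losers under the strict definition; a given simulator may treat them otherwise.

```python
from typing import Dict, List

def win_stats(profits: List[float], zero_is_win: bool) -> Dict[str, float]:
    """Summarize winners and losers under one of two winner definitions.

    zero_is_win=True counts a trade as a winner when P/L >= 0;
    zero_is_win=False requires P/L > 0.
    """
    if zero_is_win:
        winners = [p for p in profits if p >= 0]
    else:
        winners = [p for p in profits if p > 0]
    losers = [p for p in profits if p < 0]
    return {
        "winners": len(winners),
        "losers": len(losers),
        "percent_profitable":
            100.0 * len(winners) / len(profits) if profits else 0.0,
        "avg_win": sum(winners) / len(winners) if winners else 0.0,
        "avg_loss": sum(losers) / len(losers) if losers else 0.0,
    }
```

For trades of 0, 400, 200, and -300, the inclusive definition reports three winners averaging 200, while the strict definition reports two winners averaging 300; the same trades thus yield 75 percent or 50 percent profitable.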
Differences in content between performance summary reports may be even
more significant. Some only break down their performance analyses into long
positions, short positions, and all trades combined. Others break them down into
in-sample and out-of-sample trades, as well. The additional breakdown makes it
easy to see whether a system optimized on one sample of data (the in-sample set)
shows similar behavior on another sample (the out-of-sample data) used for veri-
fication; out-of-sample tests are imperative for optimized systems. Other impor-
tant information, such as the total bar counts, maximum run-up (the converse of
drawdown), adverse and favorable excursion numbers, peak equity, lowest equity,
annualized return in dollars, trade variability (expressed as a standard deviation),
and the annualized risk-to-reward ratio (a variant of the Sharpe Ratio), are present
in some reports. The calculation of inferential statistics, such as the t-statistic and
its associated probability, either for a single test or corrected for multiple tests or
optimizations, is also a desirable feature. Statistical items, such as t-tests and probabilities,
are important since they help reveal whether a system's performance
reflects the capture of a valid market inefficiency or is merely due to chance or
excessive curve-fitting. Many additional, possibly useful statistics can also be calculated,
some of them on the basis of the information present in performance summaries.
Among these statistics (Stendahl, 1999) are net positive outliers, net negative
outliers, select net profit (calculated after the removal of outlier trades), loss
ratio (greatest loss divided by net profit), run-up-to-drawdown ratio, longest flat
period, and buy-and-hold return (useful as a baseline). Finally, some reports also
contain a text-based plot of account equity as a function of time.
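As an indication of what the t-statistic involves, the following Python sketch computes a one-sample t-statistic for the mean profit per trade against a hypothesized mean of zero. It is a simplified illustration under stated assumptions: trades are treated as independent, no correction for multiple tests or optimization is applied, and the associated probability (which requires the t-distribution) is not computed. The function name is hypothetical.

```python
import math
import statistics
from typing import List

def t_statistic(profits: List[float]) -> float:
    """One-sample t-statistic for the mean trade P/L against zero."""
    n = len(profits)
    if n < 2:
        raise ValueError("at least two trades are required")
    mean = statistics.mean(profits)
    sd = statistics.stdev(profits)      # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("trade profits have zero variance")
    return mean / (sd / math.sqrt(n))
```

The statistic has n - 1 degrees of freedom; larger absolute values make it less plausible that the average trade differs from zero merely by chance.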
To the degree that history repeats itself, a clear image of the past seems like
an excellent foundation from which to envision a likely future. A good performance
summary provides a panoramic view of a trading method's historical
behavior. Figures on return and risk show how well the system traded on test data
from the historical period under study. The Sharpe Ratio, or annualized risk to
reward, measures return on a risk- or stability-adjusted scale. T-tests and related
statistics may be used to determine whether a system's performance derives from
some real market inefficiency or is an artifact of chance, multiple tests, or inappropriate
optimization. Performance due to real market inefficiency may persist
for a time, while that due to artifact is unlikely to recur in the future. In short, a
good performance summary aids in capturing profitable market phenomena likely
to persist; the capture of persistent market inefficiency is, of course, the basis for
any sustained success as a trader.
This wraps up the discussion of one kind of report obtainable within most
trading simulation environments. Next we consider the other type of output that
most simulators provide: the trade-by-trade report.

Trade-by-Trade Reports
Illustrative trade-by-trade reports were prepared using the simulators contained in
TradeStation (Table 2-3) and in the C-Trader toolkit (Table 2-4). Both reports per-
tain to the same simple moving-average crossover system used in various ways
throughout this discussion. Since hundreds of trades were taken by this system, the
original reports are quite lengthy. Consequently, large blocks of trades have been
edited out and ellipses inserted where the deletions were made. Because these
reports are presented merely for illustration, such deletions were considered
In contrast to a performance report, which provides an overall evaluation of
a trading system's behavior, a detail or trade-by-trade report contains detailed
information on each trade taken in the simulated account. A minimal detail report
contains each trade's entry and exit dates (and times, if the simulation involves
intraday data), the prices at which these entries and exits occurred, the positions
held (in numbers of contracts, long or short), and the profit or loss resulting from
each trade. A more comprehensive trade-by-trade report might also provide information
on the type of order responsible for each entry or exit (e.g., stop, limit, or
market), where in the bar the order was executed (at the open, the close, or in
between), the number of bars each trade was held, the account equity at the start
of each trade, the maximum favorable and adverse excursions within each trade,
and the account equity on exit from each trade.

Table 2-3. Trade-by-Trade Report Generated by TradeStation for the Moving-Average
Crossover System

Most trade-by-trade reports contain the date (and time, if applicable) each
trade was entered, whether a buy or sell was involved (that is, a long or short posi-
tion established), the number of contracts in the transaction, the date the trade
was exited, the profit or loss on the trade, and the cumulative profit or loss on all
trades up to and including the trade under consideration. Reports also provide the
name of the order on which the trade was entered and the name of the exit order.
A better trade-by-trade report might include the fields for maximum favorable
excursion (the greatest unrealized profit to occur during each trade), the maxi-
mum adverse excursion (the largest unrealized loss), and the number of bars each
trade was held.
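The two excursion fields can be computed from the highs and lows of the bars spanned by a trade. The Python sketch below is one possible formulation, not the calculation used by either simulator: it assumes excursions are measured against intrabar highs and lows (some packages use closes), reports both figures as non-negative amounts, and uses a hypothetical `point_value` parameter to convert price points to dollars.

```python
from typing import List, Tuple

def excursions(position: int, entry_price: float,
               highs: List[float], lows: List[float],
               point_value: float = 1.0) -> Tuple[float, float]:
    """Return (MFE, MAE) for one trade as non-negative amounts.

    position is +1 for long and -1 for short (multiplied by the number
    of contracts, if desired); highs and lows cover the bars during
    which the trade was open.
    """
    best_price = max(highs) if position > 0 else min(lows)
    worst_price = min(lows) if position > 0 else max(highs)
    # Maximum favorable excursion: greatest unrealized profit.
    mfe = max(0.0, (best_price - entry_price) * position * point_value)
    # Maximum adverse excursion: largest unrealized loss.
    mae = max(0.0, (entry_price - worst_price) * position * point_value)
    return mfe, mae
```

A long trade entered at 100 that sees a high of 105 and a low of 98 has a favorable excursion of 5 points and an adverse excursion of 2 points; for a short trade entered at the same price, the two figures are reversed.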
As with the performance summaries, there are differences between various
trade-by-trade reports with respect to the ways they are formatted and in the
assumptions underlying the computations on which they are based.
While the performance summary provides a picture of the whole forest, a good
trade-by-trade report focuses on the trees. In a good trade-by-trade report, each trade
is scrutinized in detail: What was the worst paper loss sustained in this trade? What
would the profit have been with a perfect exit? What was the actual profit (or loss)

Table 2-4. Trade-by-Trade Report Generated Using the C-Trader Toolkit for the Moving-Average Crossover System

Entry   Pos  Entry    Entry   Exit    Exit     Exit    Bars  Profit    Max    Max     Cum
date         price    order   date    price    order                   fav    adv  profit
910527   -1  492.150  0 M B:  910528  492.150  0 M A:     2       0    500
910528    1  492.150  0 M A:  910607  501.250  0 M B:    11    4550   6900           4550
910607   -1  501.250  0 M B:  910613  495.300  0 M A:     7    2975   4025   1525    7525
910613    1  495.300  0 M A:  910614  492.200  0 M B:     2   -1550                  5975
910614   -1  492.200  0 M B:  910618  496.900  0 M A:     5   -2350    525   2825    3625
910618    1  496.900  0 M A:  910620  491.800  0 M B:     3   -2550    400   2550    1075
910620   -1  491.800  0 M B:  910625  490.650  0 M A:     6     575   1650    500    1650
. . .
951225    1  691.500  0 M A:  960101  692.600  0 M B:     8     550   1725    325  -15625
960101   -1  692.600  0 M B:  960104  700.700  0 M A:     4   -4050   1200   4050  -19675
960104    1  700.700  0 M A:  960108  691.600  0 M B:     5   -4550   1675   5100  -24225
960108   -1  691.600  0 M B:  960110  697.600  0 M A:     3   -3000   1450   3000  -27225
960110    1  697.600  0 M A:  960111  681.000  0 M B:     2   -8300      0   8300  -35525
960111   -1  681.000  0 M B:  960118  683.000  0 M A:     8   -1000   4300   2325  -36525
960118    1  683.000  0 M A:  960216  729.300  0 M B:    30   23150  29050   1450  -13375
960216   -1  729.300  0 M B:  960223  727.500  0 M A:     8     900   8400   1875  -12475
960223    1  727.500  0 M A:          724.750  0 M B:     6   -1375   5725   2750  -13850
